You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ratis.apache.org by GitBox <gi...@apache.org> on 2022/08/18 07:03:47 UTC

[GitHub] [ratis] szetszwo opened a new pull request, #718: RATIS-1677. Do not auto format RaftStorage in RECOVER.

szetszwo opened a new pull request, #718:
URL: https://github.com/apache/ratis/pull/718

   See https://issues.apache.org/jira/browse/RATIS-1677


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] captainzmc commented on pull request #718: RATIS-1677. Do not auto format RaftStorage in RECOVER.

Posted by GitBox <gi...@apache.org>.
captainzmc commented on PR #718:
URL: https://github.com/apache/ratis/pull/718#issuecomment-1237719826

   > The tests depend on the auto-format behavior need to be updated. As an example, see https://github.com/captainzmc/hadoop-ozone/runs/8182884310?check_suite_focus=true#step:6:2010 
   
   Thanks @szetszwo for the tip, I locally change the behavior of the failed CI during init and startup RaftServer. Now this UT was successful.
    I'll check all the failed UT in Ozone, and if it turns out to be OK, we can put RATIS-1677 and RATIS-1694 in the 2.4.0 release.  Just give me some time to make sure.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] lokeshj1703 commented on a diff in pull request #718: RATIS-1677. Do not auto format RaftStorage in RECOVER.

Posted by GitBox <gi...@apache.org>.
lokeshj1703 commented on code in PR #718:
URL: https://github.com/apache/ratis/pull/718#discussion_r953558238


##########
ratis-server/src/main/java/org/apache/ratis/server/storage/StorageImplUtils.java:
##########
@@ -17,38 +17,157 @@
  */
 package org.apache.ratis.server.storage;
 
+import org.apache.ratis.conf.RaftProperties;
+import org.apache.ratis.protocol.RaftPeerId;
 import org.apache.ratis.server.RaftServerConfigKeys;
-import org.apache.ratis.util.IOUtils;
-import org.apache.ratis.util.JavaUtils;
-import org.apache.ratis.util.TimeDuration;
+import org.apache.ratis.server.RaftServerConfigKeys.Log;
+import org.apache.ratis.server.storage.RaftStorage.StartupOption;
+import org.apache.ratis.util.Preconditions;
+import org.apache.ratis.util.SizeInBytes;
 
 import java.io.File;
 import java.io.IOException;
-import java.util.concurrent.TimeUnit;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.stream.Collectors;
+
+import static org.apache.ratis.server.RaftServer.Division.LOG;
 
 public final class StorageImplUtils {
+  private static final File[] EMPTY_FILE_ARRAY = {};
 
   private StorageImplUtils() {
     //Never constructed
   }
 
+  public static SnapshotManager newSnapshotManager(RaftPeerId id) {
+    return new SnapshotManager(id);
+  }
+
   /** Create a {@link RaftStorageImpl}. */
-  public static RaftStorageImpl newRaftStorage(File dir, RaftServerConfigKeys.Log.CorruptionPolicy logCorruptionPolicy,
-      RaftStorage.StartupOption option, long storageFeeSpaceMin) throws IOException {
-    RaftStorage.LOG.debug("newRaftStorage: {}, {}, {}, {}",dir, logCorruptionPolicy, option, storageFeeSpaceMin);
+  public static RaftStorageImpl newRaftStorage(File dir, SizeInBytes freeSpaceMin,
+      RaftStorage.StartupOption option, Log.CorruptionPolicy logCorruptionPolicy) {
+    return new RaftStorageImpl(dir, freeSpaceMin, option, logCorruptionPolicy);
+  }
+
+  private static List<File> getExistingStorageSubs(List<File> volumes, String targetSubDir,
+      Map<File, Integer> dirsPerVol) {
+    return volumes.stream().flatMap(volume -> {
+          final File[] dirs = Optional.ofNullable(volume.listFiles()).orElse(EMPTY_FILE_ARRAY);
+          Optional.ofNullable(dirsPerVol).ifPresent(map -> map.put(volume, dirs.length));
+          return Arrays.stream(dirs);
+        }).filter(dir -> targetSubDir.equals(dir.getName()))
+        .collect(Collectors.toList());
+  }
+
+  /**
+   * Get a list of existing subdirectories matching the given storage directory name from the given root directories.
+   */
+  static List<File> getExistingStorageSubs(String storageDirName, StartupOption option,
+      List<File> rootDirs, Map<File, Integer> dirsPerVol) throws IOException {
+    Preconditions.assertEmpty(dirsPerVol, "dirsPerVol");
+    final List<File> existingSubs = getExistingStorageSubs(rootDirs, storageDirName, dirsPerVol);
+    final int size = existingSubs.size();
+    if (option == StartupOption.RECOVER) {
+      if (size > 1) {
+        throw new IOException("Failed to " + option + ": More than one existing directories found " + existingSubs
+            + " for " + storageDirName);
+      } else if (size == 0) {
+        throw new IOException("Failed to " + option + ": Storage directory not found for " + storageDirName
+            + " from " + rootDirs);
+      }
+    } else if (option == StartupOption.FORMAT) {
+      if (size > 0) {
+        throw new IOException("Failed to " + option + ": One or more existing directories found " + existingSubs
+            + " for " + storageDirName);
+      }

Review Comment:
   Can we move these checks inside `recoverExistingStorageDir` and `formatNewStorageDir`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] szetszwo merged pull request #718: RATIS-1677. Do not auto format RaftStorage in RECOVER.

Posted by GitBox <gi...@apache.org>.
szetszwo merged PR #718:
URL: https://github.com/apache/ratis/pull/718


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] szetszwo commented on pull request #718: RATIS-1677. Do not auto format RaftStorage in RECOVER.

Posted by GitBox <gi...@apache.org>.
szetszwo commented on PR #718:
URL: https://github.com/apache/ratis/pull/718#issuecomment-1234367071

   @captainzmc , [RATIS-1694](https://issues.apache.org/jira/browse/RATIS-1694) is now merged.  Could you test if it could fix the problem so that we can include RATIS-1677 in 2.4.0?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] szetszwo commented on pull request #718: RATIS-1677. Do not auto format RaftStorage in RECOVER.

Posted by GitBox <gi...@apache.org>.
szetszwo commented on PR #718:
URL: https://github.com/apache/ratis/pull/718#issuecomment-1237825195

   @captainzmc , thanks a lot for your efforts on fixing the tests!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] captainzmc commented on pull request #718: RATIS-1677. Do not auto format RaftStorage in RECOVER.

Posted by GitBox <gi...@apache.org>.
captainzmc commented on PR #718:
URL: https://github.com/apache/ratis/pull/718#issuecomment-1241509832

   Thanks @szetszwo for the explanation. Ozone will use Ratis 2.4.0 cut 1.3.0 release branch. Without sufficient testing, new problems may be introduced. I agree that exclude this commit in 2.4.0.  We can support this in Ozone's Master branch so that we will have more time for adaptation and testing. 
   @codings-dan Any other suggestions here?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] codings-dan commented on pull request #718: RATIS-1677. Do not auto format RaftStorage in RECOVER.

Posted by GitBox <gi...@apache.org>.
codings-dan commented on PR #718:
URL: https://github.com/apache/ratis/pull/718#issuecomment-1241542851

   Thanks @captainzmc  for helping test Ratis 2.4.0. Since introduction RATIS-1677 and RATIS-1694 will indeed make Ozone, Alluxio and other components change a lot, I also agree to introduce them in the next release. Since Alluxio really wants to introduce [RATIS-1695](https://github.com/apache/ratis/pull/733) in 2.4.0, can we start the next 2.4.0 release(2.4.0-rc2) after it's been merged? Also, do you think there is any other commit we have to introduce in 2.4.0? @szetszwo  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] captainzmc commented on pull request #718: RATIS-1677. Do not auto format RaftStorage in RECOVER.

Posted by GitBox <gi...@apache.org>.
captainzmc commented on PR #718:
URL: https://github.com/apache/ratis/pull/718#issuecomment-1231085803

   Hi @szetszwo, thank you for this fix.
   I tested Ratis-2.4.0 rc1 in ozone and I found that we cherry-pick this patch into 2.4.0 release branch.
   There are two possible problems here:
   1. The interface compatibility issues, ozone is directly use [GroupManagementRequest.newAdd](https://github.com/captainzmc/hadoop-ozone/commit/bfa69f788d9f9c82218e5a8dc2801a33893df9f2#diff-7048844bed8c3f7f0689c28cdd4edd3ac3a70139e3c01b87fd79add572315ed3R754), this method changes in the PR maybe not compatible.
   2. Regardless of whether I set the format in newAdd to true or false, I will see many “Storage directory not found” exceptions in the UT of Ozone. (See the test UT, [set format to false](https://github.com/captainzmc/hadoop-ozone/runs/8072516884?check_suite_focus=true#step:5:9813) and [set format to true](https://github.com/captainzmc/hadoop-ozone/runs/8072573474?check_suite_focus=true#step:5:5170) )
   So do we need to revert this Commit revert in 2.4.0 branch? cc @codings-dan @adoroszlai 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] szetszwo commented on a diff in pull request #718: RATIS-1677. Do not auto format RaftStorage in RECOVER.

Posted by GitBox <gi...@apache.org>.
szetszwo commented on code in PR #718:
URL: https://github.com/apache/ratis/pull/718#discussion_r954590341


##########
ratis-server/src/main/java/org/apache/ratis/server/storage/StorageImplUtils.java:
##########
@@ -113,61 +93,86 @@ static List<File> getExistingStorageSubs(String storageDirName, StartupOption op
    */
   public static RaftStorageImpl initRaftStorage(String storageDirName, StartupOption option,
       RaftProperties properties) throws IOException {
-    final SizeInBytes freeSpaceMin = RaftServerConfigKeys.storageFreeSpaceMin(properties);
-    final Log.CorruptionPolicy logCorruptionPolicy = RaftServerConfigKeys.Log.corruptionPolicy(properties);
-    final List<File> dirsInConf = RaftServerConfigKeys.storageDir(properties);
-
-    final Map<File, Integer> dirsPerVol = new HashMap<>();
-    final List<File> existingDirs = getExistingStorageSubs(storageDirName, option, dirsInConf, dirsPerVol);
-    if (option == StartupOption.RECOVER) {
-      return recoverExistingStorageDir(existingDirs.get(0), freeSpaceMin, logCorruptionPolicy);
-    } else if (option == StartupOption.FORMAT) {
-      return formatNewStorageDir(storageDirName, dirsInConf, dirsPerVol, freeSpaceMin, logCorruptionPolicy);
-    } else {
-      throw new IllegalArgumentException("Illegal option: " + option);
-    }
+    return new Op(storageDirName, option, properties).run();
   }
 
-  private static RaftStorageImpl recoverExistingStorageDir(File dir, SizeInBytes freeSpaceMin,
-      Log.CorruptionPolicy logCorruptionPolicy) throws IOException {
-    try {
-      final RaftStorageImpl storage = newRaftStorage(dir, freeSpaceMin, StartupOption.RECOVER, logCorruptionPolicy);
-      storage.initialize();
-      return storage;
-    } catch (Throwable e) {
-      if (e instanceof IOException) {
-        throw e;
+  private static class Op {
+    private final String storageDirName;
+    private final StartupOption option;
+
+    private final SizeInBytes freeSpaceMin;
+    private final Log.CorruptionPolicy logCorruptionPolicy;
+    private final List<File> dirsInConf;
+
+    private final List<File> existingSubs;
+    private final Map<File, Integer> dirsPerVol = new HashMap<>();
+
+    Op(String storageDirName, StartupOption option, RaftProperties properties) {
+      this.storageDirName = storageDirName;
+      this.option = option;
+
+      this.freeSpaceMin = RaftServerConfigKeys.storageFreeSpaceMin(properties);
+      this.logCorruptionPolicy = RaftServerConfigKeys.Log.corruptionPolicy(properties);
+      this.dirsInConf = RaftServerConfigKeys.storageDir(properties);
+
+      this.existingSubs = getExistingStorageSubs(dirsInConf, this.storageDirName, dirsPerVol);
+    }
+
+    RaftStorageImpl run() throws IOException {
+      if (option == StartupOption.FORMAT) {
+        return format();
+      } else if (option == StartupOption.RECOVER) {
+        return recover();
+      } else {
+        throw new IllegalArgumentException("Illegal option: " + option);
       }
-      throw new IOException("Failed to initialize the existing directory " + dir.getAbsolutePath(), e);
     }
-  }
 
-  private static RaftStorageImpl formatNewStorageDir(String storageDirName, List<File> dirsInConf,
-      Map<File, Integer> dirsPerVol, SizeInBytes freeSpaceMin, Log.CorruptionPolicy logCorruptionPolicy)
-      throws IOException {
-    for(List<File> volumes = new ArrayList<>(dirsInConf); !volumes.isEmpty(); ) {
-      final File dir = chooseNewStorageDir(storageDirName, dirsPerVol);
+    private RaftStorageImpl format() throws IOException {
+      if (!existingSubs.isEmpty()) {
+        throw new IOException("Failed to " + option + ": One or more existing directories found " + existingSubs
+            + " for " + storageDirName);
+      }
+
+      for (List<File> volumes = new ArrayList<>(dirsInConf); !volumes.isEmpty(); ) {
+        final File dir = chooseNewStorageDir(storageDirName, dirsPerVol);

Review Comment:
   @lokeshj1703 , Good catch!  The catch block should remove the vol from dirsPerVol.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] captainzmc commented on pull request #718: RATIS-1677. Do not auto format RaftStorage in RECOVER.

Posted by GitBox <gi...@apache.org>.
captainzmc commented on PR #718:
URL: https://github.com/apache/ratis/pull/718#issuecomment-1236661549

   > @captainzmc , [RATIS-1694](https://issues.apache.org/jira/browse/RATIS-1694) is now merged. Could you test if it could fix the problem so that we can include RATIS-1677 in 2.4.0?
   
   Hi @szetszwo, It appears that RATIS-1694 did not solve the [CI error problem.](https://github.com/captainzmc/hadoop-ozone/runs/8182883694?check_suite_focus=true#step:5:5415) 
   Here I use the test package published by Yao Long, which is his local commit log:
   <img width="653" alt="企业微信截图_7532426c-c640-418a-80e8-aaf4177c5e37" src="https://user-images.githubusercontent.com/13825159/188397923-cb4ef5c5-d3df-4f37-92b6-47760d3d38e2.png">
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] szetszwo commented on a diff in pull request #718: RATIS-1677. Do not auto format RaftStorage in RECOVER.

Posted by GitBox <gi...@apache.org>.
szetszwo commented on code in PR #718:
URL: https://github.com/apache/ratis/pull/718#discussion_r954157445


##########
ratis-server/src/main/java/org/apache/ratis/server/storage/StorageImplUtils.java:
##########
@@ -17,38 +17,157 @@
  */
 package org.apache.ratis.server.storage;
 
+import org.apache.ratis.conf.RaftProperties;
+import org.apache.ratis.protocol.RaftPeerId;
 import org.apache.ratis.server.RaftServerConfigKeys;
-import org.apache.ratis.util.IOUtils;
-import org.apache.ratis.util.JavaUtils;
-import org.apache.ratis.util.TimeDuration;
+import org.apache.ratis.server.RaftServerConfigKeys.Log;
+import org.apache.ratis.server.storage.RaftStorage.StartupOption;
+import org.apache.ratis.util.Preconditions;
+import org.apache.ratis.util.SizeInBytes;
 
 import java.io.File;
 import java.io.IOException;
-import java.util.concurrent.TimeUnit;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+import java.util.stream.Collectors;
+
+import static org.apache.ratis.server.RaftServer.Division.LOG;
 
 public final class StorageImplUtils {
+  private static final File[] EMPTY_FILE_ARRAY = {};
 
   private StorageImplUtils() {
     //Never constructed
   }
 
+  public static SnapshotManager newSnapshotManager(RaftPeerId id) {
+    return new SnapshotManager(id);
+  }
+
   /** Create a {@link RaftStorageImpl}. */
-  public static RaftStorageImpl newRaftStorage(File dir, RaftServerConfigKeys.Log.CorruptionPolicy logCorruptionPolicy,
-      RaftStorage.StartupOption option, long storageFeeSpaceMin) throws IOException {
-    RaftStorage.LOG.debug("newRaftStorage: {}, {}, {}, {}",dir, logCorruptionPolicy, option, storageFeeSpaceMin);
+  public static RaftStorageImpl newRaftStorage(File dir, SizeInBytes freeSpaceMin,
+      RaftStorage.StartupOption option, Log.CorruptionPolicy logCorruptionPolicy) {
+    return new RaftStorageImpl(dir, freeSpaceMin, option, logCorruptionPolicy);
+  }
+
+  private static List<File> getExistingStorageSubs(List<File> volumes, String targetSubDir,
+      Map<File, Integer> dirsPerVol) {
+    return volumes.stream().flatMap(volume -> {
+          final File[] dirs = Optional.ofNullable(volume.listFiles()).orElse(EMPTY_FILE_ARRAY);
+          Optional.ofNullable(dirsPerVol).ifPresent(map -> map.put(volume, dirs.length));
+          return Arrays.stream(dirs);
+        }).filter(dir -> targetSubDir.equals(dir.getName()))
+        .collect(Collectors.toList());
+  }
+
+  /**
+   * Get a list of existing subdirectories matching the given storage directory name from the given root directories.
+   */
+  static List<File> getExistingStorageSubs(String storageDirName, StartupOption option,
+      List<File> rootDirs, Map<File, Integer> dirsPerVol) throws IOException {
+    Preconditions.assertEmpty(dirsPerVol, "dirsPerVol");
+    final List<File> existingSubs = getExistingStorageSubs(rootDirs, storageDirName, dirsPerVol);
+    final int size = existingSubs.size();
+    if (option == StartupOption.RECOVER) {
+      if (size > 1) {
+        throw new IOException("Failed to " + option + ": More than one existing directories found " + existingSubs
+            + " for " + storageDirName);
+      } else if (size == 0) {
+        throw new IOException("Failed to " + option + ": Storage directory not found for " + storageDirName
+            + " from " + rootDirs);
+      }
+    } else if (option == StartupOption.FORMAT) {
+      if (size > 0) {
+        throw new IOException("Failed to " + option + ": One or more existing directories found " + existingSubs
+            + " for " + storageDirName);
+      }

Review Comment:
   @lokeshj1703 , thanks for the suggestion!  If these checks are moved inside, then it needs to pass the variables for the error messages.  Let's create a class so that the variables will be available.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] szetszwo commented on pull request #718: RATIS-1677. Do not auto format RaftStorage in RECOVER.

Posted by GitBox <gi...@apache.org>.
szetszwo commented on PR #718:
URL: https://github.com/apache/ratis/pull/718#issuecomment-1227934717

   @lokeshj1703 , thanks a lot for reviewing this!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] szetszwo commented on pull request #718: RATIS-1677. Do not auto format RaftStorage in RECOVER.

Posted by GitBox <gi...@apache.org>.
szetszwo commented on PR #718:
URL: https://github.com/apache/ratis/pull/718#issuecomment-1231221544

   Filed [RATIS-1694](https://issues.apache.org/jira/browse/RATIS-1694).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] captainzmc commented on pull request #718: RATIS-1677. Do not auto format RaftStorage in RECOVER.

Posted by GitBox <gi...@apache.org>.
captainzmc commented on PR #718:
URL: https://github.com/apache/ratis/pull/718#issuecomment-1235041484

   @szetszwo Sure, let me verify that.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] lokeshj1703 commented on a diff in pull request #718: RATIS-1677. Do not auto format RaftStorage in RECOVER.

Posted by GitBox <gi...@apache.org>.
lokeshj1703 commented on code in PR #718:
URL: https://github.com/apache/ratis/pull/718#discussion_r954557272


##########
ratis-server/src/main/java/org/apache/ratis/server/storage/StorageImplUtils.java:
##########
@@ -113,61 +93,86 @@ static List<File> getExistingStorageSubs(String storageDirName, StartupOption op
    */
   public static RaftStorageImpl initRaftStorage(String storageDirName, StartupOption option,
       RaftProperties properties) throws IOException {
-    final SizeInBytes freeSpaceMin = RaftServerConfigKeys.storageFreeSpaceMin(properties);
-    final Log.CorruptionPolicy logCorruptionPolicy = RaftServerConfigKeys.Log.corruptionPolicy(properties);
-    final List<File> dirsInConf = RaftServerConfigKeys.storageDir(properties);
-
-    final Map<File, Integer> dirsPerVol = new HashMap<>();
-    final List<File> existingDirs = getExistingStorageSubs(storageDirName, option, dirsInConf, dirsPerVol);
-    if (option == StartupOption.RECOVER) {
-      return recoverExistingStorageDir(existingDirs.get(0), freeSpaceMin, logCorruptionPolicy);
-    } else if (option == StartupOption.FORMAT) {
-      return formatNewStorageDir(storageDirName, dirsInConf, dirsPerVol, freeSpaceMin, logCorruptionPolicy);
-    } else {
-      throw new IllegalArgumentException("Illegal option: " + option);
-    }
+    return new Op(storageDirName, option, properties).run();
   }
 
-  private static RaftStorageImpl recoverExistingStorageDir(File dir, SizeInBytes freeSpaceMin,
-      Log.CorruptionPolicy logCorruptionPolicy) throws IOException {
-    try {
-      final RaftStorageImpl storage = newRaftStorage(dir, freeSpaceMin, StartupOption.RECOVER, logCorruptionPolicy);
-      storage.initialize();
-      return storage;
-    } catch (Throwable e) {
-      if (e instanceof IOException) {
-        throw e;
+  private static class Op {
+    private final String storageDirName;
+    private final StartupOption option;
+
+    private final SizeInBytes freeSpaceMin;
+    private final Log.CorruptionPolicy logCorruptionPolicy;
+    private final List<File> dirsInConf;
+
+    private final List<File> existingSubs;
+    private final Map<File, Integer> dirsPerVol = new HashMap<>();
+
+    Op(String storageDirName, StartupOption option, RaftProperties properties) {
+      this.storageDirName = storageDirName;
+      this.option = option;
+
+      this.freeSpaceMin = RaftServerConfigKeys.storageFreeSpaceMin(properties);
+      this.logCorruptionPolicy = RaftServerConfigKeys.Log.corruptionPolicy(properties);
+      this.dirsInConf = RaftServerConfigKeys.storageDir(properties);
+
+      this.existingSubs = getExistingStorageSubs(dirsInConf, this.storageDirName, dirsPerVol);
+    }
+
+    RaftStorageImpl run() throws IOException {
+      if (option == StartupOption.FORMAT) {
+        return format();
+      } else if (option == StartupOption.RECOVER) {
+        return recover();
+      } else {
+        throw new IllegalArgumentException("Illegal option: " + option);
       }
-      throw new IOException("Failed to initialize the existing directory " + dir.getAbsolutePath(), e);
     }
-  }
 
-  private static RaftStorageImpl formatNewStorageDir(String storageDirName, List<File> dirsInConf,
-      Map<File, Integer> dirsPerVol, SizeInBytes freeSpaceMin, Log.CorruptionPolicy logCorruptionPolicy)
-      throws IOException {
-    for(List<File> volumes = new ArrayList<>(dirsInConf); !volumes.isEmpty(); ) {
-      final File dir = chooseNewStorageDir(storageDirName, dirsPerVol);
+    private RaftStorageImpl format() throws IOException {
+      if (!existingSubs.isEmpty()) {
+        throw new IOException("Failed to " + option + ": One or more existing directories found " + existingSubs
+            + " for " + storageDirName);
+      }
+
+      for (List<File> volumes = new ArrayList<>(dirsInConf); !volumes.isEmpty(); ) {
+        final File dir = chooseNewStorageDir(storageDirName, dirsPerVol);

Review Comment:
   In this for loop, chooseNewStorageDir could return the same volume even if the volume has been removed in the catch block?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] szetszwo commented on pull request #718: RATIS-1677. Do not auto format RaftStorage in RECOVER.

Posted by GitBox <gi...@apache.org>.
szetszwo commented on PR #718:
URL: https://github.com/apache/ratis/pull/718#issuecomment-1231215723

   @captainzmc , thanks a lot for testing the rc1!
   
   > The interface compatibility issues, ozone is directly use [GroupManagementRequest.newAdd](https://github.com/captainzmc/hadoop-ozone/commit/bfa69f788d9f9c82218e5a8dc2801a33893df9f2#diff-7048844bed8c3f7f0689c28cdd4edd3ac3a70139e3c01b87fd79add572315ed3R754) ...
   
   Oops, we should add back the old method then.
   
   >  ... many “Storage directory not found”  ...
   
   The reason is that [the default startup option is RECOVER](https://github.com/apache/ratis/blob/62edc69ebc46d7ac2871fb815c2f86ab4c3e933f/ratis-server-api/src/main/java/org/apache/ratis/server/RaftServer.java#L196).  It probably is better to change the default to FORMAT.
   
   I am fine if we revert this from branch-2 and release 2.4.0.  A workaround for the bug is to specify exactly one storage dir in the conf.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] szetszwo commented on pull request #718: RATIS-1677. Do not auto format RaftStorage in RECOVER.

Posted by GitBox <gi...@apache.org>.
szetszwo commented on PR #718:
URL: https://github.com/apache/ratis/pull/718#issuecomment-1236675912

   @captainzmc , the number of failures was decreased dramatically.  It is expected to have test failures.  The tests depend on the auto-format behavior need to be updated.  As an example, see https://github.com/captainzmc/hadoop-ozone/runs/8182884310?check_suite_focus=true#step:6:2010 .  
   
   Note that this is a fix for a data loss case caused by the auto-format behavior.  So I suggest to include this in 2.4.0 and then update Ozone.  What do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] captainzmc commented on pull request #718: RATIS-1677. Do not auto format RaftStorage in RECOVER.

Posted by GitBox <gi...@apache.org>.
captainzmc commented on PR #718:
URL: https://github.com/apache/ratis/pull/718#issuecomment-1240581716

   Hi @szetszwo , recently I have been checking all failed UTs and trying to fix a few. I found that almost every UT related to OM, SCM, and Datanode restart needs to fix.  Init, startup, and restart of these components are all affected. Many of the default behaviors need to be changed.(Other project, such as Alluxio, may also face these issues)
   
   I wonder if we can avoid these effects by deciding in Ratis whether it need Format or RECOVER. For example, use RECOVER if the directory exists, and use format if the directory does not exist. Of course, this is just a simple idea, there must be some details to pay attention to. And what do you think here?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [ratis] szetszwo commented on pull request #718: RATIS-1677. Do not auto format RaftStorage in RECOVER.

Posted by GitBox <gi...@apache.org>.
szetszwo commented on PR #718:
URL: https://github.com/apache/ratis/pull/718#issuecomment-1240777453

   > ... For example, use RECOVER if the directory exists, and use format if the directory does not exist. ...
   
   @captainzmc , that is a nice idea.  Unfortunately, when no directories are found, Ratis cannot decide whether a disk with an storage directory is bad or there are indeed no existing storage directories.
   
   Note that, prior this change, the RECOVER option worked exactly like that -- if a directory is found, start with it; otherwise, choose a new directory to format.  Then, we discovered the problem described in [HDDS-7103](https://issues.apache.org/jira/browse/HDDS-7103).  When I tried to fix it, my first attempt was [RATIS-1668](https://issues.apache.org/jira/browse/RATIS-1668).  But then I realized that the fixed may lead to data loss described in [the JIRA of this pull request](https://issues.apache.org/jira/browse/RATIS-1677).
   
   In HDFS, Namenode requires FORMAT before starting up the first time.  It is the same thing.
   
   If it is too much work for the upstream applications (e.g. Ozone, Alluxio, etc), we may exclude this in 2.4.0.
   
   BTW, since the bug does not exist when there is only one directory, a workaround is NOT to configure multiple directories in `raft.server.storage.dir`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ratis.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org