You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ratis.apache.org by GitBox <gi...@apache.org> on 2021/05/07 13:12:17 UTC

[GitHub] [ratis] guihecheng opened a new pull request #477: RATIS-1375. Handle bad storage dir due to disk failures.

guihecheng opened a new pull request #477:
URL: https://github.com/apache/ratis/pull/477


   ## What changes were proposed in this pull request?
   
   Handle bad storage dir due to disk failures.
   This includes a small fix for the remove behavior of the directories list.
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/RATIS-1375
   
   ## How was this patch tested?
   
   manual tests
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ratis] szetszwo commented on a change in pull request #477: RATIS-1375. Handle bad storage dir due to disk failures.

Posted by GitBox <gi...@apache.org>.
szetszwo commented on a change in pull request #477:
URL: https://github.com/apache/ratis/pull/477#discussion_r629286642



##########
File path: ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java
##########
@@ -104,17 +104,24 @@
     configurationManager = new ConfigurationManager(initialConf);
     LOG.info("{}: {}", getMemberId(), configurationManager);
 
+    boolean storageFound = false;
     List<File> directories = RaftServerConfigKeys.storageDir(prop);
     while (!directories.isEmpty()) {
       // use full uuid string to create a subdirectory
       File dir = chooseStorageDir(directories, group.getGroupId().getUuid().toString());
       try {
         storage = new RaftStorageImpl(dir, RaftServerConfigKeys.Log.corruptionPolicy(prop));
+        storageFound = true;
         break;
-      } catch (AccessDeniedException e) {
-        directories.remove(dir);
+      } catch (IOException e) {
+        directories.removeIf(d -> d.getAbsolutePath().equals(dir.getParent()));

Review comment:
       I see.  That's a good point.  Thanks.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ratis] guihecheng commented on pull request #477: RATIS-1375. Handle bad storage dir due to disk failures.

Posted by GitBox <gi...@apache.org>.
guihecheng commented on pull request #477:
URL: https://github.com/apache/ratis/pull/477#issuecomment-838450149


   @szetszwo oh, sorry for the style problem, will fix it.
   And for the ut case, I should handle the exception path properly. 
   Originally, it should get an `OverlappingFileLockException` followed/wrapped by an `IOException`, and they will not match 'AccessDeniedException' and will be thrown inside the loop.
   For now, the second `IOException` is caught and handled, and we throw a new `IOException` outside the loop.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ratis] guihecheng commented on pull request #477: RATIS-1375. Handle bad storage dir due to disk failures.

Posted by GitBox <gi...@apache.org>.
guihecheng commented on pull request #477:
URL: https://github.com/apache/ratis/pull/477#issuecomment-834959453


   Hello @szetszwo @runzhiwang , I'm new to ratis, could you help review this one when you are free? Thanks~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ratis] guihecheng edited a comment on pull request #477: RATIS-1375. Handle bad storage dir due to disk failures.

Posted by GitBox <gi...@apache.org>.
guihecheng edited a comment on pull request #477:
URL: https://github.com/apache/ratis/pull/477#issuecomment-838450149


   @szetszwo oh, sorry for the style problem, will fix it.
   And for the ut case, I should handle the exception path properly. 
   Originally, it should get an `OverlappingFileLockException` followed/wrapped by an `IOException`, and they will not match 'AccessDeniedException' and will be thrown inside the loop.
   For now, the second `IOException` is caught and handled, and we throw a new `IOException` outside the loop.
   I tried to keep the test logic and throw the specific exception.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ratis] guihecheng commented on a change in pull request #477: RATIS-1375. Handle bad storage dir due to disk failures.

Posted by GitBox <gi...@apache.org>.
guihecheng commented on a change in pull request #477:
URL: https://github.com/apache/ratis/pull/477#discussion_r629188441



##########
File path: ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java
##########
@@ -104,17 +104,24 @@
     configurationManager = new ConfigurationManager(initialConf);
     LOG.info("{}: {}", getMemberId(), configurationManager);
 
+    boolean storageFound = false;
     List<File> directories = RaftServerConfigKeys.storageDir(prop);
     while (!directories.isEmpty()) {
       // use full uuid string to create a subdirectory
       File dir = chooseStorageDir(directories, group.getGroupId().getUuid().toString());
       try {
         storage = new RaftStorageImpl(dir, RaftServerConfigKeys.Log.corruptionPolicy(prop));
+        storageFound = true;
         break;
-      } catch (AccessDeniedException e) {
-        directories.remove(dir);
+      } catch (IOException e) {

Review comment:
       @szetszwo ok.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ratis] szetszwo commented on a change in pull request #477: RATIS-1375. Handle bad storage dir due to disk failures.

Posted by GitBox <gi...@apache.org>.
szetszwo commented on a change in pull request #477:
URL: https://github.com/apache/ratis/pull/477#discussion_r629128917



##########
File path: ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java
##########
@@ -104,17 +104,24 @@
     configurationManager = new ConfigurationManager(initialConf);
     LOG.info("{}: {}", getMemberId(), configurationManager);
 
+    boolean storageFound = false;
     List<File> directories = RaftServerConfigKeys.storageDir(prop);
     while (!directories.isEmpty()) {
       // use full uuid string to create a subdirectory
       File dir = chooseStorageDir(directories, group.getGroupId().getUuid().toString());
       try {
         storage = new RaftStorageImpl(dir, RaftServerConfigKeys.Log.corruptionPolicy(prop));
+        storageFound = true;
         break;
-      } catch (AccessDeniedException e) {
-        directories.remove(dir);
+      } catch (IOException e) {
+        directories.removeIf(d -> d.getAbsolutePath().equals(dir.getParent()));
       }
     }
+
+    if (!storageFound) {
+      throw new IOException("No healthy directories found for RaftStorage");

Review comment:
       Let's add directory list `RaftServerConfigKeys.storageDir(prop)` to the error message.

##########
File path: ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java
##########
@@ -104,17 +104,24 @@
     configurationManager = new ConfigurationManager(initialConf);
     LOG.info("{}: {}", getMemberId(), configurationManager);
 
+    boolean storageFound = false;
     List<File> directories = RaftServerConfigKeys.storageDir(prop);
     while (!directories.isEmpty()) {
       // use full uuid string to create a subdirectory
       File dir = chooseStorageDir(directories, group.getGroupId().getUuid().toString());
       try {
         storage = new RaftStorageImpl(dir, RaftServerConfigKeys.Log.corruptionPolicy(prop));
+        storageFound = true;
         break;
-      } catch (AccessDeniedException e) {
-        directories.remove(dir);
+      } catch (IOException e) {
+        directories.removeIf(d -> d.getAbsolutePath().equals(dir.getParent()));

Review comment:
       If we don't remove it, the while-loop won't terminate.

##########
File path: ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java
##########
@@ -104,17 +104,24 @@
     configurationManager = new ConfigurationManager(initialConf);
     LOG.info("{}: {}", getMemberId(), configurationManager);
 
+    boolean storageFound = false;
     List<File> directories = RaftServerConfigKeys.storageDir(prop);
     while (!directories.isEmpty()) {
       // use full uuid string to create a subdirectory
       File dir = chooseStorageDir(directories, group.getGroupId().getUuid().toString());
       try {
         storage = new RaftStorageImpl(dir, RaftServerConfigKeys.Log.corruptionPolicy(prop));
+        storageFound = true;
         break;
-      } catch (AccessDeniedException e) {
-        directories.remove(dir);
+      } catch (IOException e) {

Review comment:
       Let's log the IOException.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ratis] guihecheng commented on a change in pull request #477: RATIS-1375. Handle bad storage dir due to disk failures.

Posted by GitBox <gi...@apache.org>.
guihecheng commented on a change in pull request #477:
URL: https://github.com/apache/ratis/pull/477#discussion_r629188621



##########
File path: ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java
##########
@@ -104,17 +104,24 @@
     configurationManager = new ConfigurationManager(initialConf);
     LOG.info("{}: {}", getMemberId(), configurationManager);
 
+    boolean storageFound = false;
     List<File> directories = RaftServerConfigKeys.storageDir(prop);
     while (!directories.isEmpty()) {
       // use full uuid string to create a subdirectory
       File dir = chooseStorageDir(directories, group.getGroupId().getUuid().toString());
       try {
         storage = new RaftStorageImpl(dir, RaftServerConfigKeys.Log.corruptionPolicy(prop));
+        storageFound = true;
         break;
-      } catch (AccessDeniedException e) {
-        directories.remove(dir);
+      } catch (IOException e) {
+        directories.removeIf(d -> d.getAbsolutePath().equals(dir.getParent()));
       }
     }
+
+    if (!storageFound) {
+      throw new IOException("No healthy directories found for RaftStorage");

Review comment:
       @szetszwo ok.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ratis] guihecheng commented on a change in pull request #477: RATIS-1375. Handle bad storage dir due to disk failures.

Posted by GitBox <gi...@apache.org>.
guihecheng commented on a change in pull request #477:
URL: https://github.com/apache/ratis/pull/477#discussion_r629275270



##########
File path: ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java
##########
@@ -104,17 +104,24 @@
     configurationManager = new ConfigurationManager(initialConf);
     LOG.info("{}: {}", getMemberId(), configurationManager);
 
+    boolean storageFound = false;
     List<File> directories = RaftServerConfigKeys.storageDir(prop);
     while (!directories.isEmpty()) {
       // use full uuid string to create a subdirectory
       File dir = chooseStorageDir(directories, group.getGroupId().getUuid().toString());
       try {
         storage = new RaftStorageImpl(dir, RaftServerConfigKeys.Log.corruptionPolicy(prop));
+        storageFound = true;
         break;
-      } catch (AccessDeniedException e) {
-        directories.remove(dir);
+      } catch (IOException e) {
+        directories.removeIf(d -> d.getAbsolutePath().equals(dir.getParent()));

Review comment:
       @szetszwo because the original `remove(dir)` is a BUG I think.
   Here dir points to '/data1/hdds-ratis/<groupID>', but I believe we want to iterator through all the storageDirs(/data1/hdds-ratis,/data2/hdds-ratis,...), and remove some 'dir' that is "BAD" which should points to '/dataX/hdds-ratis', not '/data1/hdds-ratis/<groupID>'.
   Then I use removeIf() to filter through the storageDirs and remove the exact storageDir that is BAD, so `dir.getParent()` here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ratis] guihecheng commented on a change in pull request #477: RATIS-1375. Handle bad storage dir due to disk failures.

Posted by GitBox <gi...@apache.org>.
guihecheng commented on a change in pull request #477:
URL: https://github.com/apache/ratis/pull/477#discussion_r629275270



##########
File path: ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java
##########
@@ -104,17 +104,24 @@
     configurationManager = new ConfigurationManager(initialConf);
     LOG.info("{}: {}", getMemberId(), configurationManager);
 
+    boolean storageFound = false;
     List<File> directories = RaftServerConfigKeys.storageDir(prop);
     while (!directories.isEmpty()) {
       // use full uuid string to create a subdirectory
       File dir = chooseStorageDir(directories, group.getGroupId().getUuid().toString());
       try {
         storage = new RaftStorageImpl(dir, RaftServerConfigKeys.Log.corruptionPolicy(prop));
+        storageFound = true;
         break;
-      } catch (AccessDeniedException e) {
-        directories.remove(dir);
+      } catch (IOException e) {
+        directories.removeIf(d -> d.getAbsolutePath().equals(dir.getParent()));

Review comment:
       @szetszwo because the original `remove(dir)` is a BUG I think.
   Here dir points to '/data1/hdds-ratis/groupID', but I believe we want to iterator through all the storageDirs(/data1/hdds-ratis,/data2/hdds-ratis,...), and remove some 'dir' that is "BAD" which should points to '/dataX/hdds-ratis', not '/data1/hdds-ratis/groupID'.
   Then I use removeIf() to filter through the storageDirs and remove the exact storageDir that is BAD, so `dir.getParent()` here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ratis] szetszwo merged pull request #477: RATIS-1375. Handle bad storage dir due to disk failures.

Posted by GitBox <gi...@apache.org>.
szetszwo merged pull request #477:
URL: https://github.com/apache/ratis/pull/477


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ratis] guihecheng commented on pull request #477: RATIS-1375. Handle bad storage dir due to disk failures.

Posted by GitBox <gi...@apache.org>.
guihecheng commented on pull request #477:
URL: https://github.com/apache/ratis/pull/477#issuecomment-836622228


   @szetszwo updated thanks~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ratis] szetszwo commented on a change in pull request #477: RATIS-1375. Handle bad storage dir due to disk failures.

Posted by GitBox <gi...@apache.org>.
szetszwo commented on a change in pull request #477:
URL: https://github.com/apache/ratis/pull/477#discussion_r629254451



##########
File path: ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java
##########
@@ -104,17 +104,24 @@
     configurationManager = new ConfigurationManager(initialConf);
     LOG.info("{}: {}", getMemberId(), configurationManager);
 
+    boolean storageFound = false;
     List<File> directories = RaftServerConfigKeys.storageDir(prop);
     while (!directories.isEmpty()) {
       // use full uuid string to create a subdirectory
       File dir = chooseStorageDir(directories, group.getGroupId().getUuid().toString());
       try {
         storage = new RaftStorageImpl(dir, RaftServerConfigKeys.Log.corruptionPolicy(prop));
+        storageFound = true;
         break;
-      } catch (AccessDeniedException e) {
-        directories.remove(dir);
+      } catch (IOException e) {
+        directories.removeIf(d -> d.getAbsolutePath().equals(dir.getParent()));

Review comment:
       I agree that it is good to change AccessDeniedException to IOException.  However, why replacing remove(..) with removeIf(..)?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [ratis] guihecheng commented on a change in pull request #477: RATIS-1375. Handle bad storage dir due to disk failures.

Posted by GitBox <gi...@apache.org>.
guihecheng commented on a change in pull request #477:
URL: https://github.com/apache/ratis/pull/477#discussion_r629193702



##########
File path: ratis-server/src/main/java/org/apache/ratis/server/impl/ServerState.java
##########
@@ -104,17 +104,24 @@
     configurationManager = new ConfigurationManager(initialConf);
     LOG.info("{}: {}", getMemberId(), configurationManager);
 
+    boolean storageFound = false;
     List<File> directories = RaftServerConfigKeys.storageDir(prop);
     while (!directories.isEmpty()) {
       // use full uuid string to create a subdirectory
       File dir = chooseStorageDir(directories, group.getGroupId().getUuid().toString());
       try {
         storage = new RaftStorageImpl(dir, RaftServerConfigKeys.Log.corruptionPolicy(prop));
+        storageFound = true;
         break;
-      } catch (AccessDeniedException e) {
-        directories.remove(dir);
+      } catch (IOException e) {
+        directories.removeIf(d -> d.getAbsolutePath().equals(dir.getParent()));

Review comment:
       @szetszwo here is the BUG, in the original code, `dir` represents the newly created directory(name = UUID) for the group under the storageDir, not the storageDir itself, so it loops forever indeed under my test.
   So here I try to remove the exact storageDir, do you have a better way to write this line in Java?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org