You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/10/19 00:31:36 UTC

[GitHub] [ozone] bharatviswa504 opened a new pull request #2746: HDDS-5864. Retry when DN connection issue during getBlock/ReadChunk call during Ozone key Read

bharatviswa504 opened a new pull request #2746:
URL: https://github.com/apache/ozone/pull/2746


   ## What changes were proposed in this pull request?
   
   Retry when getChunks/readChunk fail with IOException due to DN connectivity issue/down.
   
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-5864
   
   ## How was this patch tested?
   
   Added UT. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 merged pull request #2746: HDDS-5864. Retry when DN connection issue during getBlock/ReadChunk call during Ozone key Read

Posted by GitBox <gi...@apache.org>.
bharatviswa504 merged pull request #2746:
URL: https://github.com/apache/ozone/pull/2746


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2746: HDDS-5864. Retry when DN connection issue during getBlock/ReadChunk call during Ozone key Read

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2746:
URL: https://github.com/apache/ozone/pull/2746#discussion_r733911587



##########
File path: hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/BlockInputStream.java
##########
@@ -320,6 +345,17 @@ synchronized int readWithStrategy(ByteReaderStrategy strategy) throws
         } else {
           throw e;
         }
+      } catch(SCMSecurityException ex) {
+        throw ex;
+      } catch(IOException ex) {
+        // We got a IOException which might be due
+        // to DN down or connectivity issue.
+        if (shouldRetryRead(ex)) {
+          current.releaseClient();

Review comment:
       Because we should not refresh pipeline, when IOException we want to retry on the same pipeline. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on pull request #2746: HDDS-5864. Retry when DN connection issue during getBlock/ReadChunk call during Ozone key Read

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #2746:
URL: https://github.com/apache/ozone/pull/2746#issuecomment-959632949






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on pull request #2746: HDDS-5864. Retry when DN connection issue during getBlock/ReadChunk call during Ozone key Read

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #2746:
URL: https://github.com/apache/ozone/pull/2746#issuecomment-959632949


   Thank You @adoroszlai for the review


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2746: HDDS-5864. Retry when DN connection issue during getBlock/ReadChunk call during Ozone key Read

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2746:
URL: https://github.com/apache/ozone/pull/2746#discussion_r733921720



##########
File path: hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/BlockInputStream.java
##########
@@ -142,12 +143,36 @@ public synchronized void initialize() throws IOException {
       return;
     }
 
-    List<ChunkInfo> chunks;
+    List<ChunkInfo> chunks = null;
     try {
       chunks = getChunkInfos();
-    } catch (ContainerNotFoundException ioEx) {
-      refreshPipeline(ioEx);
-      chunks = getChunkInfos();
+    } catch(SCMSecurityException ex) {
+      throw ex;
+    } catch (IOException ioEx) {
+      // If refresh returns new pipeline, retry with it.
+      // If we get IOException due to connectivity issue,
+      // retry according to retry policy.
+      LOG.debug("Retry to get chunk info fail", ioEx);
+      if (ioEx instanceof StorageContainerException) {
+        refreshPipeline(ioEx);
+      }
+      IOException catchEx = ioEx;
+      while (shouldRetryRead(catchEx)) {
+        try {
+          chunks = getChunkInfos();

Review comment:
       Fixed it




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on pull request #2746: HDDS-5864. Retry when DN connection issue during getBlock/ReadChunk call during Ozone key Read

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #2746:
URL: https://github.com/apache/ozone/pull/2746#issuecomment-959632949


   Thank You @adoroszlai for the review


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on pull request #2746: HDDS-5864. Retry when DN connection issue during getBlock/ReadChunk call during Ozone key Read

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on pull request #2746:
URL: https://github.com/apache/ozone/pull/2746#issuecomment-951055499


   Hey @adoroszlai I have addressed/replied to review comments. Can you take a look at it


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 merged pull request #2746: HDDS-5864. Retry when DN connection issue during getBlock/ReadChunk call during Ozone key Read

Posted by GitBox <gi...@apache.org>.
bharatviswa504 merged pull request #2746:
URL: https://github.com/apache/ozone/pull/2746






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 commented on a change in pull request #2746: HDDS-5864. Retry when DN connection issue during getBlock/ReadChunk call during Ozone key Read

Posted by GitBox <gi...@apache.org>.
bharatviswa504 commented on a change in pull request #2746:
URL: https://github.com/apache/ozone/pull/2746#discussion_r733921512



##########
File path: hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/BlockInputStream.java
##########
@@ -142,12 +143,36 @@ public synchronized void initialize() throws IOException {
       return;
     }
 
-    List<ChunkInfo> chunks;
+    List<ChunkInfo> chunks = null;
     try {
       chunks = getChunkInfos();
-    } catch (ContainerNotFoundException ioEx) {
-      refreshPipeline(ioEx);
-      chunks = getChunkInfos();
+    } catch(SCMSecurityException ex) {
+      throw ex;
+    } catch (IOException ioEx) {
+      // If refresh returns new pipeline, retry with it.
+      // If we get IOException due to connectivity issue,
+      // retry according to retry policy.
+      LOG.debug("Retry to get chunk info fail", ioEx);
+      if (ioEx instanceof StorageContainerException) {
+        refreshPipeline(ioEx);
+      }
+      IOException catchEx = ioEx;
+      while (shouldRetryRead(catchEx)) {

Review comment:
       Done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] adoroszlai commented on a change in pull request #2746: HDDS-5864. Retry when DN connection issue during getBlock/ReadChunk call during Ozone key Read

Posted by GitBox <gi...@apache.org>.
adoroszlai commented on a change in pull request #2746:
URL: https://github.com/apache/ozone/pull/2746#discussion_r732568395



##########
File path: hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/BlockInputStream.java
##########
@@ -142,12 +143,36 @@ public synchronized void initialize() throws IOException {
       return;
     }
 
-    List<ChunkInfo> chunks;
+    List<ChunkInfo> chunks = null;
     try {
       chunks = getChunkInfos();
-    } catch (ContainerNotFoundException ioEx) {
-      refreshPipeline(ioEx);
-      chunks = getChunkInfos();
+    } catch(SCMSecurityException ex) {
+      throw ex;
+    } catch (IOException ioEx) {
+      // If refresh returns new pipeline, retry with it.
+      // If we get IOException due to connectivity issue,
+      // retry according to retry policy.
+      LOG.debug("Retry to get chunk info fail", ioEx);
+      if (ioEx instanceof StorageContainerException) {
+        refreshPipeline(ioEx);
+      }
+      IOException catchEx = ioEx;
+      while (shouldRetryRead(catchEx)) {
+        try {
+          chunks = getChunkInfos();

Review comment:
       Need to exit the loop after successful `getChunkInfos()`, since `catchEx` is not updated in this case.
   
   ```suggestion
             chunks = getChunkInfos();
             break;
   ```

##########
File path: hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/BlockInputStream.java
##########
@@ -320,6 +345,17 @@ synchronized int readWithStrategy(ByteReaderStrategy strategy) throws
         } else {
           throw e;
         }
+      } catch(SCMSecurityException ex) {
+        throw ex;
+      } catch(IOException ex) {
+        // We got a IOException which might be due
+        // to DN down or connectivity issue.
+        if (shouldRetryRead(ex)) {
+          current.releaseClient();

Review comment:
       Why not `handleReadError(ex)`?

##########
File path: hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/BlockInputStream.java
##########
@@ -142,12 +143,36 @@ public synchronized void initialize() throws IOException {
       return;
     }
 
-    List<ChunkInfo> chunks;
+    List<ChunkInfo> chunks = null;
     try {
       chunks = getChunkInfos();
-    } catch (ContainerNotFoundException ioEx) {
-      refreshPipeline(ioEx);
-      chunks = getChunkInfos();
+    } catch(SCMSecurityException ex) {
+      throw ex;
+    } catch (IOException ioEx) {
+      // If refresh returns new pipeline, retry with it.
+      // If we get IOException due to connectivity issue,
+      // retry according to retry policy.
+      LOG.debug("Retry to get chunk info fail", ioEx);
+      if (ioEx instanceof StorageContainerException) {
+        refreshPipeline(ioEx);
+      }
+      IOException catchEx = ioEx;
+      while (shouldRetryRead(catchEx)) {

Review comment:
       I feel the retry loop should be merged with the first attempt to reduce complexity/duplication.

##########
File path: hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/ChunkInputStream.java
##########
@@ -287,7 +287,7 @@ public synchronized void close() {
 
   protected synchronized void releaseClient() {
     if (xceiverClientFactory != null && xceiverClient != null) {
-      xceiverClientFactory.releaseClient(xceiverClient, false);
+      xceiverClientFactory.releaseClientForReadData(xceiverClient, false);

Review comment:
       Nice catch.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org


[GitHub] [ozone] bharatviswa504 merged pull request #2746: HDDS-5864. Retry when DN connection issue during getBlock/ReadChunk call during Ozone key Read

Posted by GitBox <gi...@apache.org>.
bharatviswa504 merged pull request #2746:
URL: https://github.com/apache/ozone/pull/2746


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org