Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2021/02/24 14:02:04 UTC

[GitHub] [hadoop] qizhu-lucas opened a new pull request #2721: HDFS-15856: Make recover the pipeline in same packet exceed times for…

qizhu-lucas opened a new pull request #2721:
URL: https://github.com/apache/hadoop/pull/2721


   … stream closed configurable.
   
   @jojochuang Could you help review this?
   Thanks.
   


[GitHub] [hadoop] qizhu-lucas commented on pull request #2721: HDFS-15856: Make recover the pipeline in same packet exceed times for…

qizhu-lucas commented on pull request #2721:
URL: https://github.com/apache/hadoop/pull/2721#issuecomment-788521571


   @jojochuang 
   Could you help check the latest patch? 
   Thanks.


[GitHub] [hadoop] qizhu-lucas commented on a change in pull request #2721: HDFS-15856: Make recover the pipeline in same packet exceed times for…

qizhu-lucas commented on a change in pull request #2721:
URL: https://github.com/apache/hadoop/pull/2721#discussion_r582668892



##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
##########
@@ -4352,6 +4352,17 @@
   </description>
 </property>
 
+<property>
+  <name>dfs.client.packet.recovery.max.times</name>

Review comment:
       Thanks @Hexiaoqiao for the review, I have fixed it in the latest pull request.




[GitHub] [hadoop] Hexiaoqiao merged pull request #2721: HDFS-15856: Make recover the pipeline in same packet exceed times for…

Hexiaoqiao merged pull request #2721:
URL: https://github.com/apache/hadoop/pull/2721


   


[GitHub] [hadoop] jojochuang commented on a change in pull request #2721: HDFS-15856: Make recover the pipeline in same packet exceed times for…

jojochuang commented on a change in pull request #2721:
URL: https://github.com/apache/hadoop/pull/2721#discussion_r584392706



##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
##########
@@ -4352,6 +4352,17 @@
   </description>
 </property>
 
+<property>
+  <name>dfs.client.pipeline.recovery.max-retries</name>
+  <value>5</value>
+  <description>
+    If we had to recover the pipeline more than the value

Review comment:
       Suggest adding a description for this configuration property, something like: "if the DFS client encounters errors in the write pipeline, retry up to the number defined by this property before giving up".
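
   As a usage illustration only: the key name and default below are the ones discussed in this thread, while the class itself is a hypothetical sketch rather than anything in the patch. A client that wants a tighter limit could override the property programmatically:

       import org.apache.hadoop.conf.Configuration;

       public class OverrideRetryLimitSketch {
         public static void main(String[] args) {
           Configuration conf = new Configuration();
           // Key proposed in this thread; the patch's default value is 5.
           conf.setInt("dfs.client.pipeline.recovery.max-retries", 3);
           // A DFS client created from this conf would then give up pipeline
           // recovery after 3 consecutive attempts for the same packet.
         }
       }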




[GitHub] [hadoop] qizhu-lucas commented on a change in pull request #2721: HDFS-15856: Make recover the pipeline in same packet exceed times for…

qizhu-lucas commented on a change in pull request #2721:
URL: https://github.com/apache/hadoop/pull/2721#discussion_r584034504



##########
File path: hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java
##########
@@ -529,6 +529,7 @@ boolean doWaitForRestart() {
   private static final int CONGESTION_BACK_OFF_MAX_TIME_IN_MS =
       CONGESTION_BACKOFF_MEAN_TIME_IN_MS * 10;
   private int lastCongestionBackoffTime;
+  private int samePacketMaxRecoveryTimes;

Review comment:
       Thanks @Hexiaoqiao for the patient review, I have fixed it in the latest pull request.




[GitHub] [hadoop] qizhu-lucas commented on pull request #2721: HDFS-15856: Make recover the pipeline in same packet exceed times for…

qizhu-lucas commented on pull request #2721:
URL: https://github.com/apache/hadoop/pull/2721#issuecomment-788040129


   I think the test failure is not related to this change.
   


[GitHub] [hadoop] qizhu-lucas commented on a change in pull request #2721: HDFS-15856: Make recover the pipeline in same packet exceed times for…

qizhu-lucas commented on a change in pull request #2721:
URL: https://github.com/apache/hadoop/pull/2721#discussion_r584422780



##########
File path: hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java
##########
@@ -1263,14 +1265,18 @@ private boolean processDatanodeOrExternalError() throws IOException {
       packetSendTime.clear();
     }
 
-    // If we had to recover the pipeline five times in a row for the
+    // If we had to recover the pipeline more than the value
+    // defined by maxPipelineRecoveryRetries in a row for the
     // same packet, this client likely has corrupt data or corrupting
     // during transmission.
-    if (!errorState.isRestartingNode() && ++pipelineRecoveryCount > 5) {
+    if (!errorState.isRestartingNode() && ++pipelineRecoveryCount >
+        maxPipelineRecoveryRetries) {
       LOG.warn("Error recovering pipeline for writing " +
-          block + ". Already retried 5 times for the same packet.");
+          block + ". Already retried " + maxPipelineRecoveryRetries
+          + " times for the same packet.");
       lastException.set(new IOException("Failing write. Tried pipeline " +
-          "recovery 5 times without success."));
+          "recovery "+ maxPipelineRecoveryRetries

Review comment:
       Thanks @jojochuang for the review.
   Fixed it in the latest pull request.
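
   For context, the diff above replaces a hard-coded limit of 5 with the configurable maxPipelineRecoveryRetries. Below is a minimal, self-contained Java sketch of the bounded-retry pattern being implemented; the class and method names are invented for illustration, and only pipelineRecoveryCount and maxPipelineRecoveryRetries mirror the diff.

       import java.io.IOException;

       class BoundedRecoverySketch {
         // Limit read from configuration; the historical hard-coded value was 5.
         private final int maxPipelineRecoveryRetries;
         // Consecutive recovery attempts for the same packet; a real client
         // resets this once the packet is successfully acknowledged.
         private int pipelineRecoveryCount = 0;

         BoundedRecoverySketch(int maxRetries) {
           this.maxPipelineRecoveryRetries = maxRetries;
         }

         // Invoked on each pipeline recovery attempt for the current packet.
         void onPipelineRecovery() throws IOException {
           if (++pipelineRecoveryCount > maxPipelineRecoveryRetries) {
             throw new IOException("Failing write. Tried pipeline recovery "
                 + maxPipelineRecoveryRetries + " times without success.");
           }
         }
       }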

[GitHub] [hadoop] qizhu-lucas commented on a change in pull request #2721: HDFS-15856: Make recover the pipeline in same packet exceed times for…

qizhu-lucas commented on a change in pull request #2721:
URL: https://github.com/apache/hadoop/pull/2721#discussion_r584422934



##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
##########
@@ -4352,6 +4352,17 @@
   </description>
 </property>
 
+<property>
+  <name>dfs.client.pipeline.recovery.max-retries</name>
+  <value>5</value>
+  <description>
+    If we had to recover the pipeline more than the value

Review comment:
       Good suggestion, I have changed it as suggested.
   Thanks @jojochuang




[GitHub] [hadoop] Hexiaoqiao commented on pull request #2721: HDFS-15856: Make recover the pipeline in same packet exceed times for…

Hexiaoqiao commented on pull request #2721:
URL: https://github.com/apache/hadoop/pull/2721#issuecomment-788595297


   Committed to trunk. Thanks @qizhu-lucas for your contributions. Thanks @ayushtkn and @jojochuang for your reviews.


[GitHub] [hadoop] jojochuang commented on a change in pull request #2721: HDFS-15856: Make recover the pipeline in same packet exceed times for…

jojochuang commented on a change in pull request #2721:
URL: https://github.com/apache/hadoop/pull/2721#discussion_r584392237



##########
File path: hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java
##########
@@ -1263,14 +1265,18 @@ private boolean processDatanodeOrExternalError() throws IOException {
       packetSendTime.clear();
     }
 
-    // If we had to recover the pipeline five times in a row for the
+    // If we had to recover the pipeline more than the value
+    // defined by maxPipelineRecoveryRetries in a row for the
     // same packet, this client likely has corrupt data or corrupting
     // during transmission.
-    if (!errorState.isRestartingNode() && ++pipelineRecoveryCount > 5) {
+    if (!errorState.isRestartingNode() && ++pipelineRecoveryCount >
+        maxPipelineRecoveryRetries) {
       LOG.warn("Error recovering pipeline for writing " +
-          block + ". Already retried 5 times for the same packet.");
+          block + ". Already retried " + maxPipelineRecoveryRetries
+          + " times for the same packet.");
       lastException.set(new IOException("Failing write. Tried pipeline " +
-          "recovery 5 times without success."));
+          "recovery "+ maxPipelineRecoveryRetries

Review comment:
       nit: add space between " and +




[GitHub] [hadoop] ayushtkn commented on a change in pull request #2721: HDFS-15856: Make recover the pipeline in same packet exceed times for…

ayushtkn commented on a change in pull request #2721:
URL: https://github.com/apache/hadoop/pull/2721#discussion_r584121631



##########
File path: hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java
##########
@@ -1263,14 +1265,18 @@ private boolean processDatanodeOrExternalError() throws IOException {
       packetSendTime.clear();
     }
 
-    // If we had to recover the pipeline five times in a row for the
+    // If we had to recover the pipeline exceed times which
+    // defined in maxPipelineRecoveryRetries in a row for the

Review comment:
       nit:
   Looks like a grammatical error; can we change it to:
   ``
   If we had to recover the pipeline more than the value
    defined by maxPipelineRecoveryRetries in a row for the
   ``
   

##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
##########
@@ -4352,6 +4352,17 @@
   </description>
 </property>
 
+<property>
+  <name>dfs.client.pipeline.recovery.max-retries</name>
+  <value>5</value>
+  <description>
+    If we had to recover the pipeline exceed times which
+    this value defined in a row for the same packet,
+    this client likely has corrupt data or corrupting
+    during transmission.

Review comment:
       Same as above.




[GitHub] [hadoop] qizhu-lucas commented on a change in pull request #2721: HDFS-15856: Make recover the pipeline in same packet exceed times for…

qizhu-lucas commented on a change in pull request #2721:
URL: https://github.com/apache/hadoop/pull/2721#discussion_r584125525



##########
File path: hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java
##########
@@ -1263,14 +1265,18 @@ private boolean processDatanodeOrExternalError() throws IOException {
       packetSendTime.clear();
     }
 
-    // If we had to recover the pipeline five times in a row for the
+    // If we had to recover the pipeline exceed times which
+    // defined in maxPipelineRecoveryRetries in a row for the

Review comment:
       Thanks @ayushtkn for the review, I have fixed the grammatical error.




[GitHub] [hadoop] Hexiaoqiao commented on a change in pull request #2721: HDFS-15856: Make recover the pipeline in same packet exceed times for…

Hexiaoqiao commented on a change in pull request #2721:
URL: https://github.com/apache/hadoop/pull/2721#discussion_r583497023



##########
File path: hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java
##########
@@ -529,6 +529,7 @@ boolean doWaitForRestart() {
   private static final int CONGESTION_BACK_OFF_MAX_TIME_IN_MS =
       CONGESTION_BACKOFF_MEAN_TIME_IN_MS * 10;
   private int lastCongestionBackoffTime;
+  private int samePacketMaxRecoveryTimes;

Review comment:
       It would be better to rename this variable and the related method to `maxPipelineRecoveryRetries`.
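
   For readers unfamiliar with the pattern: client settings like this are typically read once from a Configuration object into a field. A hedged sketch under that assumption follows; the key and default come from this thread, while the constant and class names are illustrative, not the real entries in the client config classes.

       import org.apache.hadoop.conf.Configuration;

       class MaxRetriesConfSketch {
         // Key and default value as discussed in this thread.
         static final String PIPELINE_RECOVERY_MAX_RETRIES_KEY =
             "dfs.client.pipeline.recovery.max-retries";
         static final int PIPELINE_RECOVERY_MAX_RETRIES_DEFAULT = 5;

         final int maxPipelineRecoveryRetries;

         MaxRetriesConfSketch(Configuration conf) {
           // Fall back to the historical hard-coded value of 5 when unset.
           maxPipelineRecoveryRetries = conf.getInt(
               PIPELINE_RECOVERY_MAX_RETRIES_KEY,
               PIPELINE_RECOVERY_MAX_RETRIES_DEFAULT);
         }
       }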




[GitHub] [hadoop] Hexiaoqiao commented on a change in pull request #2721: HDFS-15856: Make recover the pipeline in same packet exceed times for…

Hexiaoqiao commented on a change in pull request #2721:
URL: https://github.com/apache/hadoop/pull/2721#discussion_r582534061



##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
##########
@@ -4352,6 +4352,17 @@
   </description>
 </property>
 
+<property>
+  <name>dfs.client.packet.recovery.max.times</name>

Review comment:
       dfs.client.pipeline.recovery.max-retries?



