Posted to common-dev@hadoop.apache.org by "Alban Chevignard (JIRA)" <ji...@apache.org> on 2009/04/21 20:06:47 UTC

[jira] Created: (HADOOP-5713) File write fails after data node goes down

File write fails after data node goes down
------------------------------------------

                 Key: HADOOP-5713
                 URL: https://issues.apache.org/jira/browse/HADOOP-5713
             Project: Hadoop Core
          Issue Type: Bug
          Components: dfs
            Reporter: Alban Chevignard


If a data node goes down while a file is being written to HDFS, the write fails with the following errors:
{noformat} 
09/04/20 17:15:39 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.0.66:50010
09/04/20 17:15:39 INFO dfs.DFSClient: Abandoning block blk_-6792221430152215651_1003
09/04/20 17:15:45 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.0.66:50010
09/04/20 17:15:45 INFO dfs.DFSClient: Abandoning block blk_-1056044503329698571_1003
09/04/20 17:15:51 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.0.66:50010
09/04/20 17:15:51 INFO dfs.DFSClient: Abandoning block blk_-1144491637577072681_1003
09/04/20 17:15:57 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.0.66:50010
09/04/20 17:15:57 INFO dfs.DFSClient: Abandoning block blk_6574618270268421892_1003
09/04/20 17:16:03 WARN dfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2387)
	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1746)
	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1924)
09/04/20 17:16:03 WARN dfs.DFSClient: Error Recovery for block blk_6574618270268421892_1003 bad datanode[1]
{noformat} 

The tests were done with the following configuration:
* Hadoop version 0.18.3
* 3 data nodes with replication count of 2
* 1 GB file write
* 1 data node taken down during write

This issue seems to be caused by the delay between the time a data node goes down and the time the name node marks it as dead. This delay is unavoidable, but the name node should not keep allocating new blocks to data nodes that the client already knows to be down. Even after adjusting {{heartbeat.recheck.interval}}, there is still a window during which this issue can occur.
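
For a rough sense of how large that window is, here is a back-of-envelope sketch using assumed 0.18-era defaults (the values and the 2 * recheck + 10 * heartbeat expiry rule reflect FSNamesystem-style behavior as I understand it, not this cluster's actual configuration):
{noformat}
public class DeadNodeWindow {
    public static void main(String[] args) {
        // Assumed 0.18-era defaults (illustrative, not read from this cluster):
        long heartbeatIntervalMs = 3000L;    // dfs.heartbeat.interval = 3 s
        long recheckIntervalMs   = 300000L;  // heartbeat.recheck.interval = 5 min

        // The name node declares a data node dead only after roughly
        // 2 * recheck + 10 * heartbeat of silence.
        long expireMs = 2 * recheckIntervalMs + 10 * heartbeatIntervalMs;
        System.out.printf("dead-node window: ~%.1f minutes%n", expireMs / 60000.0);
        // => ~10.5 minutes during which the NN may still assign the dead node.
    }
}
{noformat}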

One possible fix would be to allow clients to exclude known bad data nodes when allocating new blocks. See {{failed_write.patch}} for an example.
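
To make the idea concrete, here is a minimal sketch of such a per-stream exclude list (all names are hypothetical and may differ from what {{failed_write.patch}} actually does; the real retry loop lives around {{DFSClient$DFSOutputStream.nextBlockOutputStream()}}):
{noformat}
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

abstract class ExcludingBlockAllocator {
    // Lives only as long as one output stream, so it never affects
    // other clients or any name-node state.
    private final Set<String> excludedNodes = new HashSet<String>();

    String[] nextPipeline(int maxRetries) throws IOException {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            // Ask the name node for a pipeline, skipping nodes we already saw fail.
            String[] pipeline = allocateBlock(excludedNodes);
            String firstBadNode = tryConnect(pipeline);
            if (firstBadNode == null) {
                return pipeline;             // pipeline established, start streaming
            }
            excludedNodes.add(firstBadNode); // e.g. "192.168.0.66:50010" in the log above
            abandonBlock(pipeline);
        }
        throw new IOException("Unable to create new block.");
    }

    /** NN RPC carrying an exclude hint (assumed signature, not the 0.18 API). */
    abstract String[] allocateBlock(Set<String> excluded) throws IOException;

    /** Returns the first unreachable node, or null if the whole pipeline acked. */
    abstract String tryConnect(String[] pipeline);

    abstract void abandonBlock(String[] pipeline);
}
{noformat}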



[jira] Commented: (HADOOP-5713) File write fails after data node goes down

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708265#action_12708265 ] 

Hairong Kuang commented on HADOOP-5713:
---------------------------------------

I am currently working on HADOOP-5744, so I have thought about how to recover a failed pipeline. One thought is that when createOutputStream fails, a dfs client should take the failed datanode out of the pipeline, bump the block's generation stamp, and then reconstruct a new pipeline with the remaining datanodes.
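
A minimal sketch of that recovery step, with hypothetical types and names (the real mechanism is part of the HADOOP-5744 work, not code that exists in 0.18.3):
{noformat}
import java.util.ArrayList;
import java.util.List;

class PipelineRecovery {
    static final class Pipeline {
        final List<String> nodes;        // datanode addresses, in pipeline order
        final long generationStamp;      // identifies the current "version" of the block
        Pipeline(List<String> nodes, long generationStamp) {
            this.nodes = nodes;
            this.generationStamp = generationStamp;
        }
    }

    // Drop the failed node and bump the generation stamp so the surviving
    // replicas can be distinguished from any stale copy left on the dead node.
    static Pipeline recover(Pipeline current, String failedNode) {
        List<String> remaining = new ArrayList<String>(current.nodes);
        remaining.remove(failedNode);
        return new Pipeline(remaining, current.generationStamp + 1);
    }
}
{noformat}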



[jira] Commented: (HADOOP-5713) File write fails after data node goes down

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708804#action_12708804 ] 

Raghu Angadi commented on HADOOP-5713:
--------------------------------------

Oh, right. Errors during the initial connect seem to be treated differently from errors during the rest of the writing of the block. In that sense, a temporary exclude list should work for the connect problem on large clusters.

But somehow the client telling the NN what to allocate is less appealing to me. Would it be better to treat both errors similarly, i.e. the client could continue with just the remaining good nodes?



[jira] Commented: (HADOOP-5713) File write fails after data node goes down

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708437#action_12708437 ] 

dhruba borthakur commented on HADOOP-5713:
------------------------------------------

> when createOutputStream fails, a dfs client should take the failed datanode out of the pipeline, bump the block's generation stamp

@Hairong: This was purposely *not done* when we did the client-streaming-data-to-datanodes work. The reason is that doing this reduces the robustness of the block. You will recall that when a replica in the pipeline fails, the client continues writing to the other replicas, and the NN makes no attempt to increase that block's replication factor until the file is closed. This means that when we remove a datanode from a pipeline, we expose that block to a larger probability of going "missing or corrupt". This situation is unavoidable when the client has written partial data to a block and then encounters an error in the pipeline; in that case we ignore the bad datanode and continue with the remaining datanode(s).

On the other hand, when createOutputStream fails, we have the luxury of ignoring all the datanodes in the current pipeline, because the client has not yet written any data to any of them. We could have ignored only the bad datanode (as you suggested), but that would expose the pipeline to a higher probability of encountering a "missing/corrupt" block if the other two replicas also fail in the near future, before the file is closed. In this case, we can avoid this degradation by fetching an entirely new pipeline from the NN.
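
The two cases could be sketched roughly as follows (the helper names are assumptions, used only to contrast the strategies):
{noformat}
import java.io.IOException;

abstract class PipelineFailurePolicy {
    void onPipelineFailure(boolean anyDataWritten, String badNode) throws IOException {
        if (!anyDataWritten) {
            // createBlockOutputStream failed before a single byte was streamed:
            // abandon the block and ask the NN for an entirely fresh pipeline,
            // so the new block starts with its full target replication.
            abandonBlockAndReallocate();
        } else {
            // Mid-block failure: data already sits on the surviving replicas,
            // so drop only the bad node and keep writing to the remainder,
            // accepting reduced replication until the file is closed.
            removeFromPipeline(badNode);
        }
    }

    abstract void abandonBlockAndReallocate() throws IOException;
    abstract void removeFromPipeline(String badNode) throws IOException;
}
{noformat}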

> Increasing the number of write retries in that case won't help.

@Alban: I understand your use-case now. The NN requires 10 minutes without heartbeats from a datanode before declaring it dead. Is it possible for you to set dfs.client.block.write.retries to a value that causes the client to retry for more than 10 minutes? In that case, your test case should succeed. The idea is that if the client does not bail out (but keeps retrying) for more than 10 minutes, it is bound to succeed. Please let us know.
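
Back-of-envelope, using the ~6-second gap between attempts in the log above (a sketch, not a guaranteed bound; the property name is real, but the default shown is recalled from the 0.18 code):
{noformat}
import org.apache.hadoop.conf.Configuration;

public class RetryConfig {
    public static void main(String[] args) {
        // The log above shows one allocation attempt roughly every 6 seconds,
        // so covering a ~10-minute dead-node window needs on the order of
        // 600 s / 6 s = 100 retries.
        Configuration conf = new Configuration();
        conf.setInt("dfs.client.block.write.retries", 100);
        // (The 0.18 default appears to be 3, which matches the four
        //  "Abandoning block" attempts in the log before the write fails.)
        // Pass this conf to the FileSystem that performs the write.
    }
}
{noformat}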

I will also look at your patch in greater detail.






[jira] Commented: (HADOOP-5713) File write fails after data node goes down

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708653#action_12708653 ] 

Hairong Kuang commented on HADOOP-5713:
---------------------------------------

I took a quick look at the patch and realized that it only handles the case where the initial pipeline creation fails. One comment on the patch: the current addBlock command should be replaced by the new one.
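
Hypothetically, the protocol change under discussion would look something like this (the types and signature are assumptions for illustration, not the patch's actual code):
{noformat}
import java.io.IOException;

interface ClientProtocolSketch {
    final class LocatedBlock { /* stand-in for the real o.a.h.dfs.LocatedBlock */ }
    final class DatanodeInfo { /* stand-in for the real o.a.h.dfs.DatanodeInfo */ }

    // was: LocatedBlock addBlock(String src, String clientName) throws IOException;
    LocatedBlock addBlock(String src, String clientName,
                          DatanodeInfo[] excludedNodes) throws IOException;
}
{noformat}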



[jira] Commented: (HADOOP-5713) File write fails after data node goes down

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707773#action_12707773 ] 

dhruba borthakur commented on HADOOP-5713:
------------------------------------------

Your approach looks good, but the client does not conclusively know whether the datanode is really dead or whether this is a transient error or a network failure. Also, we have not seen this problem occurring in a reasonably sized cluster (i.e. #datanodes > 5 or so). Do you feel that the problem you mentioned occurs in a reasonably sized cluster and warrants the additional complexity in the client?

Maybe you can set dfs.client.block.write.retries in your configuration to a large value to alleviate this problem?





[jira] Updated: (HADOOP-5713) File write fails after data node goes down

Posted by "Alban Chevignard (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alban Chevignard updated HADOOP-5713:
-------------------------------------



[jira] Commented: (HADOOP-5713) File write fails after data node goes down

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708661#action_12708661 ] 

Raghu Angadi commented on HADOOP-5713:
--------------------------------------

> One possible fix would be to allow clients to exclude known bad data nodes when allocating new blocks. 

I don't think that is a good approach. On a bigger cluster, the client may not know the correct exclude list.

It is very important for HDFS writes to handle failures on a partial set of datanodes (like this one). We should fix the real underlying problem.



[jira] Issue Comment Edited: (HADOOP-5713) File write fails after data node goes down

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708804#action_12708804 ] 

Raghu Angadi edited comment on HADOOP-5713 at 5/13/09 12:15 AM:
----------------------------------------------------------------

Oh, right. I read it incorrectly. In DFSClient, errors during the initial connect seem to be treated differently from errors during the rest of the writing of the block. In that sense, a temporary exclude list should work for the connect problem on large clusters.

But somehow the client telling the NN what to allocate is less appealing to me. Would it be better to treat both errors similarly, i.e. the client could continue with just the remaining good nodes?



[jira] Commented: (HADOOP-5713) File write fails after data node goes down

Posted by "Alban Chevignard (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707883#action_12707883 ] 

Alban Chevignard commented on HADOOP-5713:
------------------------------------------

You are right that the client does not know why the data node is unavailable, but it does not necessarily need to. In the proposed solution, the node is excluded for the lifetime of the output stream only, which does not affect other clients or any of the data structures on the name node.

This issue first came up while testing a cluster with 200 nodes, so it's actually more a matter of network topology than of cluster size. For example, if a rack in the cluster contains only two nodes and one of them goes down while a worker on the other node is trying to write to a file, the name node will keep assigning that dead node to the writer until it realizes the node is down. This happens even if there are hundreds of other live nodes in the cluster, and increasing the number of write retries in that case won't help. Since we have seen this issue occur on a production cluster, we feel it's definitely worth the additional complexity on the client to address it.




[jira] Commented: (HADOOP-5713) File write fails after data node goes down

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716160#action_12716160 ] 

dhruba borthakur commented on HADOOP-5713:
------------------------------------------

@Todd: if you have a reproducible test case, can you please set dfs.client.block.write.retries high enough that the client keeps retrying for more than 10 minutes, and then rerun the test case to see if it encounters the same problem?



[jira] Commented: (HADOOP-5713) File write fails after data node goes down

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716161#action_12716161 ] 

Todd Lipcon commented on HADOOP-5713:
-------------------------------------

We don't currently have a reproducible test case - the failure injection might be slightly tough. I'll see if I can cook something up, though - that is definitely the first step.



[jira] Commented: (HADOOP-5713) File write fails after data node goes down

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716157#action_12716157 ] 

Todd Lipcon commented on HADOOP-5713:
-------------------------------------

Just wanted to drop a note to say that we've seen this problem in practice on a reasonably sized (multi-rack) cluster as well. We're working on getting some better data to correlate the log messages with datanode failure events, but as far as we can tell the symptoms look identical.

Dhruba/Raghu: At this point I am relatively unfamiliar with this part of the code, but if you guys have a good idea of how to solve this, I'm happy to dive in and try to produce a patch in the next couple of weeks based on your guidance. Let me know.



[jira] Commented: (HADOOP-5713) File write fails after data node goes down

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708767#action_12708767 ] 

dhruba borthakur commented on HADOOP-5713:
------------------------------------------

> On a bigger cluster Client may not know the correct exclude list

Can you please explain why this is so?



[jira] Updated: (HADOOP-5713) File write fails after data node goes down

Posted by "Alban Chevignard (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alban Chevignard updated HADOOP-5713:
-------------------------------------

    Attachment: failed_write.patch



[jira] Commented: (HADOOP-5713) File write fails after data node goes down

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708509#action_12708509 ] 

Hairong Kuang commented on HADOOP-5713:
---------------------------------------

> when createOutputStream fails, we have the luxury of ignoring all the datanodes in the current pipeline because the client has not yet written any data to any of them.

createOutputStream is not used only when a block is initially created. It is also called to reconstruct a pipeline after a block is recovered.
