Posted to mapreduce-user@hadoop.apache.org by sunww <sp...@outlook.com> on 2014/10/14 11:31:58 UTC

write to most datanode fail quickly

Hi,
    I'm using HBase with about 20 regionservers. One regionserver quickly failed to write to most of the datanodes, which finally caused that regionserver to die, while the other regionservers are fine.

The logs look like this:

java.io.IOException: Bad response ERROR for block BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339217 from datanode 132.228.248.20:50010
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:681)
2014-10-13 09:23:01,227 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339217 in pipeline 132.228.248.17:50010, 132.228.248.20:50010, 132.228.248.41:50010: bad datanode 132.228.248.20:50010
2014-10-13 09:23:32,021 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339415
java.io.IOException: Bad response ERROR for block BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339415 from datanode 132.228.248.41:50010
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:681)

Then several "firstBadLink" errors:

2014-10-13 09:23:33,390 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink as 132.228.248.18:50010
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1090)

Then several "Failed to add a datanode" errors:

2014-10-13 09:23:44,331 WARN org.apache.hadoop.hdfs.DFSClient: Error while syncing
java.io.IOException: Failed to add a datanode.  User may turn off this feature by setting dfs.client.block.write.replace-datanode-on-failure.policy in configuration, where the current policy is DEFAULT.  (Nodes: current=[132.228.248.17:50010, 132.228.248.35:50010], original=[132.228.248.17:50010, 132.228.248.35:50010])

The full log is at http://paste2.org/xfn16jm2

Any suggestion will be appreciated. Thanks.
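One quick way to triage pipeline errors like the ones above is to pull the "bad datanode" reports out of the DFSClient log and rank them. This is only an illustrative sketch: it runs against a single sample line copied from the excerpt above; for a real run you would feed the regionserver's whole log file, whose path depends on your installation.

```shell
#!/bin/sh
# Rank datanodes by how often the DFSClient reports them as "bad".
# The sample line is copied from the log excerpt above; replace the
# printf with `cat <regionserver-log>` for a real run.
sample='2014-10-13 09:23:01,227 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339217 in pipeline 132.228.248.17:50010, 132.228.248.20:50010, 132.228.248.41:50010: bad datanode 132.228.248.20:50010'

printf '%s\n' "$sample" \
  | grep -o 'bad datanode [0-9.]*:[0-9]*' \
  | awk '{print $3}' \
  | sort | uniq -c | sort -rn
# output ranks 132.228.248.20:50010 with a count of 1
```

If one host:port dominates the counts across many blocks, that datanode (or its network path) is the first place to look.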

RE: write to most datanode fail quickly

Posted by sunww <sp...@outlook.com>.
Hi, the correct IP is 132.228.248.20. I checked the HDFS log on the dead regionserver; it has some error messages that may be useful: http://paste2.org/NwpcaGVv
Thanks

Date: Tue, 14 Oct 2014 10:28:31 -0700
Subject: Re: write to most datanode fail quickly
From: yuzhihong@gmail.com
To: user@hadoop.apache.org

132.228.48.20 didn't show up in the snippet (spanning 3 minutes only) you posted.

I don't see an error or exception either.
Perhaps search in a wider scope.
On Tue, Oct 14, 2014 at 5:36 AM, sunww <sp...@outlook.com> wrote:



Hi
dfs.client.read.shortcircuit is true.
This is the namenode log at that moment: http://paste2.org/U0zDA9ms
There seems to be nothing special in the namenode log.

Thanks
CC: user@hadoop.apache.org
From: yuzhihong@gmail.com
Subject: Re: write   to most datanode fail quickly
Date: Tue, 14 Oct 2014 03:09:24 -0700
To: user@hadoop.apache.org

Can you check NameNode log for 132.228.48.20 ?
Have you turned on short circuit read ?
Cheers
On Oct 14, 2014, at 3:00 AM, sunww <sp...@outlook.com> wrote:





I'm using Hadoop 2.0.0 and have not run fsck. Only one regionserver has these DFS logs, which is strange.

Thanks
CC: user@hadoop.apache.org
From: yuzhihong@gmail.com
Subject: Re: write   to most datanode fail quickly
Date: Tue, 14 Oct 2014 02:43:26 -0700
To: user@hadoop.apache.org

Which Hadoop release are you using ?
Have you run fsck ?
Cheers
On Oct 14, 2014, at 2:31 AM, sunww <sp...@outlook.com> wrote:




Hi,
    I'm using HBase with about 20 regionservers. One regionserver quickly failed to write to most of the datanodes, which finally caused that regionserver to die, while the other regionservers are fine.

The logs look like this:

java.io.IOException: Bad response ERROR for block BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339217 from datanode 132.228.248.20:50010
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:681)
2014-10-13 09:23:01,227 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339217 in pipeline 132.228.248.17:50010, 132.228.248.20:50010, 132.228.248.41:50010: bad datanode 132.228.248.20:50010
2014-10-13 09:23:32,021 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339415
java.io.IOException: Bad response ERROR for block BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339415 from datanode 132.228.248.41:50010
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:681)

Then several "firstBadLink" errors:

2014-10-13 09:23:33,390 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink as 132.228.248.18:50010
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1090)

Then several "Failed to add a datanode" errors:

2014-10-13 09:23:44,331 WARN org.apache.hadoop.hdfs.DFSClient: Error while syncing
java.io.IOException: Failed to add a datanode.  User may turn off this feature by setting dfs.client.block.write.replace-datanode-on-failure.policy in configuration, where the current policy is DEFAULT.  (Nodes: current=[132.228.248.17:50010, 132.228.248.35:50010], original=[132.228.248.17:50010, 132.228.248.35:50010])

The full log is at http://paste2.org/xfn16jm2

Any suggestion will be appreciated. Thanks.


Re: write to most datanode fail quickly

Posted by Ted Yu <yu...@gmail.com>.
132.228.48.20 didn't show up in the snippet (spanning 3 minutes only) you
posted.

I don't see an error or exception either.

Perhaps search in a wider scope.

On Tue, Oct 14, 2014 at 5:36 AM, sunww <sp...@outlook.com> wrote:

> Hi
>
> dfs.client.read.shortcircuit is true.
>
> this is namenode log at that moment:
> http://paste2.org/U0zDA9ms
>
> There seems to be nothing special in the namenode log.
>
> Thanks
> ------------------------------
> CC: user@hadoop.apache.org
> From: yuzhihong@gmail.com
> Subject: Re: write to most datanode fail quickly
> Date: Tue, 14 Oct 2014 03:09:24 -0700
>
> To: user@hadoop.apache.org
>
> Can you check NameNode log for 132.228.48.20 ?
>
> Have you turned on short circuit read ?
>
> Cheers
>
> On Oct 14, 2014, at 3:00 AM, sunww <sp...@outlook.com> wrote:
>
>
> I'm using Hadoop 2.0.0 and have not run fsck.
> Only one regionserver has these DFS logs, which is strange.
>
> Thanks
> ------------------------------
> CC: user@hadoop.apache.org
> From: yuzhihong@gmail.com
> Subject: Re: write to most datanode fail quickly
> Date: Tue, 14 Oct 2014 02:43:26 -0700
> To: user@hadoop.apache.org
>
> Which Hadoop release are you using ?
>
> Have you run fsck ?
>
> Cheers
>
> On Oct 14, 2014, at 2:31 AM, sunww <sp...@outlook.com> wrote:
>
> Hi
>     I'm using HBase with about 20 regionservers. One regionserver
> quickly failed to write to most of the datanodes, which finally caused
> that regionserver to die, while the other regionservers are fine.
>
> logs like this:
>
> java.io.IOException: Bad response ERROR for block
> BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339217
> from datanode 132.228.248.20:50010
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:681)
> 2014-10-13 09:23:01,227 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block
> BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339217
> in pipeline 132.228.248.17:50010, 132.228.248.20:50010,
> 132.228.248.41:50010: bad datanode 132.228.248.20:50010
> 2014-10-13 09:23:32,021 WARN org.apache.hadoop.hdfs.DFSClient:
> DFSOutputStream ResponseProcessor exception  for block
> BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339415
> java.io.IOException: Bad response ERROR for block
> BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339415
> from datanode 132.228.248.41:50010
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:681)
>
>
>
>     Then several "firstBadLink" errors:
>     2014-10-13 09:23:33,390 INFO org.apache.hadoop.hdfs.DFSClient:
> Exception in createBlockOutputStream
> java.io.IOException: Bad connect ack with firstBadLink as
> 132.228.248.18:50010
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1090)
>
>
>     Then several "Failed to add a datanode" errors:
>     2014-10-13 09:23:44,331 WARN org.apache.hadoop.hdfs.DFSClient: Error
> while syncing
> java.io.IOException: Failed to add a datanode.  User may turn off this
> feature by setting
> dfs.client.block.write.replace-datanode-on-failure.policy in configuration,
> where the current policy is DEFAULT.  (Nodes: current=[
> 132.228.248.17:50010, 132.228.248.35:50010], original=[
> 132.228.248.17:50010, 132.228.248.35:50010])
>
>     the full log is in http://paste2.org/xfn16jm2
>
>     Any suggestion will be appreciated. Thanks.
>
>
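For reference, the "Failed to add a datanode" warning quoted above is governed by the HDFS client's pipeline-recovery settings. A minimal sketch of the relevant hdfs-site.xml properties, assuming Hadoop 2.x defaults (verify the exact names and defaults against your release's hdfs-default.xml before changing anything):

```xml
<!-- Sketch only: client-side pipeline-recovery knobs in hdfs-site.xml. -->
<property>
  <!-- Whether to try replacing a failed datanode in the write pipeline. -->
  <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
  <value>true</value>
</property>
<property>
  <!-- DEFAULT, ALWAYS, or NEVER; NEVER is the "turn off this feature"
       option the warning mentions, sometimes used on small clusters
       where no spare datanode is available. -->
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>DEFAULT</value>
</property>
```

Note that NEVER trades durability for availability on long-lived writers such as HBase WALs, so it is usually a workaround rather than a fix for datanodes that are actually failing.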


RE: write to most datanode fail quickly

Posted by sunww <sp...@outlook.com>.
Hi,
dfs.client.read.shortcircuit is true.
This is the namenode log at that moment: http://paste2.org/U0zDA9ms
There seems to be nothing special in the namenode log.

Thanks
CC: user@hadoop.apache.org
From: yuzhihong@gmail.com
Subject: Re: write   to most datanode fail quickly
Date: Tue, 14 Oct 2014 03:09:24 -0700
To: user@hadoop.apache.org

Can you check NameNode log for 132.228.48.20 ?
Have you turned on short circuit read ?
Cheers
On Oct 14, 2014, at 3:00 AM, sunww <sp...@outlook.com> wrote:





I'm using Hadoop 2.0.0 and have not run fsck. Only one regionserver has these DFS logs, which is strange.

Thanks
CC: user@hadoop.apache.org
From: yuzhihong@gmail.com
Subject: Re: write   to most datanode fail quickly
Date: Tue, 14 Oct 2014 02:43:26 -0700
To: user@hadoop.apache.org

Which Hadoop release are you using ?
Have you run fsck ?
Cheers
On Oct 14, 2014, at 2:31 AM, sunww <sp...@outlook.com> wrote:




Hi
    I'm using HBase with about 20 regionservers. One regionserver quickly failed writing to most of the datanodes, which finally caused that regionserver to die, while the other regionservers were fine.

logs like this:

java.io.IOException: Bad response ERROR for block BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339217 from datanode 132.228.248.20:50010
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:681)
2014-10-13 09:23:01,227 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339217 in pipeline 132.228.248.17:50010, 132.228.248.20:50010, 132.228.248.41:50010: bad datanode 132.228.248.20:50010
2014-10-13 09:23:32,021 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339415
java.io.IOException: Bad response ERROR for block BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339415 from datanode 132.228.248.41:50010
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:681)

    then several "firstBadLink" errors:
2014-10-13 09:23:33,390 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink as 132.228.248.18:50010
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1090)

    then several "Failed to add a datanode" errors:
2014-10-13 09:23:44,331 WARN org.apache.hadoop.hdfs.DFSClient: Error while syncing
java.io.IOException: Failed to add a datanode.  User may turn off this feature by setting dfs.client.block.write.replace-datanode-on-failure.policy in configuration, where the current policy is DEFAULT.  (Nodes: current=[132.228.248.17:50010, 132.228.248.35:50010], original=[132.228.248.17:50010, 132.228.248.35:50010])

    the full log is in http://paste2.org/xfn16jm2

    Any suggestion will be appreciated. Thanks.


Re: write to most datanode fail quickly

Posted by Ted Yu <yu...@gmail.com>.
Can you check NameNode log for 132.228.48.20 ?

Have you turned on short circuit read ?

Cheers

On Oct 14, 2014, at 3:00 AM, sunww <sp...@outlook.com> wrote:

> 
> I'm using Hadoop 2.0.0 and  not  run fsck.  
> only one regionserver have these dfs logs,   strange.
> 
> Thanks
> CC: user@hadoop.apache.org
> From: yuzhihong@gmail.com
> Subject: Re: write to most datanode fail quickly
> Date: Tue, 14 Oct 2014 02:43:26 -0700
> To: user@hadoop.apache.org
> 
> Which Hadoop release are you using ?
> 
> Have you run fsck ?
> 
> Cheers
> 
> On Oct 14, 2014, at 2:31 AM, sunww <sp...@outlook.com> wrote:
> 
> Hi
>     I'm using hbase with about 20 regionserver. And  one regionserver failed to write  most of datanodes quickly, finally cause this regionserver die. While other regionserver is ok. 
> 
> logs like this:
>     
> java.io.IOException: Bad response ERROR for block BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339217 from datanode 132.228.248.20:50010
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:681)
> 2014-10-13 09:23:01,227 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339217 in pipeline 132.228.248.17:50010, 132.228.248.20:50010, 132.228.248.41:50010: bad datanode 132.228.248.20:50010
> 2014-10-13 09:23:32,021 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339415
> java.io.IOException: Bad response ERROR for block BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339415 from datanode 132.228.248.41:50010
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:681)    
>     
>     
>     then several "firstBadLink" errors:
>     2014-10-13 09:23:33,390 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream
> java.io.IOException: Bad connect ack with firstBadLink as 132.228.248.18:50010
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1090)
>     
>     
>     then several "Failed to add a datanode" errors:
>     2014-10-13 09:23:44,331 WARN org.apache.hadoop.hdfs.DFSClient: Error while syncing
> java.io.IOException: Failed to add a datanode.  User may turn off this feature by setting dfs.client.block.write.replace-datanode-on-failure.policy in configuration, where the current policy is DEFAULT.  (Nodes: current=[132.228.248.17:50010, 132.228.248.35:50010], original=[132.228.248.17:50010, 132.228.248.35:50010])
> 
>     the full log is in http://paste2.org/xfn16jm2
>     
>     Any suggestion will be appreciated. Thanks.


RE: write to most datanode fail quickly

Posted by sunww <sp...@outlook.com>.
I'm using Hadoop 2.0.0 and have not run fsck. Only one regionserver has these dfs logs, which is strange.

Thanks
CC: user@hadoop.apache.org
From: yuzhihong@gmail.com
Subject: Re: write   to most datanode fail quickly
Date: Tue, 14 Oct 2014 02:43:26 -0700
To: user@hadoop.apache.org

Which Hadoop release are you using ?
Have you run fsck ?
Cheers
On Oct 14, 2014, at 2:31 AM, sunww <sp...@outlook.com> wrote:




Hi
    I'm using HBase with about 20 regionservers. One regionserver quickly failed writing to most of the datanodes, which finally caused that regionserver to die, while the other regionservers were fine.

logs like this:

java.io.IOException: Bad response ERROR for block BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339217 from datanode 132.228.248.20:50010
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:681)
2014-10-13 09:23:01,227 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339217 in pipeline 132.228.248.17:50010, 132.228.248.20:50010, 132.228.248.41:50010: bad datanode 132.228.248.20:50010
2014-10-13 09:23:32,021 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339415
java.io.IOException: Bad response ERROR for block BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339415 from datanode 132.228.248.41:50010
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:681)

    then several "firstBadLink" errors:
2014-10-13 09:23:33,390 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink as 132.228.248.18:50010
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1090)

    then several "Failed to add a datanode" errors:
2014-10-13 09:23:44,331 WARN org.apache.hadoop.hdfs.DFSClient: Error while syncing
java.io.IOException: Failed to add a datanode.  User may turn off this feature by setting dfs.client.block.write.replace-datanode-on-failure.policy in configuration, where the current policy is DEFAULT.  (Nodes: current=[132.228.248.17:50010, 132.228.248.35:50010], original=[132.228.248.17:50010, 132.228.248.35:50010])

    the full log is in http://paste2.org/xfn16jm2

    Any suggestion will be appreciated. Thanks.


Re: write to most datanode fail quickly

Posted by Ted Yu <yu...@gmail.com>.
Which Hadoop release are you using ?

Have you run fsck ?
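
For reference, checking the health of the affected files with fsck usually looks something like this (the /hbase path is an example; use your actual HBase root directory, and note this must be run against a live cluster):

```shell
# Report overall filesystem health for the HBase root directory,
# listing each file's blocks and the datanodes holding them.
hdfs fsck /hbase -files -blocks -locations

# Older releases use the same options via the hadoop wrapper:
# hadoop fsck /hbase -files -blocks -locations
```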

Cheers

On Oct 14, 2014, at 2:31 AM, sunww <sp...@outlook.com> wrote:

> Hi
>     I'm using HBase with about 20 region servers. One region server quickly failed writing to most of the datanodes, which eventually caused that region server to die, while the other region servers are fine.
> 
> logs like this:
>     
> java.io.IOException: Bad response ERROR for block BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339217 from datanode 132.228.248.20:50010
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:681)
> 2014-10-13 09:23:01,227 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339217 in pipeline 132.228.248.17:50010, 132.228.248.20:50010, 132.228.248.41:50010: bad datanode 132.228.248.20:50010
> 2014-10-13 09:23:32,021 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339415
> java.io.IOException: Bad response ERROR for block BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339415 from datanode 132.228.248.41:50010
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:681)    
>     
>     
>     then several "firstBadLink" errors:
>     2014-10-13 09:23:33,390 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream
> java.io.IOException: Bad connect ack with firstBadLink as 132.228.248.18:50010
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1090)
>     
>     
>     then several "Failed to add a datanode" errors:
>     2014-10-13 09:23:44,331 WARN org.apache.hadoop.hdfs.DFSClient: Error while syncing
> java.io.IOException: Failed to add a datanode.  User may turn off this feature by setting dfs.client.block.write.replace-datanode-on-failure.policy in configuration, where the current policy is DEFAULT.  (Nodes: current=[132.228.248.17:50010, 132.228.248.35:50010], original=[132.228.248.17:50010, 132.228.248.35:50010])
> 
>     the full log is in http://paste2.org/xfn16jm2
>     
>     Any suggestion will be appreciated. Thanks.
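
For context, the policy named in the last exception is controlled by client-side properties in hdfs-site.xml. A sketch of the relevant settings (values shown are illustrative defaults; verify against the documentation for your Hadoop release):

```xml
<!-- Client-side settings controlling whether the HDFS write pipeline
     tries to replace a failed datanode during a write. -->
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
  <value>true</value>
</property>
<property>
  <!-- ALWAYS | NEVER | DEFAULT. With DEFAULT, a replacement datanode is
       only requested when the remaining pipeline is considered too short
       (e.g. fewer than half the replicas remain). -->
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>DEFAULT</value>
</property>
```

Turning the policy to NEVER silences the "Failed to add a datanode" error but reduces durability of in-flight writes, so it only masks the underlying datanode problem.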
