Posted to mapreduce-user@hadoop.apache.org by Ted Yu <yu...@gmail.com> on 2014/10/14 19:28:31 UTC

Re: write to most datanode fail quickly

132.228.48.20 didn't show up in the snippet (spanning 3 minutes only) you
posted.

I don't see error or exception either.

Perhaps search in a wider scope.
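Searching a wider window of the NameNode log for the datanode's address can be done with plain grep. A minimal sketch follows; the real log lives under a path like $HADOOP_LOG_DIR, which varies by install, so a small stand-in sample file is used here instead:

```shell
# A stand-in sample of NameNode/DFSClient log lines (the real log path is an
# assumption and varies by install, e.g. under $HADOOP_LOG_DIR).
cat > /tmp/namenode-sample.log <<'EOF'
2014-10-13 09:23:01,227 WARN org.apache.hadoop.hdfs.DFSClient: bad datanode 132.228.248.20:50010
2014-10-13 09:25:10,001 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock
EOF

# Search for the datanode's address; -C 2 prints two lines of context
# around each match, widening the view beyond the single hit.
grep -C 2 '132.228.248.20' /tmp/namenode-sample.log
```

Running the same grep over the full set of rotated NameNode logs (rather than a 3-minute snippet) is what "search in a wider scope" amounts to here.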

On Tue, Oct 14, 2014 at 5:36 AM, sunww <sp...@outlook.com> wrote:

> Hi
>
> dfs.client.read.shortcircuit is true.
>
> this is namenode log at that moment:
> http://paste2.org/U0zDA9ms
>
> It seems there is nothing special in the namenode log.
>
> Thanks
> ------------------------------
> CC: user@hadoop.apache.org
> From: yuzhihong@gmail.com
> Subject: Re: write to most datanode fail quickly
> Date: Tue, 14 Oct 2014 03:09:24 -0700
>
> To: user@hadoop.apache.org
>
> Can you check NameNode log for 132.228.48.20 ?
>
> Have you turned on short circuit read ?
>
> Cheers
>
> On Oct 14, 2014, at 3:00 AM, sunww <sp...@outlook.com> wrote:
>
>
> I'm using Hadoop 2.0.0 and have not run fsck.
> Only one regionserver has these dfs logs, which is strange.
>
> Thanks
> ------------------------------
> CC: user@hadoop.apache.org
> From: yuzhihong@gmail.com
> Subject: Re: write to most datanode fail quickly
> Date: Tue, 14 Oct 2014 02:43:26 -0700
> To: user@hadoop.apache.org
>
> Which Hadoop release are you using ?
>
> Have you run fsck ?
>
> Cheers
>
> On Oct 14, 2014, at 2:31 AM, sunww <sp...@outlook.com> wrote:
>
> Hi
>     I'm using HBase with about 20 regionservers. One regionserver quickly
> failed to write to most of the datanodes, which eventually caused that
> regionserver to die, while the other regionservers are fine.
>
> logs like this:
>
> java.io.IOException: Bad response ERROR for block
> BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339217
> from datanode 132.228.248.20:50010
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:681)
> 2014-10-13 09:23:01,227 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block
> BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339217
> in pipeline 132.228.248.17:50010, 132.228.248.20:50010,
> 132.228.248.41:50010: bad datanode 132.228.248.20:50010
> 2014-10-13 09:23:32,021 WARN org.apache.hadoop.hdfs.DFSClient:
> DFSOutputStream ResponseProcessor exception  for block
> BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339415
> java.io.IOException: Bad response ERROR for block
> BP-165080589-132.228.248.11-1371617709677:blk_5069077415583579127_39339415
> from datanode 132.228.248.41:50010
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:681)
>
>
>
>     then several "firstBadLink" errors:
>     2014-10-13 09:23:33,390 INFO org.apache.hadoop.hdfs.DFSClient:
> Exception in createBlockOutputStream
> java.io.IOException: Bad connect ack with firstBadLink as
> 132.228.248.18:50010
> at
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1090)
>
>
>     then several "Failed to add a datanode" errors:
>     2014-10-13 09:23:44,331 WARN org.apache.hadoop.hdfs.DFSClient: Error
> while syncing
> java.io.IOException: Failed to add a datanode.  User may turn off this
> feature by setting
> dfs.client.block.write.replace-datanode-on-failure.policy in configuration,
> where the current policy is DEFAULT.  (Nodes: current=[
> 132.228.248.17:50010, 132.228.248.35:50010], original=[
> 132.228.248.17:50010, 132.228.248.35:50010])
>
>     the full log is in http://paste2.org/xfn16jm2
>
>     Any suggestion will be appreciated. Thanks.
>
>
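For reference, the behavior flagged by the "Failed to add a datanode" message in the quoted log is controlled by client-side settings in hdfs-site.xml. A hedged sketch of the relevant properties (the values shown mirror the DEFAULT policy reported in the log; they are illustrative, not a recommendation to change anything):

```xml
<!-- hdfs-site.xml (client side). Controls what the DFS client does when a
     datanode in a write pipeline fails, as described in the quoted error. -->
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
  <value>true</value>
</property>
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <!-- DEFAULT is the value reported in the log above; NEVER disables
       datanode replacement entirely, which the message warns about. -->
  <value>DEFAULT</value>
</property>
```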

RE: write to most datanode fail quickly

Posted by sunww <sp...@outlook.com>.
Hi,
    The correct IP is 132.228.248.20. I checked the hdfs log on the dead regionserver; it has some error messages that may be useful: http://paste2.org/NwpcaGVv
Thanks
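As a footnote to the fsck question earlier in the thread: on a live cluster the block-health check would be roughly `hdfs fsck /hbase -files -blocks -locations` (the /hbase path is illustrative). Since that needs a running cluster, the sketch below instead parses a captured sample of fsck output, which is what one would scan for corrupt or missing blocks:

```shell
# On a live cluster the check would be roughly:
#   hdfs fsck /hbase -files -blocks -locations
# Here we parse a captured sample of fsck-style output instead.
cat > /tmp/fsck-sample.txt <<'EOF'
/hbase/t1/r1/f1 134217728 bytes, 1 block(s):  OK
Status: HEALTHY
 Total blocks (validated): 1 (avg. block size 134217728 B)
EOF

# The summary line reports whether any blocks are corrupt or missing.
grep '^Status:' /tmp/fsck-sample.txt
```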

