Posted to user@hbase.apache.org by Ken Weiner <ke...@gumgum.com> on 2009/12/23 20:10:00 UTC
Recurring Hadoop DataNode ERROR in logs
We have seen the following HADOOP error occur about 100 times a day spread
out throughout the day on each RegionServer/DataNode in our always-on
HBase/Hadoop cluster.
From hadoop-gumgum-datanode-xxxxxxxxxxxx.log

2009-12-23 09:58:29,717 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(10.255.9.187:50010,
storageID=DS-1057956046-10.255.9.187-50010-1248395287725,
infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketTimeoutException: 480000 millis timeout while
waiting for channel to be ready for write. ch :
java.nio.channels.SocketChannel[connected
local=/10.255.9.187:50010 remote=/10.255.9.187:46154]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
        at java.lang.Thread.run(Thread.java:619)
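The "about 100 times a day" figure can be pulled from the log with a simple grep. A self-contained sketch (sample lines stand in for the real log file; adjust the pattern and path to your own layout):

```shell
# Count occurrences of the timeout error in a DataNode log.
# Here a temp file with two sample lines stands in for the real log.
log=$(mktemp)
printf '%s\n' \
  '2009-12-23 09:58:29,717 ERROR ... java.net.SocketTimeoutException: 480000 millis timeout' \
  '2009-12-23 11:14:02,003 ERROR ... java.net.SocketTimeoutException: 480000 millis timeout' \
  > "$log"
grep -c 'SocketTimeoutException' "$log"   # prints 2 for this sample
rm -f "$log"
```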
Are other people seeing this error too? How serious is it? Can it be
prevented?
I found a few things that seem related, but I'm not sure how they apply to
the HBase environment:
http://issues.apache.org/jira/browse/HDFS-693
https://issues.apache.org/jira/browse/HADOOP-3831
Info on our environment:
1 Node: Master/NameNode/JobTracker (EC2 m1.large)
3 Nodes: RegionServer/DataNode/TaskTracker (EC2 m1.large)
Thanks!
-Ken Weiner
GumGum & BEDROCK
Re: Recurring Hadoop DataNode ERROR in logs
Posted by stack <st...@duboce.net>.
On Wed, Dec 23, 2009 at 2:24 PM, Ken Weiner <ke...@gumgum.com> wrote:
> HBase itself isn't complaining. I only see the error from Hadoop. I wasn't
> sure if there are some side effects I'm not aware of. Is it causing any
> problems to my application?
IIUC, nothing but some extra latency while the dropped socket is
reestablished.
> I see the error occur even when my system is
> under relatively light load with no MR jobs running. I also forgot to
> mention that I saw this error with HBase 0.20.0/Hadoop 0.20.0 and I
> recently
> upgraded to HBase 0.20.2/Hadoop 0.20.1 and the error still occurs at the
> same rate after the upgrade.
>
Yeah. This would not be surprising. This is hdfs behavior.

You could up the default timeout in hdfs if you wanted less
incidence: dfs.datanode.socket.write.timeout. But then sockets would never
time out.
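A minimal hdfs-site.xml sketch of that override (the property name is from the note above; the value is in milliseconds, and the example value here is an assumption, not a recommendation):

```xml
<property>
  <!-- Raise the DataNode socket write timeout from the default
       480000 ms (8 minutes). Setting it to 0 disables the timeout
       entirely, which is the "never time out" case. -->
  <name>dfs.datanode.socket.write.timeout</name>
  <value>960000</value>
</property>
```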
St.Ack
> Ken
>
> On Wed, Dec 23, 2009 at 11:49 AM, stack <st...@duboce.net> wrote:
>
> > Yeah, see HADOOP-3831. It looks like datanode timing out unused
> > connections. As I understand it, later, when dfsclient wants to use this
> > block, it just sets up the socket again -- silently, transparently below
> > the
> > level at which the application can see. Do I have it right? Is hbase
> > itself complaining?
> >
> > St.Ack
> >
> >
>
Re: Recurring Hadoop DataNode ERROR in logs
Posted by Ken Weiner <ke...@gumgum.com>.
HBase itself isn't complaining. I only see the error from Hadoop. I wasn't
sure if there are some side effects I'm not aware of. Is it causing any
problems to my application? I see the error occur even when my system is
under relatively light load with no MR jobs running. I also forgot to
mention that I saw this error with HBase 0.20.0/Hadoop 0.20.0 and I recently
upgraded to HBase 0.20.2/Hadoop 0.20.1 and the error still occurs at the
same rate after the upgrade.
Ken
On Wed, Dec 23, 2009 at 11:49 AM, stack <st...@duboce.net> wrote:
> Yeah, see HADOOP-3831. It looks like datanode timing out unused
> connections. As I understand it, later, when dfsclient wants to use this
> block, it just sets up the socket again -- silently, transparently below
> the
> level at which the application can see. Do I have it right? Is hbase
> itself complaining?
>
> St.Ack
>
>
Re: Recurring Hadoop DataNode ERROR in logs
Posted by stack <st...@duboce.net>.
Yeah, see HADOOP-3831. It looks like datanode timing out unused
connections. As I understand it, later, when dfsclient wants to use this
block, it just sets up the socket again -- silently, transparently below the
level at which the application can see. Do I have it right? Is hbase
itself complaining?
St.Ack
On Wed, Dec 23, 2009 at 11:10 AM, Ken Weiner <ke...@gumgum.com> wrote:
> We have seen the following HADOOP error occur about 100 times a day spread
> out throughout the day on each RegionServer/DataNode in our always-on
> HBase/Hadoop cluster.
>
> From hadoop-gumgum-datanode-xxxxxxxxxxxx.log
>
> 2009-12-23 09:58:29,717 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
> DatanodeRegistration(10.255.9.187:50010,
> storageID=DS-1057956046-10.255.9.187-50010-1248395287725,
> infoPort=50075, ipcPort=50020):DataXceiver
> java.net.SocketTimeoutException: 480000 millis timeout while
> waiting for channel to be ready for write. ch :
> java.nio.channels.SocketChannel[connected
> local=/10.255.9.187:50010 remote=/10.255.9.187:46154]
>         at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>         at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>         at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>         at java.lang.Thread.run(Thread.java:619)
>
>
> Are other people seeing this error too? How serious is it? Can it be
> prevented?
>
> I found a few things that seem related, but I'm not sure how they apply to
> the HBase environment:
> http://issues.apache.org/jira/browse/HDFS-693
> https://issues.apache.org/jira/browse/HADOOP-3831
>
> Info on our environment:
> 1 Node: Master/NameNode/JobTracker (EC2 m1.large)
> 3 Nodes: RegionServer/DataNode/TaskTracker (EC2 m1.large)
>
> Thanks!
>
> -Ken Weiner
> GumGum & BEDROCK
>
Re: Recurring Hadoop DataNode ERROR in logs
Posted by Lars George <la...@gmail.com>.
Hi Ken,
Yes, I see the same on a smallish cluster under heavy MR-based import
load. I am discussing it with JD - maybe it is HDFS-101 related. Someone
needs to apply the patch and try.
Lars
On Dec 23, 2009, at 20:10, Ken Weiner <ke...@gumgum.com> wrote:
> We have seen the following HADOOP error occur about 100 times a day
> spread
> out throughout the day on each RegionServer/DataNode in our always-on
> HBase/Hadoop cluster.
>
> From hadoop-gumgum-datanode-xxxxxxxxxxxx.log
>
> 2009-12-23 09:58:29,717 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
> DatanodeRegistration(10.255.9.187:50010,
> storageID=DS-1057956046-10.255.9.187-50010-1248395287725,
> infoPort=50075, ipcPort=50020):DataXceiver
> java.net.SocketTimeoutException: 480000 millis timeout while
> waiting for channel to be ready for write. ch :
> java.nio.channels.SocketChannel[connected
> local=/10.255.9.187:50010 remote=/10.255.9.187:46154]
>         at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>         at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>         at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>         at java.lang.Thread.run(Thread.java:619)
>
>
> Are other people seeing this error too? How serious is it? Can it be
> prevented?
>
> I found a few things that seem related, but I'm not sure how they
> apply to
> the HBase environment:
> http://issues.apache.org/jira/browse/HDFS-693
> https://issues.apache.org/jira/browse/HADOOP-3831
>
> Info on our environment:
> 1 Node: Master/NameNode/JobTracker (EC2 m1.large)
> 3 Nodes: RegionServer/DataNode/TaskTracker (EC2 m1.large)
>
> Thanks!
>
> -Ken Weiner
> GumGum & BEDROCK