Posted to user@hbase.apache.org by Steve Kuo <ku...@gmail.com> on 2010/08/07 21:10:12 UTC

DFSClient: DataStreamer Exception: java.io.IOException: Broken pipe

I am running a MR import job into HBase, and one of my 8 region servers died
with the following errors in its regionserver and datanode logs.  Googling
turned up a discussion suggesting that setting
dfs.datanode.socket.write.timeout=0 may help.  Please give me some insight
into what this problem is about and the best way to resolve it.

Should dfs.datanode.socket.write.timeout be set in both hbase-site.xml and
hdfs-site.xml so that it is visible to both Hadoop and HBase?
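(For reference, the workaround under discussion would be a stanza like the
following. This is just a sketch of what I understand the setting to be: a
value of 0 disables the socket write timeout, and the same stanza would
presumably need to be mirrored in hbase-site.xml so the DFSClient embedded
in the region server also sees it.)

```xml
<!-- Disable the DataNode socket write timeout (0 = no timeout).
     Goes in hdfs-site.xml; mirror in hbase-site.xml so the
     HBase-side DFSClient picks it up as well. -->
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>0</value>
</property>
```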

=== my setup ===

OS: ubuntu 9.10
JVM: jdk1.6.0_16
hadoop/hbase/zookeeper:    CDH3b2
1 master running hm, nn, snn, jt, zk
8 servers running rs, dn, tt
dfs.datanode.max.xceivers = 2047


=== regionserver.log ===

2010-08-07 08:36:29,706 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Unblocking updates for region table_c,f783045be20e5e1771b292d806ed6e1a 'IPC
Server handler 30 on 60020'
2010-08-07 08:36:45,750 WARN
org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region
table_c,f783045be20e5e1771b292d806ed6e1a has too many store files; delaying
flush up to 90000ms
2010-08-07 08:41:40,063 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer
Exception: java.io.IOException: Broken pipe
        at sun.nio.ch.FileDispatcher.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
        at sun.nio.ch.IOUtil.write(IOUtil.java:75)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
        at
org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
        at
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
        at
org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2355)

2010-08-07 08:41:40,332 WARN org.apache.hadoop.hdfs.DFSClient: Error
Recovery for block blk_6582199944133003769_34503 bad datanode[0]
10.128.135.106:50010
2010-08-07 08:41:40,332 WARN org.apache.hadoop.hdfs.DFSClient: Error
Recovery for block blk_6582199944133003769_34503 in pipeline
10.128.135.106:50010, 10.128.135.107:50010, 10.128.135.104:50010: bad
datanode 10.128.135.106:50010
2010-08-07 08:41:40,333 WARN org.apache.hadoop.hdfs.DFSClient:
DFSOutputStream ResponseProcessor exception  for block
blk_4265297965963337334_33484 java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:180)
        at java.io.DataInputStream.readLong(DataInputStream.java:399)
        at
org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:119)
        at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2467)

2010-08-07 08:41:40,334 WARN org.apache.hadoop.hdfs.DFSClient: Error
Recovery for block blk_4265297965963337334_33484 bad datanode[0]
10.128.135.106:50010
2010-08-07 08:41:40,334 WARN org.apache.hadoop.hdfs.DFSClient: Error
Recovery for block blk_4265297965963337334_33484 in pipeline
10.128.135.106:50010, 10.128.135.104:50010, 10.128.135.107:50010: bad
datanode 10.128.135.106:50010
2010-08-07 08:41:40,392 INFO org.apache.zookeeper.ClientCnxn: Client session
timed out, have not heard from server in 332256ms for sessionid
0x12a492f91560115, closing socket connection and attempting reconnect
2010-08-07 08:41:40,461 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_REGIONSERVER_STOP
2010-08-07 08:41:40,501 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event,
state: Disconnected, type: None, path: null
2010-08-07 08:41:40,740 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
server on 60020
2010-08-07 08:41:40,741 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 0 on 60020: exiting


=== datanode.log ===

2010-08-07 08:41:41,813 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Client calls
recoverBlock(block=blk_6582199944133003769_34503, targets=[
10.128.135.106:50010, 10.128.135.107:50010])
2010-08-07 08:41:41,813 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: No block metadata found for
block blk_6582199944133003769_34503 on datanode 10.128.135.106:50010
2010-08-07 08:41:41,875 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: No block metadata found for
block blk_6582199944133003769_34503 on datanode 10.128.135.107:50010
2010-08-07 08:41:41,890 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 1 on 50020, call recoverBlock(blk_6582199944133003769_34503, false,
[Lorg.apache.hadoop.hdfs.protocol.DatanodeInfo;@927a0de) from
10.128.135.106:40867: error: org.apache.hadoop.ipc.RemoteException:
java.io.IOException: Block (=blk_6582199944133003769_34503) not found
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:1987)
        at
org.apache.hadoop.hdfs.server.namenode.NameNode.commitBlockSynchronization(NameNode.java:529)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:968)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:962)

org.apache.hadoop.ipc.RemoteException: java.io.IOException: Block
(=blk_6582199944133003769_34503) not found
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:1987)
        at
org.apache.hadoop.hdfs.server.namenode.NameNode.commitBlockSynchronization(NameNode.java:529)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:968)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:962)

Re: DFSClient: DataStreamer Exception: java.io.IOException: Broken pipe

Posted by Jean-Daniel Cryans <jd...@apache.org>.
HDFS errors + this message:

2010-08-07 08:41:40,392 INFO org.apache.zookeeper.ClientCnxn: Client session
timed out, have not heard from server in 332256ms for sessionid

points to a GC pause (that lasted about 5 minutes). Make sure you
aren't swapping, that HBase isn't CPU starved, monitor your cluster
using Ganglia, etc.
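To confirm it's GC, you can also turn on GC logging for the region server
and look for a long pause around 08:36-08:41. A sketch of what you'd add to
hbase-env.sh (the log path is just an example):

```sh
# Log GC activity with timestamps so long pauses are visible.
# Appended to the existing HBASE_OPTS; adjust the log path to taste.
export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
  -XX:+PrintGCTimeStamps -Xloggc:$HBASE_HOME/logs/gc-hbase.log"
```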

J-D

On Sat, Aug 7, 2010 at 12:10 PM, Steve Kuo <ku...@gmail.com> wrote:
> ...