Posted to user@hbase.apache.org by charan kumar <ch...@gmail.com> on 2011/02/05 01:15:29 UTC

Region server shutdown during writes (bad data nodes)

Hello,

   We are running into a region server shutdown again during write loads (90
clients), with a "Connection reset by peer" error. Any suggestions?

  Setup: 30 nodes, HBase 0.90.0, Hadoop (append branch), CentOS, Dell 1950, 6 GB RAM.

  2011-02-04 02:36:16,808 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_-4303650603271778933_2022254
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
        at sun.nio.ch.IOUtil.read(IOUtil.java:206)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
        at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
        at java.io.DataInputStream.readFully(DataInputStream.java:178)
        at java.io.DataInputStream.readLong(DataInputStream.java:399)
        at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:122)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2547)

2011-02-04 02:36:16,809 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-4303650603271778933_2022254 bad datanode[0] 10.76.99.115:50010
2011-02-04 02:36:16,880 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=nafhdi12708mwh.io.askjeeves.info,60020,1296782105691, load=(requests=63, regions=165, usedHeap=2274, maxHeap=4070): Failed open of daughter compresstable,\x074k\xB6\x91\xC6\x98\x87,1296815758006.86cf8a61169de38e7ea72fb01c351eb1.
java.io.IOException: All datanodes XXXXXXXXXXXX:50010 are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2680)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1500(DFSClient.java:2172)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2371)

HDFS config (hdfs-site.xml):

        <property>
          <!-- cap on concurrent DataXceiver threads per datanode;
               raised from the 256 default for HBase -->
          <name>dfs.datanode.max.xcievers</name>
          <value>4096</value>
        </property>

        <property>
          <name>dfs.replication</name>
          <value>3</value>
        </property>

        <property>
          <!-- 5368709120 bytes = 5 GB kept free per volume for non-HDFS use -->
          <name>dfs.datanode.du.reserved</name>
          <value>5368709120</value>
        </property>

        <property>
          <name>dfs.datanode.handler.count</name>
          <value>100</value>
        </property>

        <property>
          <name>dfs.namenode.handler.count</name>
          <value>100</value>
        </property>

        <property>
          <!-- 0 disables the datanode socket write timeout -->
          <name>dfs.datanode.socket.write.timeout</name>
          <value>0</value>
        </property>
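For cross-checking what a given node actually loaded, here is a minimal sketch
that parses hdfs-site.xml and flags the two settings most relevant to the
symptoms above. The config path is an assumption (adjust for your install);
the 4096 threshold mirrors the value set above, and 256 is the Hadoop default.

    import xml.etree.ElementTree as ET

    def load_props(path):
        # hdfs-site.xml is a flat list of <property><name/><value/></property>
        root = ET.parse(path).getroot()
        return {p.findtext("name"): p.findtext("value")
                for p in root.findall("property")}

    props = load_props("/etc/hadoop/conf/hdfs-site.xml")  # assumed path

    xcievers = int(props.get("dfs.datanode.max.xcievers", "256"))  # 256 = default
    if xcievers < 4096:
        print("dfs.datanode.max.xcievers=%d; heavy HBase write loads usually need 4096+" % xcievers)

    if props.get("dfs.datanode.socket.write.timeout") == "0":
        print("note: datanode socket write timeout is disabled (0)")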

GC OPTS: -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -Xmn256m
-XX:CMSInitiatingOccupancyFraction=70

Thanks,
Charan

Re: Region server shutdown during writes (bad data nodes)

Posted by Stack <st...@duboce.net>.
Please put up more from that log so we can see more around this failed
region open.  Can you check the datanode on its side?  Does it
have errors?  Is it the 'peer' referred to below? (Usually the log
gives the address of the peer we are talking to.)  Pastebin it all.  Thanks.
St.Ack
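
If it is easier to grep than to eyeball the whole file, here is a minimal
sketch for pulling the relevant datanode log lines. The log path is an
assumption (adjust for your install); the block id is the one from the
post above, and the other needles are common datanode-side symptoms.

    import sys

    LOG = "/var/log/hadoop/hadoop-datanode.log"  # assumed path
    NEEDLES = ("blk_-4303650603271778933", "xceiver", "xcievers",
               "SocketTimeoutException", "Connection reset", "EOFException")

    def scan(path):
        # Print every line mentioning the failed block or a known symptom,
        # with its line number for pulling surrounding context afterwards.
        with open(path) as f:
            for lineno, line in enumerate(f, 1):
                if any(n in line for n in NEEDLES):
                    print("%7d: %s" % (lineno, line.rstrip()))

    if __name__ == "__main__":
        scan(sys.argv[1] if len(sys.argv) > 1 else LOG)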

On Fri, Feb 4, 2011 at 4:15 PM, charan kumar <ch...@gmail.com> wrote:
> [original message quoted in full; trimmed]