Posted to user@hbase.apache.org by Tuan Nguyen <tu...@gmail.com> on 2010/03/20 12:01:19 UTC

DataNode stops reclaiming deleted blocks under heavy write load

Hi,

We are running a stress test to evaluate HBase. The test runs fine and
completes, but we have a problem with one node. Here are our
configuration and the problem:

1. We have 1 master and 4 slaves. The master runs both the namenode and
the HBase master; each slave runs both a datanode and a region server.
2. We have set the xceiver count to 8192 and enabled LZO compression (the
exact hdfs-site.xml entry is sketched at the end of this mail).
3. From another machine, we run 8 threads that write data into the
cluster; each record is about 5 KB to 100 KB.
4. The test runs fine for the first 2 to 3 hours, but then one of the
nodes logs the following warning:

2010-03-19 20:26:22,814 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block blk_9088042710721149043_145344 file /mnt/moom/hadoop/0.20.1/dfs/data/current/subdir6/subdir33/blk_9088042710721149043
2010-03-19 20:26:22,846 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing datanode Command
java.io.IOException: Error in deleting blocks.
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.invalidate(FSDataset.java:1361)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.processCommand(DataNode.java:868)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.processCommand(DataNode.java:830)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:710)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1186)
        at java.lang.Thread.run(Thread.java:619)

5. After the warning, I no longer see any "Deleting block
blk_xxxxxxxxxxxxxxxxxxxxxxxxx" INFO messages on this node, and we lose
disk space on this datanode very fast (see the commands after this list
for how we watch it). My guess is that HBase compacts the regions and
deletes the old store files, but the datanode is unable to reclaim the
freed blocks.

6. After 5 to 6 hours, the datanode completely runs out of space, but
the test keeps running at a slower insert rate.
7. The entire test finishes after 14 hours.
8. Right after the test finishes, this datanode resumes reclaiming the
deleted blocks.
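
For reference, this is roughly how we watch the disk space on this node
while the test runs (the data directory is the one from the log above):

    # on the affected datanode: raw space used by the HDFS block storage
    du -sh /mnt/moom/hadoop/0.20.1/dfs/data

    # from any node: capacity, DFS used, and remaining space per datanode
    hadoop dfsadmin -report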

We ran the test twice, and the same problem occurred on the same node
both times. I wonder what could cause this problem, and whether there is
any configuration parameter we can tune to fix it.
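
In case it matters, here is the entry we use in hdfs-site.xml for the
xceiver setting in point 2 (the property name below is the spelling used
in the 0.20 release):

    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>8192</value>
    </property>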

Thanks for your help!
Tuan Nguyen.

Re: DataNode stops reclaiming deleted blocks under heavy write load

Posted by Tuan Nguyen <tu...@gmail.com>.
Hi,

We are using the latest releases, Hadoop 0.20.2 and HBase 0.20.3. Thank
you for your help.

Tuan Nguyen.

On Sat, Mar 20, 2010 at 8:44 PM, Ted Yu <yu...@gmail.com> wrote:

> What hbase version are you using ?
>