Posted to user@hbase.apache.org by Martin Fiala <fi...@gmail.com> on 2011/01/24 13:21:19 UTC

DFS rebalancing with running HBase

Hello,

In an old thread about Hadoop/HBase 0.19.x, Andrew Purtell wrote that
running the DFS balancer while HBase is running is not recommended. I
didn't find any remarks about this in the Hadoop or HBase documentation.

http://mail-archives.apache.org/mod_mbox/hbase-user/200905.mbox/%3c812604.43615.qm@web65510.mail.ac4.yahoo.com%3e

Is it still the case? What bad things can happen?

It is quite clear that when writing heavily to HBase while running the
balancer, the cluster is not going to become balanced. It can become
even more unbalanced.
What about running the balancer when we are only reading from HBase or
writing small numbers of records?

Regards,
Martin Fiala

Re: DFS rebalancing with running HBase

Posted by Andrew Purtell <ap...@apache.org>.
> This is fixed in CDH3 via HDFS-630: https://issues.apache.org/jira/browse/HDFS-630

My bad, that's HDFS-611: https://issues.apache.org/jira/browse/HDFS-611

Best regards,

    - Andy

Problems worthy of attack prove their worth by hitting back.
  - Piet Hein (via Tom White)



Re: DFS rebalancing with running HBase

Posted by Andrew Purtell <ap...@apache.org>.
Martin,

The trouble was due to a defect in how HDFS partitioned deletion work among the datanodes. Especially under high write load, HBase can post a lot of deletes due to compactions. Running the balancer just makes it worse: additional replications in the face of uneven deletion just bring the end faster when a datanode fills.
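
To make the arithmetic concrete, here is a toy model of a single datanode's disk usage. It is only a sketch, not HDFS code: the function name, the step-based model, and every number below are assumptions made up for illustration. It shows that when a node's deletions lag behind its incoming blocks, any extra inflow from the balancer shortens the time until the disk is full.

```python
def steps_until_full(capacity, used, inflow, delete_rate, balancer_inflow=0):
    """Count steps until one datanode's usage reaches capacity.

    All arguments are in arbitrary "blocks" (hypothetical units):
      inflow          - blocks written per step (flushes, compaction output)
      delete_rate     - blocks actually deleted per step on this node
      balancer_inflow - extra blocks per step copied in by the balancer
    Returns None if usage never grows (deletes keep up with inflow).
    """
    net = inflow + balancer_inflow - delete_rate
    if net <= 0:
        return None
    steps = 0
    while used < capacity:
        used += net
        steps += 1
    return steps


# A node whose deletes keep up with its writes never fills.
print(steps_until_full(1000, 600, inflow=50, delete_rate=50))  # None

# A node whose deletes lag (uneven deletion) fills slowly...
print(steps_until_full(1000, 600, inflow=50, delete_rate=40))  # 40

# ...and much sooner if the balancer also replicates blocks onto it.
print(steps_until_full(1000, 600, inflow=50, delete_rate=40,
                       balancer_inflow=30))                    # 10
```

The model ignores everything else the balancer does (it also moves blocks off over-utilized nodes); it only illustrates why added replication onto a node with lagging deletes is harmful.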

This is fixed in CDH3 via HDFS-630: https://issues.apache.org/jira/browse/HDFS-630

This is fixed in HDFS 0.21+ via HADOOP-5124: https://issues.apache.org/jira/browse/HADOOP-5124

All,

It might be a good idea to apply one of these fixes to the ASF 0.20-append branch.

Best regards,

    - Andy

Problems worthy of attack prove their worth by hitting back.
  - Piet Hein (via Tom White)

