Posted to user@hbase.apache.org by Martin Fiala <fi...@gmail.com> on 2011/01/24 13:21:19 UTC
DFS rebalancing with running HBase
Hello,
In an old thread about Hadoop/HBase 0.19.x, Andrew Purtell wrote that
running the DFS balancer while HBase is running is not recommended. I
didn't find any remarks about this in the Hadoop or HBase documentation.
http://mail-archives.apache.org/mod_mbox/hbase-user/200905.mbox/%3c812604.43615.qm@web65510.mail.ac4.yahoo.com%3e
Is it still the case? What bad things can happen?
It is quite clear that when we write heavily to HBase while the balancer
is running, the cluster is not going to end up balanced. It can become
even more unbalanced.
What about running the balancer when we are only reading from HBase or
writing a small number of records?
Regards,
Martin Fiala
Re: DFS rebalancing with running HBase
Posted by Andrew Purtell <ap...@apache.org>.
> This is fixed in CDH3 via HDFS-630: https://issues.apache.org/jira/browse/HDFS-630
My bad, that's HDFS-611: https://issues.apache.org/jira/browse/HDFS-611
Best regards,
- Andy
Problems worthy of attack prove their worth by hitting back.
- Piet Hein (via Tom White)
--- On Mon, 1/24/11, Andrew Purtell <ap...@apache.org> wrote:
> From: Andrew Purtell <ap...@apache.org>
> Subject: Re: DFS rebalancing with running HBase
> To: user@hbase.apache.org
> Date: Monday, January 24, 2011, 5:42 AM
> Martin,
>
> The trouble was due to a defect in how HDFS managed
> partitioning deletion work among the datanodes. Especially
> when under high write load, HBase can post a lot of deletes
> due to compactions. Running the balancer just makes it worse
> -- additional replications into the face of uneven deletion
> just brings the end faster when a datanode fills.
>
> This is fixed in CDH3 via HDFS-630: https://issues.apache.org/jira/browse/HDFS-630
>
> This is fixed in HDFS 0.21 + via HADOOP-5124: https://issues.apache.org/jira/browse/HADOOP-5124
>
> All,
>
> It might be a good idea to apply one of these fixes to the
> ASF 0.20-append branch.
>
> Best regards,
>
> - Andy
Re: DFS rebalancing with running HBase
Posted by Andrew Purtell <ap...@apache.org>.
Martin,
The trouble was due to a defect in how HDFS partitioned deletion work among the datanodes. HBase can post a lot of deletes due to compactions, especially under high write load. Running the balancer just makes it worse: additional replications in the face of uneven deletion just bring the end faster when a datanode fills.
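To make the mechanism above concrete, here is a toy simulation. It is not HDFS code and does not reproduce the actual defect. It only assumes that "uneven deletion" can be approximated by giving each datanode its own cap on how many invalidated blocks it frees per tick. The function name, the parameters, and all numbers are hypothetical and chosen only for illustration.

```python
def first_full_tick(capacity, used, delete_rate, churn, balancer_moves,
                    max_ticks=10_000):
    """Toy model: return the first tick at which any node reaches capacity.

    used[i]        -- blocks on disk at node i (live + not yet deleted)
    delete_rate[i] -- assumed max invalidated blocks node i frees per tick
    churn          -- blocks rewritten per node per tick (compaction-like)
    balancer_moves -- blocks copied per tick from fullest to emptiest node
    """
    used = list(used)
    n = len(used)
    backlog = [0] * n  # invalidated blocks still occupying disk space

    for tick in range(1, max_ticks + 1):
        # Compaction-like churn: new blocks are written and the same
        # number of old blocks become garbage awaiting deletion.
        for i in range(n):
            used[i] += churn
            backlog[i] += churn

        # Balancer step: copy blocks to the emptiest node; the source
        # replicas become garbage that still has to be deleted.
        if balancer_moves:
            src = max(range(n), key=used.__getitem__)
            dst = min(range(n), key=used.__getitem__)
            if src != dst:
                live = used[src] - backlog[src]
                moved = min(balancer_moves, live)
                used[dst] += moved
                backlog[src] += moved

        # Deletion step: each node frees at most its own rate.
        for i in range(n):
            freed = min(backlog[i], delete_rate[i])
            used[i] -= freed
            backlog[i] -= freed

        if max(used) >= capacity:
            return tick
    return None


if __name__ == "__main__":
    # Node 0 is nearly empty but lags on deletes; nodes 1 and 2 keep up.
    setup = dict(capacity=1000, used=[200, 600, 600],
                 delete_rate=[5, 100, 100], churn=10)
    print("no balancer:  ", first_full_tick(balancer_moves=0, **setup))
    print("with balancer:", first_full_tick(balancer_moves=20, **setup))
```

In this setup the node with lagging deletes is also the emptiest, so the balancer keeps copying blocks onto it, and a node fills sooner than it does with the balancer switched off. With every delete rate at or above the churn plus the balancer traffic, no node fills in this model.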
This is fixed in CDH3 via HDFS-630: https://issues.apache.org/jira/browse/HDFS-630
This is fixed in HDFS 0.21+ via HADOOP-5124: https://issues.apache.org/jira/browse/HADOOP-5124
All,
It might be a good idea to apply one of these fixes to the ASF 0.20-append branch.
Best regards,
- Andy
Problems worthy of attack prove their worth by hitting back.
- Piet Hein (via Tom White)