Posted to hdfs-user@hadoop.apache.org by Stuart Smith <st...@yahoo.com> on 2011/03/17 20:13:44 UTC

keeping an active hdfs cluster balanced

Parts of this may end up on the hbase list, but I thought I'd start here. My basic problem is:

My cluster is getting full enough that having one data node go down does put a bit of pressure on the system (when balanced, every DN is more than half full).

I write (and delete) pretty actively to Hbase & some hdfs direct.

The cluster keeps drifting dangerously out of balance.

I run the balancer daily, but:

   - I've seen reports that you shouldn't rebalance with regionservers running, yet I don't really have a choice. Without HBase, my system is pretty much down. If it gets out of balance, it will also come down.

  Anybody here have any idea how badly running the balancer on a heavily active system messes things up? (for hdfs/hbase - if anyone knows).

   - Possibly somewhat related: I'm seeing more "failed to move block" errors in my balancer logs. It got to the point where I wasn't seeing any effective rebalancing occur. I've turned off access to the cluster and rebalanced (one node was down to 10% free space, a couple of others went up to 50% or more). I'm back down to around 20-40% free space on each node (as reported by the hdfs web interface).

    How effective is the balancer on an active cluster? Is there any way to make its life easier, so it can stay in balance with daily runs?
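For concreteness, a daily run like the one described above can be sketched as a small cron script. The `hadoop balancer` invocation and log path below are assumptions (0.20-era CLI); adjust for your install:

```shell
#!/bin/sh
# Sketch of a daily balancer run (assumes the 0.20-era "hadoop balancer" CLI).
# -threshold 5 asks the balancer to bring each datanode within 5 percentage
# points of the cluster-wide mean utilization (the default is 10); tighter
# thresholds mean longer runs.
HADOOP=${HADOOP:-hadoop}
LOG=/tmp/balancer-$(date +%Y%m%d).log
"$HADOOP" balancer -threshold 5 >"$LOG" 2>&1 \
  || echo "balancer exited non-zero; see $LOG"
```

On a live HBase cluster it may also help to cap the balancer's bandwidth (the dfs.balance.bandwidthPerSec property in hdfs-site.xml, 1 MB/s by default in that era) so block moves don't starve regionservers.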

I'm not sure why the one node ends up being so heavily favored, either. The favoritism even seems to survive taking the node down, and bringing it back up. If I can't find the resources to upgrade, I might try that again, but I'm less than hopeful about it.

Any ideas? Or do I just need better hardware? Not sure if that's an option, though..

Take care,
  -stu


      

Re: keeping an active hdfs cluster balanced

Posted by st...@yahoo.com.
Thanks Allen!

This all makes sense. 
I'm already looking into expiring data - and good suggestion with the logs. I could store some data more efficiently, but I'm not sure if I have any big wins I can pull off.

I'm in the midst of an OS upgrade & hope to switch from Apache to CDH as well. Hopefully I can clean some stuff up in the process.

It does sound like I'm just going to have to find some hardware somewhere..

Take care,
 -stu


-----Original Message-----
From: Allen Wittenauer <aw...@apache.org>
Date: Thu, 17 Mar 2011 14:20:06 
To: <hd...@hadoop.apache.org>
Reply-To: hdfs-user@hadoop.apache.org
Subject: Re: keeping an active hdfs cluster balanced


On Mar 17, 2011, at 12:13 PM, Stuart Smith wrote:

> Parts of this may end up on the hbase list, but I thought I'd start here. My basic problem is:
> 
> My cluster is getting full enough that having one data node go down does put a bit of pressure on the system (when balanced, every DN is more than half full).

	Usually around the ~80% full mark is when HDFS starts getting a bit wonky on super active grids. Your best bet is to either delete some data/store the data more efficiently, add more nodes, or upgrade the storage capacity of the nodes you have.  The balancer is only going to save you for so long until the whole thing tips over.

> Anybody here have any idea how badly running the balancer on a heavily active system messes things up? (for hdfs/hbase - if anyone knows).

	I don't run HBase, but at Y! we used to run the balancer pretty much every day, even on super active grids.  It 'mostly works' until you get to the point of no return, which it sounds like you are heading for...

> Any ideas? Or do I just need better hardware? Not sure if that's an option, though..

	Depending upon how your systems are configured, something else to look at is how much space is getting eaten by logs, mapreduce spill space, etc.  A good daemon bounce might free up some stale handles as well.

Re: keeping an active hdfs cluster balanced

Posted by st...@yahoo.com.
Thanks Koji!
Is each node a small percentage of the total space in this case? 

Take care,
 -stu
-----Original Message-----
From: Koji Noguchi <kn...@yahoo-inc.com>
Date: Thu, 24 Mar 2011 11:12:00 
To: hdfs-user@hadoop.apache.org<hd...@hadoop.apache.org>
Reply-To: hdfs-user@hadoop.apache.org
Subject: Re: keeping an active hdfs cluster balanced

Just a note 

> Usually around the ~80% full mark is when HDFS starts getting a bit wonky
>
These days, we have large grids over 90% full and still running fine.
Percentage of hdfs space could be misleading.  We usually monitor the
percentage of full datanodes.

Koji

On 3/17/11 2:20 PM, "Allen Wittenauer" <aw...@apache.org> wrote:

> 
> On Mar 17, 2011, at 12:13 PM, Stuart Smith wrote:
> 
>> Parts of this may end up on the hbase list, but I thought I'd start here. My
>> basic problem is:
>> 
>> My cluster is getting full enough that having one data node go down does put
>> a bit of pressure on the system (when balanced, every DN is more than half
>> full).
> 
> Usually around the ~80% full mark is when HDFS starts getting a bit wonky on
> super active grids. Your best bet is to either delete some data/store the data
> more efficiently, add more nodes, or upgrade the storage capacity of the nodes
> you have.  The balancer is only going to save you for so long until the whole
> thing tips over.
> 
>> Anybody here have any idea how badly running the balancer on a heavily active
>> system messes things up? (for hdfs/hbase - if anyone knows).
> 
> I don't run HBase, but at Y! we used to run the balancer pretty much every
> day, even on super active grids.  It 'mostly works' until you get to the point
> of no return, which it sounds like you are heading for...
> 
>> Any ideas? Or do I just need better hardware? Not sure if that's an option,
>> though..
> 
> Depending upon how your systems are configured, something else to look at is
> how much space is getting eaten by logs, mapreduce spill space, etc.  A good
> daemon bounce might free up some stale handles as well.


Re: keeping an active hdfs cluster balanced

Posted by Koji Noguchi <kn...@yahoo-inc.com>.
Just a note 

> Usually around the ~80% full mark is when HDFS starts getting a bit wonky
>
These days, we have large grids over 90% full and still running fine.
Percentage of hdfs space could be misleading.  We usually monitor the
percentage of full datanodes.

Koji
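A rough way to track the metric Koji describes is to count datanodes over a fullness threshold from `hadoop dfsadmin -report` output. The sketch below runs on inlined sample lines; the exact `DFS Used%` line format is an assumption, so check the report output of your version before piping the real thing in:

```shell
# Count datanodes above a fullness threshold from "hadoop dfsadmin -report"
# style output. Sample lines are inlined here; pipe the real report in instead.
report='DFS Used%: 91.2%
DFS Used%: 45.0%
DFS Used%: 88.9%'
summary=$(printf '%s\n' "$report" |
  awk -F': ' '/DFS Used%/ { n++; sub(/%/, "", $2); if ($2 + 0 >= 85) full++ }
              END { printf "%d of %d datanodes >= 85%% full\n", full, n }')
echo "$summary"
```

On the sample input this prints "2 of 3 datanodes >= 85% full"; wiring it into existing monitoring gives an alert well before any single node hits 100%.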

On 3/17/11 2:20 PM, "Allen Wittenauer" <aw...@apache.org> wrote:

> 
> On Mar 17, 2011, at 12:13 PM, Stuart Smith wrote:
> 
>> Parts of this may end up on the hbase list, but I thought I'd start here. My
>> basic problem is:
>> 
>> My cluster is getting full enough that having one data node go down does put
>> a bit of pressure on the system (when balanced, every DN is more than half
>> full).
> 
> Usually around the ~80% full mark is when HDFS starts getting a bit wonky on
> super active grids. Your best bet is to either delete some data/store the data
> more efficiently, add more nodes, or upgrade the storage capacity of the nodes
> you have.  The balancer is only going to save you for so long until the whole
> thing tips over.
> 
>> Anybody here have any idea how badly running the balancer on a heavily active
>> system messes things up? (for hdfs/hbase - if anyone knows).
> 
> I don't run HBase, but at Y! we used to run the balancer pretty much every
> day, even on super active grids.  It 'mostly works' until you get to the point
> of no return, which it sounds like you are heading for...
> 
>> Any ideas? Or do I just need better hardware? Not sure if that's an option,
>> though..
> 
> Depending upon how your systems are configured, something else to look at is
> how much space is getting eaten by logs, mapreduce spill space, etc.  A good
> daemon bounce might free up some stale handles as well.


Re: keeping an active hdfs cluster balanced

Posted by Allen Wittenauer <aw...@apache.org>.
On Mar 17, 2011, at 12:13 PM, Stuart Smith wrote:

> Parts of this may end up on the hbase list, but I thought I'd start here. My basic problem is:
> 
> My cluster is getting full enough that having one data node go down does put a bit of pressure on the system (when balanced, every DN is more than half full).

	Usually around the ~80% full mark is when HDFS starts getting a bit wonky on super active grids. Your best bet is to either delete some data/store the data more efficiently, add more nodes, or upgrade the storage capacity of the nodes you have.  The balancer is only going to save you for so long until the whole thing tips over.

> Anybody here have any idea how badly running the balancer on a heavily active system messes things up? (for hdfs/hbase - if anyone knows).

	I don't run HBase, but at Y! we used to run the balancer pretty much every day, even on super active grids.  It 'mostly works' until you get to the point of no return, which it sounds like you are heading for...

> Any ideas? Or do I just need better hardware? Not sure if that's an option, though..

	Depending upon how your systems are configured, something else to look at is how much space is getting eaten by logs, mapreduce spill space, etc.  A good daemon bounce might free up some stale handles as well.
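The space check Allen suggests can start with something as simple as the following; the log and mapreduce-local paths are typical defaults for 0.20-era installs and are assumptions here:

```shell
# Rough inventory of non-HDFS space consumers on a node. The paths are
# common defaults and may differ on your install.
for d in /var/log/hadoop /tmp/hadoop-*/mapred/local; do
  du -sh "$d" 2>/dev/null
done
# Overall filesystem picture, to spot a nearly full disk:
df -h
```

Comparing `du` on these directories against `df` per disk quickly shows whether logs or spill space, rather than HDFS blocks, are what's filling a node.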

Re: keeping an active hdfs cluster balanced

Posted by st...@yahoo.com.
Hello Ted,

I have a small, 8 DN cluster, 6 of which are regionservers. Some have 3TB, others have 2TB. All have all disks available to hdfs - including the OS/system disk :|

The majority of the data goes to HBase, which then writes to hdfs. Some data is written to hdfs via thrift.

Take care,
 -stu
-----Original Message-----
From: Ted Dunning <td...@maprtech.com>
Date: Thu, 17 Mar 2011 14:23:10 
To: <hd...@hadoop.apache.org>
Reply-To: hdfs-user@hadoop.apache.org
Cc: Stuart Smith<st...@yahoo.com>
Subject: Re: keeping an active hdfs cluster balanced

How large a cluster?

How large is each data-node?  How much disk is devoted to hbase?

How does your HDFS data arrive?  From one or a few machines in the cluster?
 From outside the cluster?

On Thu, Mar 17, 2011 at 12:13 PM, Stuart Smith <st...@yahoo.com> wrote:

> Parts of this may end up on the hbase list, but I thought I'd start here.
> My basic problem is:
>
> My cluster is getting full enough that having one data node go down does
> put a bit of pressure on the system (when balanced, every DN is more than
> half full).
>
> I write (and delete) pretty actively to Hbase & some hdfs direct.
>
> The cluster keeps drifting dangerously out of balance.
>
> I run the balancer daily, but:
>
>   - I've seen reports that you shouldn't rebalance with regionservers
> running, yet I don't really have a choice. Without HBase, my system is
> pretty much down. If it gets out of balance, it will also come down.
>
>  Anybody here have any idea how badly running the balancer on a heavily
> active system messes things up? (for hdfs/hbase - if anyone knows).
>
>   - Possibly somewhat related: I'm seeing more "failed to move block"
> errors in my balancer logs. It got to the point where I wasn't seeing any
> effective rebalancing occur. I've turned off access to the cluster and
> rebalanced (one node was down to 10% free space, a couple of others went up to
> 50 or more). I'm back down to around 20-40% free space on each node (as
> reported by the hdfs web interface).
>
>    How effective is the balancer on an active cluster? Is there any way to
> make its life easier, so it can stay in balance with daily runs?
>
> I'm not sure why the one node ends up being so heavily favored, either. The
> favoritism even seems to survive taking the node down, and bringing it back
> up. If I can't find the resources to upgrade, I might try that again, but
> I'm less than hopeful about it.
>
> Any ideas? Or do I just need better hardware? Not sure if that's an option,
> though..
>
> Take care,
>  -stu
>
>
>
>


Re: keeping an active hdfs cluster balanced

Posted by Ted Dunning <td...@maprtech.com>.
How large a cluster?

How large is each data-node?  How much disk is devoted to hbase?

How does your HDFS data arrive?  From one or a few machines in the cluster?
 From outside the cluster?

On Thu, Mar 17, 2011 at 12:13 PM, Stuart Smith <st...@yahoo.com> wrote:

> Parts of this may end up on the hbase list, but I thought I'd start here.
> My basic problem is:
>
> My cluster is getting full enough that having one data node go down does
> put a bit of pressure on the system (when balanced, every DN is more than
> half full).
>
> I write (and delete) pretty actively to Hbase & some hdfs direct.
>
> The cluster keeps drifting dangerously out of balance.
>
> I run the balancer daily, but:
>
>   - I've seen reports that you shouldn't rebalance with regionservers
> running, yet I don't really have a choice. Without HBase, my system is
> pretty much down. If it gets out of balance, it will also come down.
>
>  Anybody here have any idea how badly running the balancer on a heavily
> active system messes things up? (for hdfs/hbase - if anyone knows).
>
>   - Possibly somewhat related: I'm seeing more "failed to move block"
> errors in my balancer logs. It got to the point where I wasn't seeing any
> effective rebalancing occur. I've turned off access to the cluster and
> rebalanced (one node was down to 10% free space, a couple of others went up to
> 50 or more). I'm back down to around 20-40% free space on each node (as
> reported by the hdfs web interface).
>
>    How effective is the balancer on an active cluster? Is there any way to
> make its life easier, so it can stay in balance with daily runs?
>
> I'm not sure why the one node ends up being so heavily favored, either. The
> favoritism even seems to survive taking the node down, and bringing it back
> up. If I can't find the resources to upgrade, I might try that again, but
> I'm less than hopeful about it.
>
> Any ideas? Or do I just need better hardware? Not sure if that's an option,
> though..
>
> Take care,
>  -stu
>
>
>
>