Posted to common-user@hadoop.apache.org by Usman Waheed <us...@opera.com> on 2009/06/25 11:33:58 UTC

Rebalancing Hadoop Cluster running 15.3

Hi,

One of our test clusters is running Hadoop 0.15.3 with the replication level 
set to 2. The datanodes are not balanced at all:

Datanode_1: 52%
Datanode_2: 82%
Datanode_3: 30%

0.15.3 does not have the rebalancer capability; we are planning to upgrade, 
but not right now.

If I take out Datanode_1 from the cluster (decommission it for some time), 
will Hadoop rebalance so that Datanode_2 and Datanode_3 even out to around 56%?
Then I can re-introduce Datanode_1 back into the cluster.
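
(For the decommission step I was thinking of the usual exclude-file route, 
roughly as sketched below; I have not yet checked whether 0.15.3 supports it:

  # add Datanode_1's hostname to the file named by dfs.hosts.exclude,
  # then ask the namenode to re-read it
  hadoop dfsadmin -refreshNodes

and later remove it from the exclude file and refresh again to bring the 
node back.)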

Comments/Suggestions please?

Thanks,
Usman

Re: Rebalancing Hadoop Cluster running 15.3

Posted by Usman Waheed <us...@opera.com>.
Thanks much,
Cheers,
Usman
> You can change the value of hadoop.root.logger in
> conf/log4j.properties to change the log level globally. See also the
> section "Custom Logging levels" in the same file to set levels on a
> per-component basis.
>
> You can also use hadoop daemonlog to set log levels on a temporary
> basis (they are reset on restart). I'm not sure if this was in Hadoop
> 0.15.
>
> Cheers,
> Tom
>
> On Thu, Jun 25, 2009 at 11:12 AM, Usman Waheed<us...@opera.com> wrote:
>   
>> Hi Tom,
>>
>> Thanks for the trick :).
>>
>> I tried setting the replication to 3 in hadoop-default.xml, but then the
>> namenode log file in /var/log/hadoop started filling up with the messages
>> marked in bold:
>>
>> 2009-06-24 14:39:06,338 INFO org.apache.hadoop.dfs.StateChange: STATE*
>> SafeModeInfo.leave: Safe mode is OFF.
>> 2009-06-24 14:39:06,339 INFO org.apache.hadoop.dfs.StateChange: STATE*
>> Network topology has 1 racks and 3 datanodes
>> 2009-06-24 14:39:06,339 INFO org.apache.hadoop.dfs.StateChange: STATE*
>> UnderReplicatedBlocks has 48545 blocks
>> 2009-06-24 14:39:07,655 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
>> NameSystem.pendingTransfer: ask 10.20.11.45:50010 to replicate
>> blk_-4602580985572290582 to datanode(s) 10.20.11.44:50010
>> 2009-06-24 14:39:07,655 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
>> NameSystem.pendingTransfer: ask 10.20.11.45:50010 to replicate
>> blk_-4602036196619511999 to datanode(s) 10.20.11.44:50010
>> 2009-06-24 14:39:07,666 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
>> NameSystem.pendingTransfer: ask 10.20.11.43:50010 to replicate
>> blk_-4601863051065326105 to datanode(s) 10.20.11.44:50010
>> 2009-06-24 14:39:07,666 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
>> NameSystem.pendingTransfer: ask 10.20.11.43:50010 to replicate
>> blk_-4601770656364938220 to datanode(s) 10.20.11.44:50010
>> 2009-06-24 14:39:10,829 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
>> NameSystem.addStoredBlock: blockMap updated: 10.20.11.44:50010 is added to
>> blk_-4601770656364938220
>> 2009-06-24 14:39:10,832 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
>> NameSystem.pendingTransfer: ask 10.20.11.45:50010 to replicate
>> blk_-4601706607039808418 to datanode(s) 10.20.11.44:50010
>> 2009-06-24 14:39:10,833 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
>> NameSystem.pendingTransfer: ask 10.20.11.45:50010 to replicate
>> blk_-4601652202073012439 to datanode(s) 10.20.11.44:50010
>> 2009-06-24 14:39:10,834 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
>> NameSystem.pendingTransfer: ask 10.20.11.43:50010 to replicate
>> blk_-4601470720696217621 to datanode(s) 10.20.11.44:50010
>> 2009-06-24 14:39:10,834 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
>> NameSystem.pendingTransfer: ask 10.20.11.43:50010 to replicate
>> blk_-4601267705629076611 to datanode(s) 10.20.11.44:50010
>> *2009-06-24 14:39:13,899 WARN org.apache.hadoop.fs.FSNamesystem: Not able to
>> place enough replicas, still in need of 1
>> 2009-06-24 14:39:13,899 WARN org.apache.hadoop.fs.FSNamesystem: Not able to
>> place enough replicas, still in need of 1
>> 2009-06-24 14:39:13,899 WARN org.apache.hadoop.fs.FSNamesystem: Not able to
>> place enough replicas, still in need of 1
>> 2009-06-24 14:39:13,900 WARN org.apache.hadoop.fs.FSNamesystem: Not able to
>> place enough replicas, still in need of 1
>> 2009-06-24 14:39:13,900 WARN org.apache.hadoop.fs.FSNamesystem: Not able to
>> place enough replicas, still in need of 1
>> 2009-06-24 14:39:13,900 WARN org.apache.hadoop.fs.FSNamesystem: Not able to
>> place enough replicas, still in need of 1
>> 2009-06-24 14:39:13,901 WARN org.apache.hadoop.fs.FSNamesystem: Not able to
>> place enough replicas, still in need of 1
>> 2009-06-24 14:39:13,901 WARN org.apache.hadoop.fs.FSNamesystem: Not able to
>> place enough replicas, still in need of 1*
>>
>> It is a very small cluster with limited disk space. The disk was getting
>> full because of all these extra messages that were being written to the
>> log file. Eventually the file system would fill up and Hadoop would hang.
>> This happened when I set dfs.replication = 3 in hadoop-default.xml and
>> restarted the cluster.
>>
>> Is there a way I can turn off these WARN messages, which are filling up the
>> file system? Then I can run the command on the command line as you advised,
>> with replication set to 3, and once done set it back to 2.
>> Currently dfs.replication is set to 2 in hadoop-default.xml.
>>
>> Thanks,
>> Usman
>>
>>     
>>> Hi Usman,
>>>
>>> Before the rebalancer was introduced one trick people used was to
>>> increase the replication on all the files in the system, wait for
>>> re-replication to complete, then decrease the replication to the
>>> original level. You can do this using hadoop fs -setrep.
>>>
>>> Cheers,
>>> Tom
>>>
>>> On Thu, Jun 25, 2009 at 10:33 AM, Usman Waheed<us...@opera.com> wrote:
>>>
>>>       
>>>> Hi,
>>>>
>>>> One of our test clusters is running Hadoop 0.15.3 with the replication
>>>> level set to 2. The datanodes are not balanced at all:
>>>>
>>>> Datanode_1: 52%
>>>> Datanode_2: 82%
>>>> Datanode_3: 30%
>>>>
>>>> 0.15.3 does not have the rebalancer capability; we are planning to
>>>> upgrade, but not right now.
>>>>
>>>> If I take out Datanode_1 from the cluster (decommission it for some time),
>>>> will Hadoop rebalance so that Datanode_2 and Datanode_3 even out to around 56%?
>>>> Then I can re-introduce Datanode_1 back into the cluster.
>>>>
>>>> Comments/Suggestions please?
>>>>
>>>> Thanks,
>>>> Usman
>>>>
>>>>
>>>>         
>>     


Re: Rebalancing Hadoop Cluster running 15.3

Posted by Tom White <to...@cloudera.com>.
You can change the value of hadoop.root.logger in
conf/log4j.properties to change the log level globally. See also the
section "Custom Logging levels" in the same file to set levels on a
per-component basis.
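
For instance, to quieten just the replica-placement warnings you could try a
per-component entry like this under "Custom Logging levels" (a sketch only; I
haven't verified the exact logger name against 0.15):

  # raise the threshold for the class emitting the
  # "Not able to place enough replicas" warnings
  log4j.logger.org.apache.hadoop.fs.FSNamesystem=ERROR

Alternatively, changing the level in hadoop.root.logger (e.g. from INFO to
WARN or ERROR) affects all components at once.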

You can also use hadoop daemonlog to set log levels on a temporary
basis (they are reset on restart). I'm not sure if this was in Hadoop
0.15.
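
If it is available, the invocation would look roughly like this, pointing at
the namenode's HTTP port (50070 by default; the hostname below is just a
placeholder):

  hadoop daemonlog -setlevel namenode-host:50070 org.apache.hadoop.fs.FSNamesystem ERROR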

Cheers,
Tom

On Thu, Jun 25, 2009 at 11:12 AM, Usman Waheed<us...@opera.com> wrote:
> Hi Tom,
>
> Thanks for the trick :).
>
> I tried setting the replication to 3 in hadoop-default.xml, but then the
> namenode log file in /var/log/hadoop started filling up with the messages
> marked in bold:
>
> 2009-06-24 14:39:06,338 INFO org.apache.hadoop.dfs.StateChange: STATE*
> SafeModeInfo.leave: Safe mode is OFF.
> 2009-06-24 14:39:06,339 INFO org.apache.hadoop.dfs.StateChange: STATE*
> Network topology has 1 racks and 3 datanodes
> 2009-06-24 14:39:06,339 INFO org.apache.hadoop.dfs.StateChange: STATE*
> UnderReplicatedBlocks has 48545 blocks
> 2009-06-24 14:39:07,655 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
> NameSystem.pendingTransfer: ask 10.20.11.45:50010 to replicate
> blk_-4602580985572290582 to datanode(s) 10.20.11.44:50010
> 2009-06-24 14:39:07,655 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
> NameSystem.pendingTransfer: ask 10.20.11.45:50010 to replicate
> blk_-4602036196619511999 to datanode(s) 10.20.11.44:50010
> 2009-06-24 14:39:07,666 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
> NameSystem.pendingTransfer: ask 10.20.11.43:50010 to replicate
> blk_-4601863051065326105 to datanode(s) 10.20.11.44:50010
> 2009-06-24 14:39:07,666 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
> NameSystem.pendingTransfer: ask 10.20.11.43:50010 to replicate
> blk_-4601770656364938220 to datanode(s) 10.20.11.44:50010
> 2009-06-24 14:39:10,829 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
> NameSystem.addStoredBlock: blockMap updated: 10.20.11.44:50010 is added to
> blk_-4601770656364938220
> 2009-06-24 14:39:10,832 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
> NameSystem.pendingTransfer: ask 10.20.11.45:50010 to replicate
> blk_-4601706607039808418 to datanode(s) 10.20.11.44:50010
> 2009-06-24 14:39:10,833 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
> NameSystem.pendingTransfer: ask 10.20.11.45:50010 to replicate
> blk_-4601652202073012439 to datanode(s) 10.20.11.44:50010
> 2009-06-24 14:39:10,834 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
> NameSystem.pendingTransfer: ask 10.20.11.43:50010 to replicate
> blk_-4601470720696217621 to datanode(s) 10.20.11.44:50010
> 2009-06-24 14:39:10,834 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
> NameSystem.pendingTransfer: ask 10.20.11.43:50010 to replicate
> blk_-4601267705629076611 to datanode(s) 10.20.11.44:50010
> *2009-06-24 14:39:13,899 WARN org.apache.hadoop.fs.FSNamesystem: Not able to
> place enough replicas, still in need of 1
> 2009-06-24 14:39:13,899 WARN org.apache.hadoop.fs.FSNamesystem: Not able to
> place enough replicas, still in need of 1
> 2009-06-24 14:39:13,899 WARN org.apache.hadoop.fs.FSNamesystem: Not able to
> place enough replicas, still in need of 1
> 2009-06-24 14:39:13,900 WARN org.apache.hadoop.fs.FSNamesystem: Not able to
> place enough replicas, still in need of 1
> 2009-06-24 14:39:13,900 WARN org.apache.hadoop.fs.FSNamesystem: Not able to
> place enough replicas, still in need of 1
> 2009-06-24 14:39:13,900 WARN org.apache.hadoop.fs.FSNamesystem: Not able to
> place enough replicas, still in need of 1
> 2009-06-24 14:39:13,901 WARN org.apache.hadoop.fs.FSNamesystem: Not able to
> place enough replicas, still in need of 1
> 2009-06-24 14:39:13,901 WARN org.apache.hadoop.fs.FSNamesystem: Not able to
> place enough replicas, still in need of 1*
>
> It is a very small cluster with limited disk space. The disk was getting
> full because of all these extra messages that were being written to the
> log file. Eventually the file system would fill up and Hadoop would hang.
> This happened when I set dfs.replication = 3 in hadoop-default.xml and
> restarted the cluster.
>
> Is there a way I can turn off these WARN messages, which are filling up the
> file system? Then I can run the command on the command line as you advised,
> with replication set to 3, and once done set it back to 2.
> Currently dfs.replication is set to 2 in hadoop-default.xml.
>
> Thanks,
> Usman
>
>> Hi Usman,
>>
>> Before the rebalancer was introduced one trick people used was to
>> increase the replication on all the files in the system, wait for
>> re-replication to complete, then decrease the replication to the
>> original level. You can do this using hadoop fs -setrep.
>>
>> Cheers,
>> Tom
>>
>> On Thu, Jun 25, 2009 at 10:33 AM, Usman Waheed<us...@opera.com> wrote:
>>
>>>
>>> Hi,
>>>
>>> One of our test clusters is running Hadoop 0.15.3 with the replication
>>> level set to 2. The datanodes are not balanced at all:
>>>
>>> Datanode_1: 52%
>>> Datanode_2: 82%
>>> Datanode_3: 30%
>>>
>>> 0.15.3 does not have the rebalancer capability; we are planning to
>>> upgrade, but not right now.
>>>
>>> If I take out Datanode_1 from the cluster (decommission it for some time),
>>> will Hadoop rebalance so that Datanode_2 and Datanode_3 even out to around 56%?
>>> Then I can re-introduce Datanode_1 back into the cluster.
>>>
>>> Comments/Suggestions please?
>>>
>>> Thanks,
>>> Usman
>>>
>>>
>
>

Re: Rebalancing Hadoop Cluster running 15.3

Posted by Usman Waheed <us...@opera.com>.
Hi Tom,

Thanks for the trick :).

I tried setting the replication to 3 in hadoop-default.xml, but then the 
namenode log file in /var/log/hadoop started filling up with the messages 
marked in bold:

2009-06-24 14:39:06,338 INFO org.apache.hadoop.dfs.StateChange: STATE* 
SafeModeInfo.leave: Safe mode is OFF.
2009-06-24 14:39:06,339 INFO org.apache.hadoop.dfs.StateChange: STATE* 
Network topology has 1 racks and 3 datanodes
2009-06-24 14:39:06,339 INFO org.apache.hadoop.dfs.StateChange: STATE* 
UnderReplicatedBlocks has 48545 blocks
2009-06-24 14:39:07,655 INFO org.apache.hadoop.dfs.StateChange: BLOCK* 
NameSystem.pendingTransfer: ask 10.20.11.45:50010 to replicate 
blk_-4602580985572290582 to datanode(s) 10.20.11.44:50010
2009-06-24 14:39:07,655 INFO org.apache.hadoop.dfs.StateChange: BLOCK* 
NameSystem.pendingTransfer: ask 10.20.11.45:50010 to replicate 
blk_-4602036196619511999 to datanode(s) 10.20.11.44:50010
2009-06-24 14:39:07,666 INFO org.apache.hadoop.dfs.StateChange: BLOCK* 
NameSystem.pendingTransfer: ask 10.20.11.43:50010 to replicate 
blk_-4601863051065326105 to datanode(s) 10.20.11.44:50010
2009-06-24 14:39:07,666 INFO org.apache.hadoop.dfs.StateChange: BLOCK* 
NameSystem.pendingTransfer: ask 10.20.11.43:50010 to replicate 
blk_-4601770656364938220 to datanode(s) 10.20.11.44:50010
2009-06-24 14:39:10,829 INFO org.apache.hadoop.dfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 10.20.11.44:50010 is added 
to blk_-4601770656364938220
2009-06-24 14:39:10,832 INFO org.apache.hadoop.dfs.StateChange: BLOCK* 
NameSystem.pendingTransfer: ask 10.20.11.45:50010 to replicate 
blk_-4601706607039808418 to datanode(s) 10.20.11.44:50010
2009-06-24 14:39:10,833 INFO org.apache.hadoop.dfs.StateChange: BLOCK* 
NameSystem.pendingTransfer: ask 10.20.11.45:50010 to replicate 
blk_-4601652202073012439 to datanode(s) 10.20.11.44:50010
2009-06-24 14:39:10,834 INFO org.apache.hadoop.dfs.StateChange: BLOCK* 
NameSystem.pendingTransfer: ask 10.20.11.43:50010 to replicate 
blk_-4601470720696217621 to datanode(s) 10.20.11.44:50010
2009-06-24 14:39:10,834 INFO org.apache.hadoop.dfs.StateChange: BLOCK* 
NameSystem.pendingTransfer: ask 10.20.11.43:50010 to replicate 
blk_-4601267705629076611 to datanode(s) 10.20.11.44:50010
*2009-06-24 14:39:13,899 WARN org.apache.hadoop.fs.FSNamesystem: Not 
able to place enough replicas, still in need of 1
2009-06-24 14:39:13,899 WARN org.apache.hadoop.fs.FSNamesystem: Not able 
to place enough replicas, still in need of 1
2009-06-24 14:39:13,899 WARN org.apache.hadoop.fs.FSNamesystem: Not able 
to place enough replicas, still in need of 1
2009-06-24 14:39:13,900 WARN org.apache.hadoop.fs.FSNamesystem: Not able 
to place enough replicas, still in need of 1
2009-06-24 14:39:13,900 WARN org.apache.hadoop.fs.FSNamesystem: Not able 
to place enough replicas, still in need of 1
2009-06-24 14:39:13,900 WARN org.apache.hadoop.fs.FSNamesystem: Not able 
to place enough replicas, still in need of 1
2009-06-24 14:39:13,901 WARN org.apache.hadoop.fs.FSNamesystem: Not able 
to place enough replicas, still in need of 1
2009-06-24 14:39:13,901 WARN org.apache.hadoop.fs.FSNamesystem: Not able 
to place enough replicas, still in need of 1*

It is a very small cluster with limited disk space. The disk was getting 
full because of all these extra messages that were being written to the 
log file. Eventually the file system would fill up and Hadoop would hang.
This happened when I set dfs.replication = 3 in hadoop-default.xml and 
restarted the cluster.

Is there a way I can turn off these WARN messages, which are filling up 
the file system? Then I can run the command on the command line as you 
advised, with replication set to 3, and once done set it back to 2.
Currently dfs.replication is set to 2 in hadoop-default.xml.

Thanks,
Usman

> Hi Usman,
>
> Before the rebalancer was introduced one trick people used was to
> increase the replication on all the files in the system, wait for
> re-replication to complete, then decrease the replication to the
> original level. You can do this using hadoop fs -setrep.
>
> Cheers,
> Tom
>
> On Thu, Jun 25, 2009 at 10:33 AM, Usman Waheed<us...@opera.com> wrote:
>   
>> Hi,
>>
>> One of our test clusters is running Hadoop 0.15.3 with the replication level
>> set to 2. The datanodes are not balanced at all:
>>
>> Datanode_1: 52%
>> Datanode_2: 82%
>> Datanode_3: 30%
>>
>> 0.15.3 does not have the rebalancer capability; we are planning to upgrade,
>> but not right now.
>>
>> If I take out Datanode_1 from the cluster (decommission it for some time),
>> will Hadoop rebalance so that Datanode_2 and Datanode_3 even out to around 56%?
>> Then I can re-introduce Datanode_1 back into the cluster.
>>
>> Comments/Suggestions please?
>>
>> Thanks,
>> Usman
>>
>>     


Re: Rebalancing Hadoop Cluster running 15.3

Posted by Tom White <to...@cloudera.com>.
Hi Usman,

Before the rebalancer was introduced one trick people used was to
increase the replication on all the files in the system, wait for
re-replication to complete, then decrease the replication to the
original level. You can do this using hadoop fs -setrep.
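
For example, something along these lines (untested against 0.15.3, and
assuming you want to touch everything under the root path):

  hadoop fs -setrep -R 3 /
  # wait until fsck reports no under-replicated blocks
  hadoop fsck /
  hadoop fs -setrep -R 2 /

The namenode web UI is another way to watch the under-replicated block count
fall as re-replication proceeds.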

Cheers,
Tom

On Thu, Jun 25, 2009 at 10:33 AM, Usman Waheed<us...@opera.com> wrote:
> Hi,
>
> One of our test clusters is running Hadoop 0.15.3 with the replication level
> set to 2. The datanodes are not balanced at all:
>
> Datanode_1: 52%
> Datanode_2: 82%
> Datanode_3: 30%
>
> 0.15.3 does not have the rebalancer capability; we are planning to upgrade,
> but not right now.
>
> If I take out Datanode_1 from the cluster (decommission it for some time),
> will Hadoop rebalance so that Datanode_2 and Datanode_3 even out to around 56%?
> Then I can re-introduce Datanode_1 back into the cluster.
>
> Comments/Suggestions please?
>
> Thanks,
> Usman
>