Posted to common-user@hadoop.apache.org by Tapas Sarangi <ta...@gmail.com> on 2013/03/18 22:01:56 UTC

disk used percentage is not symmetric on datanodes (balancer)

Hello,

I am using an old legacy version (0.20) of Hadoop for our cluster. We have scheduled an upgrade to a newer version within a couple of months, but I would like to understand a couple of things before moving ahead with the upgrade plan.

We have about 200 datanodes, and some of them have more storage than others: the storage per datanode varies from 12 TB to 72 TB.

We found that the disk-used percentage is not uniform across the datanodes. On the larger storage nodes, the percentage of disk space used is much lower than on the smaller ones: it varies, but averages about 30-50%, whereas on the smaller nodes it is as high as 99.9%. Is this expected? If so, we are not using a lot of the disk space effectively. Is this solved in a future release?

If not, I would like to know whether there are any checks or debugging steps that can improve this with the current version, or whether upgrading Hadoop should solve the problem.
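One simple check with the current version is to list every datanode's used percentage and sort it, so the nearly full nodes stand out. The Python sketch below assumes the per-node "Name:" and "DFS Used%:" lines printed by `hadoop dfsadmin -report`; that layout, the sample addresses, and the 90% cut-off are illustrative assumptions, not data from this cluster.

```python
# Minimal sketch: summarize per-datanode "DFS Used%" from a dfsadmin report.
# ASSUMPTION: each datanode section has "Name: <host:port>" followed later by
# "DFS Used%: <value>%"; the exact layout may differ between Hadoop versions.

SAMPLE_REPORT = """\
Name: 10.0.0.1:50010
Configured Capacity: 13194139533312 (12 TB)
DFS Used%: 99.9%

Name: 10.0.0.2:50010
Configured Capacity: 79164837199872 (72 TB)
DFS Used%: 35%
"""


def used_percent_by_node(report: str) -> dict[str, float]:
    """Map each datanode name to its reported 'DFS Used%' value."""
    usage: dict[str, float] = {}
    node = None
    for raw in report.splitlines():
        line = raw.strip()
        if line.startswith("Name:"):
            node = line.split(":", 1)[1].strip()
        elif line.startswith("DFS Used%:") and node is not None:
            # Summary lines that appear before any "Name:" are skipped.
            usage[node] = float(line.split(":", 1)[1].strip().rstrip("%"))
    return usage


def nearly_full(usage: dict[str, float], cutoff: float = 90.0) -> list[str]:
    """Return nodes at or above the (hypothetical) cutoff, fullest first."""
    full = [name for name, pct in usage.items() if pct >= cutoff]
    return sorted(full, key=lambda name: usage[name], reverse=True)


if __name__ == "__main__":
    usage = used_percent_by_node(SAMPLE_REPORT)
    for name, pct in sorted(usage.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name}\t{pct:.1f}%")
    print("nearly full:", nearly_full(usage))
```

Feeding it the real report text (instead of SAMPLE_REPORT) would show whether the full nodes are exactly the 12 TB ones.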

I am happy to provide additional information if needed.

Thanks for any help.

-Tapas


Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Tapas Sarangi <ta...@gmail.com>.
On Mar 18, 2013, at 6:17 PM, Bertrand Dechoux <de...@gmail.com> wrote:

> And by active, do you mean that it actually stops by itself?
> Else, the throttling/limit might be an issue with regard to the data volume or velocity.
> 

This "else" is probably what's happening. I just checked the logs: it's active almost all the time.


> What threshold is used?

I don't know what this is. How can I find out?
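For reference, the balancer threshold is a percentage of disk capacity, passed as `hadoop balancer -threshold <percent>` and defaulting to 10. As it is commonly described, the balancer compares each datanode's utilization (used / capacity) with the cluster-wide average and considers a node balanced when it is within the threshold of that average. The Python sketch below models that classification; it is a simplified illustration, not the balancer's source, and the node names and sizes are hypothetical.

```python
# Simplified model of threshold-based balancing (NOT the balancer's source).
# Utilization = used / capacity, compared with the cluster-wide average.


def classify(nodes: dict[str, tuple[float, float]],
             threshold: float = 10.0) -> dict[str, str]:
    """Label each node; nodes maps name -> (capacity, used) in one unit."""
    total_capacity = sum(cap for cap, _ in nodes.values())
    total_used = sum(used for _, used in nodes.values())
    average = 100.0 * total_used / total_capacity  # cluster-wide percent used

    labels = {}
    for name, (capacity, used) in nodes.items():
        utilization = 100.0 * used / capacity
        if utilization > average + threshold:
            labels[name] = "over-utilized"
        elif utilization > average:
            labels[name] = "above-average"
        elif utilization < average - threshold:
            labels[name] = "under-utilized"
        else:
            labels[name] = "below-average"
    return labels


def is_balanced(nodes: dict[str, tuple[float, float]],
                threshold: float = 10.0) -> bool:
    """True when every node is within `threshold` points of the average."""
    within = {"above-average", "below-average"}
    return all(label in within for label in classify(nodes, threshold).values())


if __name__ == "__main__":
    # Hypothetical nodes as (capacity_tb, used_tb).
    nodes = {"small-12tb": (12.0, 11.988), "large-72tb": (72.0, 28.8)}
    print(classify(nodes))
    print("balanced:", is_balanced(nodes))
```

With these made-up numbers the cluster average is about 48.6%, so the 12 TB node at 99.9% is labeled over-utilized and is the kind of node a run would try to drain, while the 72 TB node at 40% is already within the default 10-point threshold.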

> 
> About the small and big datanodes, how are they distributed with regard to racks?

We haven't set up rack awareness for our cluster; it is currently treated as a single rack. I am going through some docs to figure out how to implement it after the upgrade.
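For the rack-awareness work, Hadoop 0.20 can call an external script named by the `topology.script.file.name` property; as we understand it, the script receives one or more host names or IP addresses as arguments and prints one rack path per argument. The Python sketch below is a minimal example of such a script; the host-to-rack table and rack names are hypothetical, and rack awareness mainly affects where replicas are placed, so it should not be assumed to even out disk usage on its own.

```python
import sys

# HYPOTHETICAL inventory: host/IP -> rack path. The grouping follows the idea
# of equal-capacity racks (e.g. six 12 TB nodes in one rack, one 72 TB node
# in another); replace it with the real cluster layout.
RACK_OF = {
    "10.0.1.1": "/rack-a",  # 12 TB node
    "10.0.1.2": "/rack-a",  # 12 TB node
    "10.0.2.1": "/rack-b",  # 72 TB node
}
DEFAULT_RACK = "/default-rack"


def resolve(hosts: list[str]) -> list[str]:
    """Return one rack path per host, falling back to DEFAULT_RACK."""
    return [RACK_OF.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Print the racks space-separated, in the same order as the arguments.
    print(" ".join(resolve(sys.argv[1:])))
```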

> About files, what replication factor(s) and block size(s) are used?

The replication factor is 2.

> 
> Surely trivial questions again.
> 

Not really :)

Thanks
-Tapas


> Bertrand
> 
> On Mon, Mar 18, 2013 at 10:46 PM, Tapas Sarangi <ta...@gmail.com> wrote:
> Hi,
> 
> Sorry about that; I had it written but thought it was obvious. 
> Yes, the balancer is active and running on the namenode.
> 
> -Tapas
> 
> On Mar 18, 2013, at 4:43 PM, Bertrand Dechoux <de...@gmail.com> wrote:
> 
>> Hi,
>> 
>> It is not stated explicitly, but did you use the balancer?
>> http://hadoop.apache.org/docs/r1.0.4/commands_manual.html#balancer
>> 
>> Regards
>> 
>> Bertrand
>> 
> 
> 
> 
> 
> -- 
> Bertrand Dechoux


Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Tapas Sarangi <ta...@gmail.com>.
Thanks for the reply. How can I set a new value for the balancer's transfer speed? Is it the parameter dfs.balance.bandwidthPerSec?

Where should this go: conf/hdfs-site.xml or conf/core-site.xml?

-Tapas

 
On Mar 19, 2013, at 11:05 PM, Harsh J <ha...@cloudera.com> wrote:

> If your balancer does not exit, it means it is working hard, in
> iterations, trying to balance your cluster. The default bandwidth
> allows only a limited transfer speed (10 Mbps) so as not to affect the
> cluster's read/write performance while moving blocks between DNs for
> balancing, so the operation may be slow unless you raise the allowed
> bandwidth.
> 
> On Wed, Mar 20, 2013 at 7:37 AM, Tapas Sarangi <ta...@gmail.com> wrote:
>> Any more follow-ups?
>> 
>> Thanks
>> -Tapas
>> 
>> On Mar 19, 2013, at 9:55 AM, Tapas Sarangi <ta...@gmail.com> wrote:
>> 
>>> 
>>> On Mar 18, 2013, at 11:50 PM, Harsh J <ha...@cloudera.com> wrote:
>>> 
>>>> What do you mean that the balancer is always active?
>>> 
>>> I mean the same process stays active for a long time: the process that starts may not be exiting at all. We have a cron job set to run it every 10 minutes, but that has no effect while the earlier process is still running.
>>> 
>>> 
>>>> It is meant to be used
>>>> as a tool, and it exits once it has balanced the cluster in a given run (it loops until
>>>> it does, but always exits at the end). The balancer balances based on
>>>> usage percentage, so that is probably what you're looking for or missing.
>>>> 
>>> 
>>> Maybe. How does the balancer determine the usage percentage?
>>> 
>>> -Tapas
>>> 
>>> 
>>>> On Tue, Mar 19, 2013 at 6:56 AM, Tapas Sarangi <ta...@gmail.com> wrote:
>>>>> Hi,
>>>>> 
>>>>> On Mar 18, 2013, at 8:21 PM, 李洪忠 <lh...@hotmail.com> wrote:
>>>>> 
>>>>> Maybe you need to modify the rack-awareness script to balance the racks, i.e.
>>>>> make all the racks the same size: one rack of 6 small nodes, one rack of 1
>>>>> large node.
>>>>> P.S.
>>>>> You need to restart the cluster after modifying the rack-awareness script.
>>>>> 
>>>>> 
>>>>> As I mentioned earlier in my reply to Bertrand, we haven't set up rack
>>>>> awareness for the cluster; it is currently treated as just one rack. Could
>>>>> that be the problem? I don't know…
>>>>> 
>>>>> -Tapas
>>>>> 
>>>>> 
>>>>> 
>>>>> On 2013/3/19 7:17, Bertrand Dechoux wrote:
>>>>> 
>>>>> And by active, do you mean that it actually stops by itself? Else, the
>>>>> throttling/limit might be an issue with regard to the data volume or
>>>>> velocity.
>>>>> 
>>>>> What threshold is used?
>>>>> 
>>>>> About the small and big datanodes, how are they distributed with regard to
>>>>> racks?
>>>>> About files, what replication factor(s) and block size(s) are used?
>>>>> 
>>>>> Surely trivial questions again.
>>>>> 
>>>>> Bertrand
>>>>> 
>>>>> --
>>>>> Bertrand Dechoux
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Harsh J
>>> 
>> 
> 
> 
> 
> -- 
> Harsh J


Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Harsh J <ha...@cloudera.com>.
If your balancer does not exit, it means it is working hard, in
iterations, trying to balance your cluster. The default bandwidth
allows only a limited transfer speed (10 Mbps) so as not to affect the
cluster's read/write performance while moving blocks between DNs for
balancing, so the operation may be slow unless you raise the allowed
bandwidth.
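To get a feel for why a run can take so long: in the 0.20/1.x line, `dfs.balance.bandwidthPerSec` defaults to 1048576 bytes per second (1 MB/s) and, per its description, caps the bandwidth each datanode may use for balancing. The Python sketch below is back-of-the-envelope arithmetic only; the 1 TB volume and the 10 MB/s alternative are hypothetical figures, and real runs also depend on how many nodes move data in parallel.

```python
DEFAULT_BANDWIDTH = 1048576  # bytes/s; default dfs.balance.bandwidthPerSec (1 MB/s)
TB = 2**40                   # bytes in one (binary) terabyte


def hours_to_move(num_bytes: float, bytes_per_sec: float) -> float:
    """Hours needed to push num_bytes through one throttled datanode."""
    return num_bytes / bytes_per_sec / 3600


if __name__ == "__main__":
    for label, bw in [("default 1 MB/s", DEFAULT_BANDWIDTH),
                      ("10 MB/s", 10 * DEFAULT_BANDWIDTH)]:
        hours = hours_to_move(1 * TB, bw)
        print(f"1 TB at {label}: {hours:.0f} h (~{hours / 24:.1f} days)")
```

Under these assumptions, draining 1 TB through a single node takes roughly 291 hours (about 12 days) at the default, versus roughly 29 hours at 10 MB/s.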

-- 
Harsh J

Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Harsh J <ha...@cloudera.com>.
If your balancer does not exit, then it means its heavily working in
iterations trying to balance your cluster. The default bandwidth
allows only for limited transfer speed (10 Mbps) to not affect the
cluster's RW performance while moving blocks between DNs for
balancing, so the operation may be slow unless you raise the allowed
bandwidth.

On Wed, Mar 20, 2013 at 7:37 AM, Tapas Sarangi <ta...@gmail.com> wrote:
> Any more follow ups ?
>
> Thanks
> -Tapas
>
> On Mar 19, 2013, at 9:55 AM, Tapas Sarangi <ta...@gmail.com> wrote:
>
>>
>> On Mar 18, 2013, at 11:50 PM, Harsh J <ha...@cloudera.com> wrote:
>>
>>> What do you mean that the balancer is always active?
>>
>> meaning, the same process is active for a long time. The process that starts may not be exiting at all. We have a cron job set to run it every 10 minutes, but that's not in effect because the process may never exit.
>>
>>
>>> It is to be used
>>> as a tool and it exits once it balances in a specific run (loops until
>>> it does, but always exits at end). The balancer does balance based on
>>> usage percentage so that is what you're probably looking for/missing.
>>>
>>
>> May be. How does the balancer look for the usage percentage ?
>>
>> -Tapas
>>
>>
>>> On Tue, Mar 19, 2013 at 6:56 AM, Tapas Sarangi <ta...@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> On Mar 18, 2013, at 8:21 PM, 李洪忠 <lh...@hotmail.com> wrote:
>>>>
>>>> Maybe you need to modify the rackware script to make the rack balance, ie,
>>>> all the racks are the same size,  on rack by 6 small nodes, one rack by 1
>>>> large nodes.
>>>> P.S.
>>>> you need to reboot the cluster for rackware script modify.
>>>>
>>>>
>>>> Like I mentioned earlier in my reply to Bertrand, we haven't considered rack
>>>> awareness for the cluster, currently it is considered as just one rack. Can
>>>> that be the problem ? I don't know…
>>>>
>>>> -Tapas
>>>>
>>>>
>>>>
>>>> 于 2013/3/19 7:17, Bertrand Dechoux 写道:
>>>>
>>>> And by active, it means that it does actually stops by itself? Else it might
>>>> mean that the throttling/limit might be an issue with regard to the data
>>>> volume or velocity.
>>>>
>>>> What threshold is used?
>>>>
>>>> About the small and big datanodes, how are they distributed with regards to
>>>> racks?
>>>> About files, how is used the replication factor(s) and block size(s)?
>>>>
>>>> Surely trivial questions again.
>>>>
>>>> Bertrand
>>>>
>>>> On Mon, Mar 18, 2013 at 10:46 PM, Tapas Sarangi <ta...@gmail.com>
>>>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Sorry about that, had it written, but thought it was obvious.
>>>>> Yes, balancer is active and running on the namenode.
>>>>>
>>>>> -Tapas
>>>>>
>>>>> On Mar 18, 2013, at 4:43 PM, Bertrand Dechoux <de...@gmail.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> It is not explicitly said but did you use the balancer?
>>>>> http://hadoop.apache.org/docs/r1.0.4/commands_manual.html#balancer
>>>>>
>>>>> Regards
>>>>>
>>>>> Bertrand
>>>>>
>>>>> On Mon, Mar 18, 2013 at 10:01 PM, Tapas Sarangi <ta...@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I am using an old legacy version (0.20) of Hadoop for our
>>>>>> cluster. We have scheduled an upgrade to a newer version within a
>>>>>> couple of months, but I would like to understand a couple of things before
>>>>>> moving towards the upgrade plan.
>>>>>>
>>>>>> We have about 200 datanodes and some of them have larger storage than
>>>>>> others. The storage for the datanodes varies from 12 TB to 72 TB.
>>>>>>
>>>>>> We found that the disk-used percentage is not symmetric across all the
>>>>>> datanodes. For larger storage nodes the percentage of disk space used is
>>>>>> much lower than that of other nodes with smaller storage space. In larger
>>>>>> storage nodes the percentage of used disk space varies, but on average about
>>>>>> 30-50%. For the smaller storage nodes this number is as high as 99.9%. Is
>>>>>> this expected? If so, then we are not using a lot of the disk space
>>>>>> effectively. Is this solved in a future release?
>>>>>>
>>>>>> If not, I would like to know if there are any checks/debugging steps one can
>>>>>> do to find an improvement with the current version, or whether upgrading Hadoop
>>>>>> would solve this problem.
>>>>>>
>>>>>> I am happy to provide additional information if needed.
>>>>>>
>>>>>> Thanks for any help.
>>>>>>
>>>>>> -Tapas
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Bertrand Dechoux
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Harsh J
>>
>



-- 
Harsh J

Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Harsh J <ha...@cloudera.com>.
If your balancer does not exit, it is most likely still working through
iterations trying to balance your cluster. The default bandwidth limit
(dfs.balance.bandwidthPerSec, 1 MB/s per datanode) deliberately allows
only a limited transfer speed so that moving blocks between DNs for
balancing does not hurt the cluster's read/write performance; the
operation may therefore be slow unless you raise the allowed bandwidth.
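A back-of-envelope sketch of why a throttled balancer run can last days. It assumes that each datanode copies balancer traffic at no more than dfs.balance.bandwidthPerSec bytes per second (1048576, i.e. 1 MB/s, in the Hadoop 1.x hdfs-default.xml) and that a fixed number of datanodes send at that full rate in parallel. The data volume and node count in the example are hypothetical, not figures from this cluster, and the result is only a lower bound.

```python
"""Rough lower bound on HDFS balancer run time under bandwidth throttling."""

MB = 1024 ** 2
TB = 1024 ** 4

# Assumed per-datanode balancing limit: dfs.balance.bandwidthPerSec,
# whose Hadoop 1.x default is 1048576 bytes/s (1 MB/s).
DEFAULT_BANDWIDTH_PER_SEC = 1 * MB


def balancing_hours(bytes_to_move: float,
                    parallel_nodes: int,
                    bandwidth_per_sec: float = DEFAULT_BANDWIDTH_PER_SEC) -> float:
    """Hours needed if `parallel_nodes` datanodes each send at the full limit.

    Real runs are slower (iteration overhead, uneven source/target pairing),
    so treat the result as a lower bound.
    """
    if parallel_nodes <= 0 or bandwidth_per_sec <= 0:
        raise ValueError("parallel_nodes and bandwidth_per_sec must be positive")
    aggregate_rate = parallel_nodes * bandwidth_per_sec  # bytes/s cluster-wide
    return bytes_to_move / aggregate_rate / 3600.0


if __name__ == "__main__":
    # Hypothetical: 100 TB to move, 50 over-utilized nodes sending at once.
    for label, bw in [("1 MB/s (default)", 1 * MB), ("10 MB/s", 10 * MB)]:
        print(f"{label}: {balancing_hours(100 * TB, 50, bw):.1f} h")
```

With those hypothetical numbers the sketch prints about 582.5 hours at the default limit versus about 58.3 hours at 10 MB/s, which illustrates why raising the allowed bandwidth helps, at the cost of more contention with regular reads and writes.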

On Wed, Mar 20, 2013 at 7:37 AM, Tapas Sarangi <ta...@gmail.com> wrote:
> Any more follow-ups?
>
> Thanks
> -Tapas
>
> On Mar 19, 2013, at 9:55 AM, Tapas Sarangi <ta...@gmail.com> wrote:
>
>>
>> On Mar 18, 2013, at 11:50 PM, Harsh J <ha...@cloudera.com> wrote:
>>
>>> What do you mean that the balancer is always active?
>>
>> meaning, the same process is active for a long time. The process that starts may not be exiting at all. We have a cron job set to run it every 10 minutes, but that's not in effect because the process may never exit.
>>
>>
>>> It is to be used
>>> as a tool and it exits once it balances in a specific run (loops until
>>> it does, but always exits at end). The balancer does balance based on
>>> usage percentage so that is what you're probably looking for/missing.
>>>
>>
>> Maybe. How does the balancer determine the usage percentage?
>>
>> -Tapas
>>
>>
>>> On Tue, Mar 19, 2013 at 6:56 AM, Tapas Sarangi <ta...@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> On Mar 18, 2013, at 8:21 PM, 李洪忠 <lh...@hotmail.com> wrote:
>>>>
>>>> Maybe you need to modify the rack-awareness script to balance the racks,
>>>> i.e. make all the racks the same total size: for example, one rack of 6
>>>> small nodes and one rack of 1 large node.
>>>> P.S.
>>>> You need to restart the cluster after modifying the rack-awareness script.
>>>>
>>>>
>>>> As I mentioned earlier in my reply to Bertrand, we haven't set up rack
>>>> awareness for the cluster; currently it is treated as just one rack. Could
>>>> that be the problem? I don't know…
>>>>
>>>> -Tapas
>>>>
>>>>
>>>>
>>>> On 2013/3/19 7:17, Bertrand Dechoux wrote:
>>>>
>>>> And by active, do you mean that it actually stops by itself? If not, it
>>>> might mean that the throttling/limit is an issue with regard to the data
>>>> volume or velocity.
>>>>
>>>> What threshold is used?
>>>>
>>>> About the small and big datanodes, how are they distributed with regard to
>>>> racks?
>>>> About files, what replication factor(s) and block size(s) are used?
>>>>
>>>> Surely trivial questions again.
>>>>
>>>> Bertrand
>>>>
>>>> On Mon, Mar 18, 2013 at 10:46 PM, Tapas Sarangi <ta...@gmail.com>
>>>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Sorry about that, had it written, but thought it was obvious.
>>>>> Yes, balancer is active and running on the namenode.
>>>>>
>>>>> -Tapas
>>>>>
>>>>> On Mar 18, 2013, at 4:43 PM, Bertrand Dechoux <de...@gmail.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> It is not explicitly said but did you use the balancer?
>>>>> http://hadoop.apache.org/docs/r1.0.4/commands_manual.html#balancer
>>>>>
>>>>> Regards
>>>>>
>>>>> Bertrand
>>>>>
>>>>> On Mon, Mar 18, 2013 at 10:01 PM, Tapas Sarangi <ta...@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I am using an old legacy version (0.20) of Hadoop for our
>>>>>> cluster. We have scheduled an upgrade to a newer version within a
>>>>>> couple of months, but I would like to understand a couple of things before
>>>>>> moving towards the upgrade plan.
>>>>>>
>>>>>> We have about 200 datanodes and some of them have larger storage than
>>>>>> others. The storage for the datanodes varies from 12 TB to 72 TB.
>>>>>>
>>>>>> We found that the disk-used percentage is not symmetric across all the
>>>>>> datanodes. For larger storage nodes the percentage of disk space used is
>>>>>> much lower than that of other nodes with smaller storage space. In larger
>>>>>> storage nodes the percentage of used disk space varies, but on average about
>>>>>> 30-50%. For the smaller storage nodes this number is as high as 99.9%. Is
>>>>>> this expected? If so, then we are not using a lot of the disk space
>>>>>> effectively. Is this solved in a future release?
>>>>>>
>>>>>> If not, I would like to know if there are any checks/debugging steps one can
>>>>>> do to find an improvement with the current version, or whether upgrading Hadoop
>>>>>> would solve this problem.
>>>>>>
>>>>>> I am happy to provide additional information if needed.
>>>>>>
>>>>>> Thanks for any help.
>>>>>>
>>>>>> -Tapas
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Bertrand Dechoux
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Harsh J
>>
>



-- 
Harsh J

Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Tapas Sarangi <ta...@gmail.com>.
Any more follow-ups?

Thanks
-Tapas

On Mar 19, 2013, at 9:55 AM, Tapas Sarangi <ta...@gmail.com> wrote:

> 
> On Mar 18, 2013, at 11:50 PM, Harsh J <ha...@cloudera.com> wrote:
> 
>> What do you mean that the balancer is always active?
> 
> meaning, the same process is active for a long time. The process that starts may not be exiting at all. We have a cron job set to run it every 10 minutes, but that's not in effect because the process may never exit.
> 
> 
>> It is to be used
>> as a tool and it exits once it balances in a specific run (loops until
>> it does, but always exits at end). The balancer does balance based on
>> usage percentage so that is what you're probably looking for/missing.
>> 
> 
> Maybe. How does the balancer determine the usage percentage?
> 
> -Tapas
> 
> 
>> On Tue, Mar 19, 2013 at 6:56 AM, Tapas Sarangi <ta...@gmail.com> wrote:
>>> Hi,
>>> 
>>> On Mar 18, 2013, at 8:21 PM, 李洪忠 <lh...@hotmail.com> wrote:
>>> 
>>> Maybe you need to modify the rack-awareness script to balance the racks,
>>> i.e. make all the racks the same total size: for example, one rack of 6
>>> small nodes and one rack of 1 large node.
>>> P.S.
>>> You need to restart the cluster after modifying the rack-awareness script.
>>>
>>>
>>> As I mentioned earlier in my reply to Bertrand, we haven't set up rack
>>> awareness for the cluster; currently it is treated as just one rack. Could
>>> that be the problem? I don't know…
>>> 
>>> -Tapas
>>> 
>>> 
>>> 
>>> On 2013/3/19 7:17, Bertrand Dechoux wrote:
>>> 
>>> And by active, do you mean that it actually stops by itself? If not, it
>>> might mean that the throttling/limit is an issue with regard to the data
>>> volume or velocity.
>>>
>>> What threshold is used?
>>>
>>> About the small and big datanodes, how are they distributed with regard to
>>> racks?
>>> About files, what replication factor(s) and block size(s) are used?
>>> 
>>> Surely trivial questions again.
>>> 
>>> Bertrand
>>> 
>>> On Mon, Mar 18, 2013 at 10:46 PM, Tapas Sarangi <ta...@gmail.com>
>>> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> Sorry about that, had it written, but thought it was obvious.
>>>> Yes, balancer is active and running on the namenode.
>>>> 
>>>> -Tapas
>>>> 
>>>> On Mar 18, 2013, at 4:43 PM, Bertrand Dechoux <de...@gmail.com> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> It is not explicitly said but did you use the balancer?
>>>> http://hadoop.apache.org/docs/r1.0.4/commands_manual.html#balancer
>>>> 
>>>> Regards
>>>> 
>>>> Bertrand
>>>> 
>>>> On Mon, Mar 18, 2013 at 10:01 PM, Tapas Sarangi <ta...@gmail.com>
>>>> wrote:
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> I am using an old legacy version (0.20) of Hadoop for our
>>>>> cluster. We have scheduled an upgrade to a newer version within a
>>>>> couple of months, but I would like to understand a couple of things before
>>>>> moving towards the upgrade plan.
>>>>>
>>>>> We have about 200 datanodes and some of them have larger storage than
>>>>> others. The storage for the datanodes varies from 12 TB to 72 TB.
>>>>>
>>>>> We found that the disk-used percentage is not symmetric across all the
>>>>> datanodes. For larger storage nodes the percentage of disk space used is
>>>>> much lower than that of other nodes with smaller storage space. In larger
>>>>> storage nodes the percentage of used disk space varies, but on average about
>>>>> 30-50%. For the smaller storage nodes this number is as high as 99.9%. Is
>>>>> this expected? If so, then we are not using a lot of the disk space
>>>>> effectively. Is this solved in a future release?
>>>>>
>>>>> If not, I would like to know if there are any checks/debugging steps one can
>>>>> do to find an improvement with the current version, or whether upgrading Hadoop
>>>>> would solve this problem.
>>>>> 
>>>>> I am happy to provide additional information if needed.
>>>>> 
>>>>> Thanks for any help.
>>>>> 
>>>>> -Tapas
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Bertrand Dechoux
>>> 
>>> 
>>> 
>> 
>> 
>> 
>> -- 
>> Harsh J
> 


Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Tapas Sarangi <ta...@gmail.com>.
On Mar 18, 2013, at 11:50 PM, Harsh J <ha...@cloudera.com> wrote:

> What do you mean that the balancer is always active?

meaning, the same process is active for a long time. The process that starts may not be exiting at all. We have a cron job set to run it every 10 minutes, but that's not in effect because the process may never exit.


> It is to be used
> as a tool and it exits once it balances in a specific run (loops until
> it does, but always exits at end). The balancer does balance based on
> usage percentage so that is what you're probably looking for/missing.
> 

Maybe. How does the balancer determine the usage percentage?

-Tapas
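Tapas's question can be illustrated with a simplified model of the balancing rule as the HDFS Balancer is generally described: each datanode's utilization is its used space divided by its capacity, the cluster utilization is total used space divided by total capacity, and a node counts as over- or under-utilized when it differs from the cluster figure by more than the threshold (the balancer's `-threshold` option, 10 percent by default). This is a sketch, not the Balancer source: it ignores racks, block selection and non-DFS usage, and the node names and sizes are hypothetical values chosen to resemble the thread.

```python
"""Simplified model of how the HDFS balancer compares datanode usage percentages."""
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DataNode:
    name: str
    capacity_tb: float
    used_tb: float

    @property
    def utilization(self) -> float:
        """Used space as a percentage of this node's capacity."""
        return 100.0 * self.used_tb / self.capacity_tb


def cluster_utilization(nodes: List[DataNode]) -> float:
    """Total used / total capacity, as a percentage (not a mean of percentages)."""
    return 100.0 * sum(n.used_tb for n in nodes) / sum(n.capacity_tb for n in nodes)


def classify(nodes: List[DataNode], threshold: float = 10.0) -> Dict[str, str]:
    """Bucket each node relative to the cluster average +/- `threshold` points."""
    avg = cluster_utilization(nodes)
    buckets: Dict[str, str] = {}
    for n in nodes:
        u = n.utilization
        if u > avg + threshold:
            buckets[n.name] = "over-utilized"    # blocks should move off this node
        elif u > avg:
            buckets[n.name] = "above-average"
        elif u >= avg - threshold:
            buckets[n.name] = "below-average"
        else:
            buckets[n.name] = "under-utilized"   # blocks should move onto this node
    return buckets


def is_balanced(nodes: List[DataNode], threshold: float = 10.0) -> bool:
    """True when every node is within `threshold` points of the cluster average."""
    return all(b in ("above-average", "below-average")
               for b in classify(nodes, threshold).values())


if __name__ == "__main__":
    # Hypothetical mix: two 12 TB nodes at 99.9 % used, one 72 TB node at 40 % used.
    nodes = [DataNode("small-1", 12, 11.988), DataNode("small-2", 12, 11.988),
             DataNode("large-1", 72, 28.8)]
    print(f"cluster average: {cluster_utilization(nodes):.2f} %")
    print(classify(nodes))
    print("balanced:", is_balanced(nodes))
```

With those numbers the cluster average is roughly 55 percent, so both small nodes land in the over-utilized bucket and the large node in the under-utilized one. A percentage-based rule like this should therefore move blocks from the 99.9 percent nodes towards the larger ones, which suggests that a run which never finishes is more likely limited by throttling than by the rule itself.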


> On Tue, Mar 19, 2013 at 6:56 AM, Tapas Sarangi <ta...@gmail.com> wrote:
>> Hi,
>> 
>> On Mar 18, 2013, at 8:21 PM, 李洪忠 <lh...@hotmail.com> wrote:
>> 
>> Maybe you need to modify the rack-awareness script to balance the racks,
>> i.e. make all the racks the same total size: for example, one rack of 6
>> small nodes and one rack of 1 large node.
>> P.S.
>> You need to restart the cluster after modifying the rack-awareness script.
>>
>>
>> As I mentioned earlier in my reply to Bertrand, we haven't set up rack
>> awareness for the cluster; currently it is treated as just one rack. Could
>> that be the problem? I don't know…
>> 
>> -Tapas
>> 
>> 
>> 
>> On 2013/3/19 7:17, Bertrand Dechoux wrote:
>> 
>> And by active, do you mean that it actually stops by itself? If not, it
>> might mean that the throttling/limit is an issue with regard to the data
>> volume or velocity.
>>
>> What threshold is used?
>>
>> About the small and big datanodes, how are they distributed with regard to
>> racks?
>> About files, what replication factor(s) and block size(s) are used?
>> 
>> Surely trivial questions again.
>> 
>> Bertrand
>> 
>> On Mon, Mar 18, 2013 at 10:46 PM, Tapas Sarangi <ta...@gmail.com>
>> wrote:
>>> 
>>> Hi,
>>> 
>>> Sorry about that, had it written, but thought it was obvious.
>>> Yes, balancer is active and running on the namenode.
>>> 
>>> -Tapas
>>> 
>>> On Mar 18, 2013, at 4:43 PM, Bertrand Dechoux <de...@gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>> It is not explicitly said but did you use the balancer?
>>> http://hadoop.apache.org/docs/r1.0.4/commands_manual.html#balancer
>>> 
>>> Regards
>>> 
>>> Bertrand
>>> 
>>> On Mon, Mar 18, 2013 at 10:01 PM, Tapas Sarangi <ta...@gmail.com>
>>> wrote:
>>>> 
>>>> Hello,
>>>> 
>>>> I am using an old legacy version (0.20) of Hadoop for our
>>>> cluster. We have scheduled an upgrade to a newer version within a
>>>> couple of months, but I would like to understand a couple of things before
>>>> moving towards the upgrade plan.
>>>>
>>>> We have about 200 datanodes and some of them have larger storage than
>>>> others. The storage for the datanodes varies from 12 TB to 72 TB.
>>>>
>>>> We found that the disk-used percentage is not symmetric across all the
>>>> datanodes. For larger storage nodes the percentage of disk space used is
>>>> much lower than that of other nodes with smaller storage space. In larger
>>>> storage nodes the percentage of used disk space varies, but on average about
>>>> 30-50%. For the smaller storage nodes this number is as high as 99.9%. Is
>>>> this expected? If so, then we are not using a lot of the disk space
>>>> effectively. Is this solved in a future release?
>>>>
>>>> If not, I would like to know if there are any checks/debugging steps one can
>>>> do to find an improvement with the current version, or whether upgrading Hadoop
>>>> would solve this problem.
>>>> 
>>>> I am happy to provide additional information if needed.
>>>> 
>>>> Thanks for any help.
>>>> 
>>>> -Tapas
>>>> 
>>> 
>> 
>> 
>> 
>> --
>> Bertrand Dechoux
>> 
>> 
>> 
> 
> 
> 
> -- 
> Harsh J
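Li's suggestion relies on Hadoop's rack-awareness hook: in Hadoop 0.20/1.x the script named by `topology.script.file.name` (set in core-site.xml) is called with one or more datanode IP addresses or hostnames as arguments and must print one rack path per argument. Below is a minimal, hypothetical example of such a script that puts six 12 TB nodes in one rack and a single 72 TB node in another, so both racks hold about 72 TB. The IP addresses and rack names are made up, and whether equal-capacity racks actually even out usage on this cluster is Li's hypothesis, not something confirmed in the thread.

```python
#!/usr/bin/env python3
"""Hypothetical rack-topology script (referenced by topology.script.file.name).

Hadoop passes datanode IPs/hostnames as arguments and expects one rack
path per argument on stdout, in the same order.
"""
import sys
from typing import Dict, List

DEFAULT_RACK = "/default-rack"  # fallback for hosts not in the table

# Hypothetical layout: /rack-a = six 12 TB nodes (72 TB), /rack-b = one 72 TB node.
RACK_MAP: Dict[str, str] = {f"10.0.1.{i}": "/rack-a" for i in range(11, 17)}
RACK_MAP["10.0.2.11"] = "/rack-b"


def resolve(hosts: List[str]) -> List[str]:
    """Map each host to its rack, keeping the argument order."""
    return [RACK_MAP.get(h, DEFAULT_RACK) for h in hosts]


if __name__ == "__main__":
    print(" ".join(resolve(sys.argv[1:])))
```

A new topology is only picked up after a restart (as Li notes), and blocks already on disk stay where they are until the balancer or re-replication moves them.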


Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Tapas Sarangi <ta...@gmail.com>.
On Mar 18, 2013, at 11:50 PM, Harsh J <ha...@cloudera.com> wrote:

> What do you mean that the balancer is always active?

meaning, the same process is active for a long time. The process that starts may not be exiting at all. We have a cron job set to run it every 10 minutes, but that's not in effect because the process may never exit.


> It is to be used
> as a tool and it exits once it balances in a specific run (loops until
> it does, but always exits at end). The balancer does balance based on
> usage percentage so that is what you're probably looking for/missing.
> 

Maybe. How does the balancer determine the usage percentage?

-Tapas


> On Tue, Mar 19, 2013 at 6:56 AM, Tapas Sarangi <ta...@gmail.com> wrote:
>> Hi,
>> 
>> On Mar 18, 2013, at 8:21 PM, 李洪忠 <lh...@hotmail.com> wrote:
>> 
>> Maybe you need to modify the rack-awareness script to balance the racks,
>> i.e. make all the racks the same size: one rack of 6 small nodes, one rack
>> of 1 large node.
>> P.S.
>> You need to restart the cluster for the rack-awareness script change to
>> take effect.
>> 
>> 
>> As I mentioned earlier in my reply to Bertrand, we haven't configured rack
>> awareness for the cluster; currently it is treated as just one rack. Can
>> that be the problem? I don't know…
>> 
>> -Tapas
>> 
>> 
>> 
>> On 2013/3/19 7:17, Bertrand Dechoux wrote:
>> 
>> And by active, do you mean that it actually stops by itself? Otherwise the
>> throttling/limit might be an issue with regard to the data volume or
>> velocity.
>> 
>> What threshold is used?
>> 
>> About the small and big datanodes, how are they distributed with regard to
>> racks?
>> About files, what replication factor(s) and block size(s) are used?
>> 
>> Surely trivial questions again.
>> 
>> Bertrand
>> 
>> On Mon, Mar 18, 2013 at 10:46 PM, Tapas Sarangi <ta...@gmail.com>
>> wrote:
>>> 
>>> Hi,
>>> 
>>> Sorry about that; I had it written but thought it was obvious.
>>> Yes, the balancer is active and running on the namenode.
>>> 
>>> -Tapas
>>> 
>>> On Mar 18, 2013, at 4:43 PM, Bertrand Dechoux <de...@gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>> It is not explicitly stated, but did you use the balancer?
>>> http://hadoop.apache.org/docs/r1.0.4/commands_manual.html#balancer
>>> 
>>> Regards
>>> 
>>> Bertrand
>>> 
>>> On Mon, Mar 18, 2013 at 10:01 PM, Tapas Sarangi <ta...@gmail.com>
>>> wrote:
>>>> 
>>>> Hello,
>>>> 
>>>> I am using an old legacy version (0.20) of Hadoop for our
>>>> cluster. We have scheduled an upgrade to the newer version within a
>>>> couple of months, but I would like to understand a couple of things before
>>>> moving towards the upgrade plan.
>>>> 
>>>> We have about 200 datanodes and some of them have larger storage than
>>>> others. The storage for the datanodes varies from 12 TB to 72 TB.
>>>> 
>>>> We found that the disk-used percentage is not symmetric across all the
>>>> datanodes. For larger storage nodes the percentage of disk space used is
>>>> much lower than that of other nodes with smaller storage space. In larger
>>>> storage nodes the percentage of used disk space varies, but is on average
>>>> about 30-50%. For the smaller storage nodes this number is as high as
>>>> 99.9%. Is this expected? If so, then we are not using a lot of the disk
>>>> space effectively. Is this solved in a future release?
>>>> 
>>>> If not, I would like to know if there are any checks/debugging steps one
>>>> can do to find an improvement with the current version, or whether
>>>> upgrading Hadoop should solve this problem.
>>>> 
>>>> I am happy to provide additional information if needed.
>>>> 
>>>> Thanks for any help.
>>>> 
>>>> -Tapas
>>>> 
>>> 
>> 
>> 
>> 
>> --
>> Bertrand Dechoux
>> 
>> 
>> 
> 
> 
> 
> -- 
> Harsh J


Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Harsh J <ha...@cloudera.com>.
What do you mean that the balancer is always active? It is to be used
as a tool and it exits once it balances in a specific run (loops until
it does, but always exits at the end). The balancer does balance based on
usage percentage so that is what you're probably looking for/missing.
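Harsh's point, that the balancer works on usage percentage rather than on absolute bytes, can be sketched as a simplified model of how the 0.20/1.x balancer groups datanodes. It assumes each node's utilization is DFS-used / capacity × 100, the cluster average is total used / total capacity × 100, and `threshold` is the value passed as `hadoop balancer -threshold <percent>` (default 10). The `Node` type, host names and figures are hypothetical illustrations, not output from the cluster in this thread, and block-move scheduling is not modelled.

```python
"""Simplified model of how the HDFS balancer groups datanodes by utilization.

Hypothetical inputs; the real balancer gets capacity and DFS-used figures
from the namenode and then schedules block moves, which is not modelled here.
"""
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    capacity: float  # bytes
    used: float      # DFS-used bytes

    @property
    def utilization(self) -> float:
        return 100.0 * self.used / self.capacity


def cluster_average(nodes: list[Node]) -> float:
    """Cluster-wide utilization: total used over total capacity, in percent."""
    return 100.0 * sum(n.used for n in nodes) / sum(n.capacity for n in nodes)


def classify(nodes: list[Node], threshold: float = 10.0) -> dict[str, list[str]]:
    """Group nodes relative to the cluster average and the threshold band."""
    avg = cluster_average(nodes)
    groups: dict[str, list[str]] = {"over": [], "above_avg": [], "below_avg": [], "under": []}
    for n in nodes:
        u = n.utilization
        if u > avg:
            groups["over" if u > avg + threshold else "above_avg"].append(n.name)
        else:
            groups["under" if u < avg - threshold else "below_avg"].append(n.name)
    return groups


def is_balanced(nodes: list[Node], threshold: float = 10.0) -> bool:
    """Roughly the stopping condition: nothing is over- or under-utilized."""
    g = classify(nodes, threshold)
    return not g["over"] and not g["under"]


if __name__ == "__main__":
    TB = 10**12
    nodes = [
        Node("small-1", 12 * TB, 11.99 * TB),  # ~99.9% full
        Node("small-2", 12 * TB, 11.99 * TB),
        Node("large-1", 72 * TB, 30 * TB),     # ~42% full
    ]
    print(f"average: {cluster_average(nodes):.1f}%")
    print(classify(nodes))
    print("balanced:", is_balanced(nodes))
```

Because utilization is a percentage of each node's own capacity, a small node near 99.9% and a large node near 40% land on opposite sides of the average in this model, so a run with the default threshold would want to move blocks from the small nodes to the large ones.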

On Tue, Mar 19, 2013 at 6:56 AM, Tapas Sarangi <ta...@gmail.com> wrote:
> Hi,
>
> On Mar 18, 2013, at 8:21 PM, 李洪忠 <lh...@hotmail.com> wrote:
>
> Maybe you need to modify the rack-awareness script to balance the racks,
> i.e. make all the racks the same size: one rack of 6 small nodes, one rack
> of 1 large node.
> P.S.
> You need to restart the cluster for the rack-awareness script change to
> take effect.
>
>
> As I mentioned earlier in my reply to Bertrand, we haven't configured rack
> awareness for the cluster; currently it is treated as just one rack. Can
> that be the problem? I don't know…
>
> -Tapas
>
>
>
> On 2013/3/19 7:17, Bertrand Dechoux wrote:
>
> And by active, do you mean that it actually stops by itself? Otherwise the
> throttling/limit might be an issue with regard to the data volume or
> velocity.
>
> What threshold is used?
>
> About the small and big datanodes, how are they distributed with regard to
> racks?
> About files, what replication factor(s) and block size(s) are used?
>
> Surely trivial questions again.
>
> Bertrand
>
> On Mon, Mar 18, 2013 at 10:46 PM, Tapas Sarangi <ta...@gmail.com>
> wrote:
>>
>> Hi,
>>
>> Sorry about that; I had it written but thought it was obvious.
>> Yes, the balancer is active and running on the namenode.
>>
>> -Tapas
>>
>> On Mar 18, 2013, at 4:43 PM, Bertrand Dechoux <de...@gmail.com> wrote:
>>
>> Hi,
>>
>> It is not explicitly stated, but did you use the balancer?
>> http://hadoop.apache.org/docs/r1.0.4/commands_manual.html#balancer
>>
>> Regards
>>
>> Bertrand
>>
>> On Mon, Mar 18, 2013 at 10:01 PM, Tapas Sarangi <ta...@gmail.com>
>> wrote:
>>>
>>> Hello,
>>>
>>> I am using an old legacy version (0.20) of Hadoop for our
>>> cluster. We have scheduled an upgrade to the newer version within a
>>> couple of months, but I would like to understand a couple of things before
>>> moving towards the upgrade plan.
>>>
>>> We have about 200 datanodes and some of them have larger storage than
>>> others. The storage for the datanodes varies from 12 TB to 72 TB.
>>>
>>> We found that the disk-used percentage is not symmetric across all the
>>> datanodes. For larger storage nodes the percentage of disk space used is
>>> much lower than that of other nodes with smaller storage space. In larger
>>> storage nodes the percentage of used disk space varies, but is on average
>>> about 30-50%. For the smaller storage nodes this number is as high as
>>> 99.9%. Is this expected? If so, then we are not using a lot of the disk
>>> space effectively. Is this solved in a future release?
>>>
>>> If not, I would like to know if there are any checks/debugging steps one
>>> can do to find an improvement with the current version, or whether
>>> upgrading Hadoop should solve this problem.
>>>
>>> I am happy to provide additional information if needed.
>>>
>>> Thanks for any help.
>>>
>>> -Tapas
>>>
>>
>
>
>
> --
> Bertrand Dechoux
>
>
>



-- 
Harsh J

Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Tapas Sarangi <ta...@gmail.com>.
Hi,

On Mar 18, 2013, at 8:21 PM, 李洪忠 <lh...@hotmail.com> wrote:

> Maybe you need to modify the rack-awareness script to balance the racks, i.e. make all the racks the same size: one rack of 6 small nodes, one rack of 1 large node.
> P.S.
> You need to restart the cluster for the rack-awareness script change to take effect.

As I mentioned earlier in my reply to Bertrand, we haven't configured rack awareness for the cluster; currently it is treated as just one rack. Can that be the problem? I don't know…
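李洪忠's suggestion refers to the rack-awareness (topology) script. In 0.20/1.x this is the executable named by `topology.script.file.name`; Hadoop calls it with one or more host names or IP addresses as arguments and reads one rack path per argument from standard output, with `/default-rack` used when nothing is configured. A minimal sketch in Python follows. The host-to-rack table is entirely hypothetical, and the thread does not establish that splitting racks this way would fix the imbalance.

```python
#!/usr/bin/env python3
"""Minimal rack-awareness (topology) script sketch.

Hadoop passes host names or IP addresses as arguments and expects one rack
path per argument on stdout. The mapping below is hypothetical.
"""
import sys

# Hypothetical layout: racks grouped so each holds a similar total capacity.
RACK_OF = {
    "10.0.1.1": "/rack-small-a",
    "10.0.1.2": "/rack-small-a",
    "10.0.2.1": "/rack-large-a",
}
DEFAULT_RACK = "/default-rack"


def resolve(hosts: list[str]) -> list[str]:
    """Map each host to its rack, falling back to the default rack."""
    return [RACK_OF.get(h, DEFAULT_RACK) for h in hosts]


if __name__ == "__main__":
    # One rack path per argument, space-separated, in argument order.
    print(" ".join(resolve(sys.argv[1:])))
```

The script would be referenced from `topology.script.file.name` in the site configuration; as 李洪忠 notes, a restart is needed before a changed mapping takes effect.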

-Tapas


>   
> On 2013/3/19 7:17, Bertrand Dechoux wrote:
>> And by active, do you mean that it actually stops by itself? Otherwise the throttling/limit might be an issue with regard to the data volume or velocity.
>> 
>> What threshold is used?
>> 
>> About the small and big datanodes, how are they distributed with regard to racks?
>> About files, what replication factor(s) and block size(s) are used?
>> 
>> Surely trivial questions again.
>> 
>> Bertrand
>> 
>> On Mon, Mar 18, 2013 at 10:46 PM, Tapas Sarangi <ta...@gmail.com> wrote:
>> Hi,
>> 
>> Sorry about that; I had it written but thought it was obvious.
>> Yes, the balancer is active and running on the namenode.
>> 
>> -Tapas
>> 
>> On Mar 18, 2013, at 4:43 PM, Bertrand Dechoux <de...@gmail.com> wrote:
>> 
>>> Hi,
>>> 
>>> It is not explicitly stated, but did you use the balancer?
>>> http://hadoop.apache.org/docs/r1.0.4/commands_manual.html#balancer
>>> 
>>> Regards
>>> 
>>> Bertrand
>>> 
>>> On Mon, Mar 18, 2013 at 10:01 PM, Tapas Sarangi <ta...@gmail.com> wrote:
>>> Hello,
>>> 
>>> I am using an old legacy version (0.20) of Hadoop for our cluster. We have scheduled an upgrade to the newer version within a couple of months, but I would like to understand a couple of things before moving towards the upgrade plan.
>>> 
>>> We have about 200 datanodes and some of them have larger storage than others. The storage for the datanodes varies from 12 TB to 72 TB.
>>> 
>>> We found that the disk-used percentage is not symmetric across all the datanodes. For larger storage nodes the percentage of disk space used is much lower than that of other nodes with smaller storage space. In larger storage nodes the percentage of used disk space varies, but is on average about 30-50%. For the smaller storage nodes this number is as high as 99.9%. Is this expected? If so, then we are not using a lot of the disk space effectively. Is this solved in a future release?
>>> 
>>> If not, I would like to know if there are any checks/debugging steps one can do to find an improvement with the current version, or whether upgrading Hadoop should solve this problem.
>>> 
>>> I am happy to provide additional information if needed.
>>> 
>>> Thanks for any help.
>>> 
>>> -Tapas
>>> 
>> 
>> 
>> 
>> 
>> -- 
>> Bertrand Dechoux
> 


Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by 李洪忠 <lh...@hotmail.com>.
Maybe you need to modify the rack-awareness script to balance the racks, 
i.e. make all the racks the same total size: one rack of 6 small nodes, 
one rack of 1 large node.
P.S.
You need to restart the cluster for a change to the rack-awareness script to take effect.
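
Since 6 x 12 TB = 72 TB, a rack of six small nodes has the same capacity as a rack holding one large node. A minimal Python sketch of that grouping idea follows, using a first-fit-decreasing heuristic to put datanodes into racks of roughly equal total capacity. The `group_into_racks` helper and the host names are hypothetical; the resulting groups would still have to be expressed in whatever rack-awareness script the cluster uses.

```python
def group_into_racks(capacities_tb, rack_capacity_tb):
    """Group hosts into racks of roughly equal total capacity.

    capacities_tb: dict mapping host name -> capacity in TB (hypothetical data).
    rack_capacity_tb: target total capacity per rack.
    First-fit decreasing: largest nodes first, each into the first rack
    that still has room, otherwise into a new rack.
    """
    racks = []  # each entry: [remaining_tb, [host, ...]]
    for host, cap in sorted(capacities_tb.items(), key=lambda kv: -kv[1]):
        for rack in racks:
            if rack[0] >= cap:
                rack[0] -= cap
                rack[1].append(host)
                break
        else:
            # No existing rack has room for this node: open a new rack.
            racks.append([rack_capacity_tb - cap, [host]])
    return [hosts for _, hosts in racks]


nodes = {"big1": 72, "big2": 72}
nodes.update({"small%d" % i: 12 for i in range(1, 7)})
print(group_into_racks(nodes, rack_capacity_tb=72))
# [['big1'], ['big2'], ['small1', 'small2', 'small3', 'small4', 'small5', 'small6']]
```

First-fit decreasing is only a heuristic; with 200 nodes of mixed sizes the racks will be close to, not exactly, equal.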

On 2013/3/19 7:17, Bertrand Dechoux wrote:
> And by "active", do you mean that it actually stops by itself? Else, 
> the throttling/limit might be an issue with regard to the data volume 
> or velocity.
>
> What threshold is used?
>
> About the small and big datanodes, how are they distributed with 
> regard to racks?
> About files, what replication factor(s) and block size(s) are used?
>
> Surely trivial questions again.
>
> Bertrand
>
> On Mon, Mar 18, 2013 at 10:46 PM, Tapas Sarangi 
> <tapas.sarangi@gmail.com <ma...@gmail.com>> wrote:
>
>     Hi,
>
>     Sorry about that, I should have written it, but I thought it was obvious.
>     Yes, the balancer is active and running on the namenode.
>
>     -Tapas
>
>     On Mar 18, 2013, at 4:43 PM, Bertrand Dechoux <dechouxb@gmail.com
>     <ma...@gmail.com>> wrote:
>
>>     Hi,
>>
>>     It is not explicitly stated, but did you use the balancer?
>>     http://hadoop.apache.org/docs/r1.0.4/commands_manual.html#balancer
>>
>>     Regards
>>
>>     Bertrand
>>
>>     On Mon, Mar 18, 2013 at 10:01 PM, Tapas Sarangi
>>     <tapas.sarangi@gmail.com <ma...@gmail.com>> wrote:
>>
>>         Hello,
>>
>>         I am using an old legacy version (0.20) of Hadoop for our
>>         cluster. We have scheduled an upgrade to a newer version
>>         within a couple of months, but I would like to understand a
>>         couple of things before moving ahead with the upgrade plan.
>>
>>         We have about 200 datanodes, and some of them have larger
>>         storage than others. The storage per datanode varies from
>>         12 TB to 72 TB.
>>
>>         We found that the disk-used percentage is not uniform
>>         across the datanodes. On the larger storage nodes the
>>         percentage of disk space used is much lower than on nodes
>>         with smaller storage. On the larger nodes the percentage of
>>         used disk space varies, but averages about 30-50%. On the
>>         smaller nodes this number is as high as 99.9%. Is this
>>         expected? If so, then we are not using a lot of the disk
>>         space effectively. Is this solved in a future release?
>>
>>         If not, I would like to know if there are any checks/debugging
>>         steps one can do to find an improvement with the current
>>         version, or whether upgrading Hadoop should solve this problem.
>>
>>         I am happy to provide additional information if needed.
>>
>>         Thanks for any help.
>>
>>         -Tapas
>>
>
>
>
>
> -- 
> Bertrand Dechoux 


Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Tapas Sarangi <ta...@gmail.com>.
On Mar 18, 2013, at 6:17 PM, Bertrand Dechoux <de...@gmail.com> wrote:

> And by "active", do you mean that it actually stops by itself?
> Else, the throttling/limit might be an issue with regard to the data volume or velocity.
> 

This "else" is probably what's happening. I just checked the logs: it's active almost all the time.
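
If the balancer never finishes, a back-of-the-envelope estimate can show whether the throttle alone explains it. As far as is known for 0.20/1.x, the per-datanode balancing bandwidth is capped by `dfs.balance.bandwidthPerSec` in hdfs-site.xml on the datanodes, with a default of 1048576 bytes/s (1 MB/s). The `balancer_days` helper and the "100 TiB across 100 nodes" figures below are hypothetical; the estimate treats the throttle as the only limit, so real runs will be slower.

```python
MIB = 1024 ** 2  # bytes
TIB = 1024 ** 4  # bytes


def balancer_days(bytes_to_move, nodes_in_parallel, bandwidth_per_node=1 * MIB):
    """Rough lower bound on balancing time, in days.

    Assumes nodes_in_parallel datanodes move data concurrently, each capped
    at bandwidth_per_node bytes/second, and nothing else slows them down.
    """
    seconds = bytes_to_move / float(bandwidth_per_node * nodes_in_parallel)
    return seconds / 86400.0


# 100 TiB to move, 100 datanodes sending in parallel, default 1 MiB/s each:
print(round(balancer_days(100 * TIB, 100), 2))  # 12.14
```

At the default setting, moving tens of terabytes takes on the order of weeks, which would look like a balancer that is "active almost all the time".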


> What threshold is used?

I don't know what this is. How can I find out?
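
The threshold is the balancer's `-threshold` argument (`hadoop balancer -threshold <threshold>`, described in the balancer section of the commands manual linked in this thread); when it is omitted the default is 10 percent. The balancer tries to bring every datanode's usage percentage within that many points of the cluster-wide average. The Python sketch below reproduces only that comparison, so it can be applied to per-node numbers such as those from `hadoop dfsadmin -report`; the `classify_nodes` helper and the sample figures are hypothetical, and the real balancer's bookkeeping is more involved.

```python
def classify_nodes(nodes, threshold_pct=10.0):
    """Split datanodes into over- and under-utilized lists.

    nodes: dict host -> (used, capacity), in any consistent unit.
    A node is over-utilized if its usage percentage exceeds the cluster
    average by more than threshold_pct points, and under-utilized if it is
    below the average by more than threshold_pct points.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_cap = sum(cap for _, cap in nodes.values())
    avg_pct = 100.0 * total_used / total_cap
    over, under = [], []
    for host, (used, cap) in sorted(nodes.items()):
        pct = 100.0 * used / cap
        if pct > avg_pct + threshold_pct:
            over.append(host)
        elif pct < avg_pct - threshold_pct:
            under.append(host)
    return avg_pct, over, under


# One full 12 TB node and one 72 TB node holding 12 TB:
avg, over, under = classify_nodes({"small": (12, 12), "big": (12, 72)})
print(round(avg, 1), over, under)  # 28.6 ['small'] ['big']
```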

> 
> About the small and big datanodes, how are they distributed with regard to racks?

We haven't set up rack awareness for our cluster; it is currently treated as a single rack. I am going through some docs to figure out how to implement it after the upgrade.
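
For implementing rack awareness, a minimal topology script is sketched below. This assumes the 0.20/1.x mechanism: the script path goes in the `topology.script.file.name` property (`net.topology.script.file.name` in 2.x), and the NameNode runs it with one or more host names or IPs as arguments, expecting one rack path per argument on stdout. The IP addresses and rack names are made up. As far as is known, the NameNode caches the resolved mapping, so changes need a restart to take effect.

```python
#!/usr/bin/env python
"""Hypothetical rack-awareness (topology) script.

Assumption: Hadoop passes host names/IP addresses as arguments and reads
one rack path per argument from standard output, in the same order.
The RACKS table is illustrative only.
"""
import sys

RACKS = {
    "10.0.0.1": "/rack1",  # e.g. one 72 TB node
    "10.0.0.2": "/rack2",  # e.g. small 12 TB nodes
    "10.0.0.3": "/rack2",
}
DEFAULT_RACK = "/default-rack"


def resolve(hosts):
    """Map each host to its rack, falling back to DEFAULT_RACK."""
    return [RACKS.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Whitespace-separated output, one rack path per input argument.
    print(" ".join(resolve(sys.argv[1:])))
```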

> About files, what replication factor(s) and block size(s) are used?

The replication factor is 2.
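
On whether the skew is expected: the understanding here is that the default block placement in these versions chooses target datanodes without weighting by capacity, skipping mainly nodes that lack room or are overloaded. Every node therefore receives roughly the same number of blocks, and the small nodes fill up first in percentage terms. The toy Python simulation below illustrates that with replication factor 2. It is not HDFS code: capacities are hypothetical and counted in blocks, and writer locality, racks and load are ignored.

```python
import random


def simulate(capacities, n_blocks, replication=2, seed=0):
    """Toy model of capacity-unaware block placement.

    capacities: dict host -> capacity in blocks (hypothetical units).
    For each block, pick `replication` distinct hosts uniformly at random
    among those that still have room for one more replica. Stops early once
    fewer than `replication` hosts have room. Returns host -> fraction used.
    """
    rng = random.Random(seed)
    used = {host: 0 for host in capacities}
    for _ in range(n_blocks):
        candidates = [h for h in sorted(capacities) if used[h] < capacities[h]]
        if len(candidates) < replication:
            break  # no valid placement left for a new block
        for host in rng.sample(candidates, replication):
            used[host] += 1
    return {host: used[host] / float(capacities[host]) for host in capacities}


usage = simulate({"small1": 12, "small2": 12, "big": 72}, n_blocks=1000)
print(usage)
```

With two 12-block nodes and one 72-block node, both small nodes end up completely full while the large node is at most one third used, and at that point no new block can get two replicas. In a real cluster it is the balancer's job to move blocks back toward the large nodes.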

> 
> Surely trivial questions again.
> 

Not really :)

Thanks
-Tapas


> Bertrand
> 
> On Mon, Mar 18, 2013 at 10:46 PM, Tapas Sarangi <ta...@gmail.com> wrote:
> Hi,
> 
> Sorry about that, I should have written it, but I thought it was obvious.
> Yes, the balancer is active and running on the namenode.
> 
> -Tapas
> 
> On Mar 18, 2013, at 4:43 PM, Bertrand Dechoux <de...@gmail.com> wrote:
> 
>> Hi,
>> 
>> It is not explicitly stated, but did you use the balancer?
>> http://hadoop.apache.org/docs/r1.0.4/commands_manual.html#balancer
>> 
>> Regards
>> 
>> Bertrand
>> 
>> On Mon, Mar 18, 2013 at 10:01 PM, Tapas Sarangi <ta...@gmail.com> wrote:
>> Hello,
>> 
>> I am using an old legacy version (0.20) of Hadoop for our cluster. We have scheduled an upgrade to a newer version within a couple of months, but I would like to understand a couple of things before moving ahead with the upgrade plan.
>> 
>> We have about 200 datanodes, and some of them have larger storage than others. The storage per datanode varies from 12 TB to 72 TB.
>> 
>> We found that the disk-used percentage is not uniform across the datanodes. On the larger storage nodes the percentage of disk space used is much lower than on nodes with smaller storage. On the larger nodes the percentage of used disk space varies, but averages about 30-50%. On the smaller nodes this number is as high as 99.9%. Is this expected? If so, then we are not using a lot of the disk space effectively. Is this solved in a future release?
>> 
>> If not, I would like to know if there are any checks/debugging steps one can do to find an improvement with the current version, or whether upgrading Hadoop should solve this problem.
>> 
>> I am happy to provide additional information if needed.
>> 
>> Thanks for any help.
>> 
>> -Tapas
>> 
> 
> 
> 
> 
> -- 
> Bertrand Dechoux


Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Bertrand Dechoux <de...@gmail.com>.
And by "active", do you mean that it actually stops by itself? Else, the
throttling/limit might be an issue with regard to the data volume or
velocity.

What threshold is used?

About the small and big datanodes, how are they distributed with regard to
racks?
About files, what replication factor(s) and block size(s) are used?

Surely trivial questions again.

Bertrand

On Mon, Mar 18, 2013 at 10:46 PM, Tapas Sarangi <ta...@gmail.com> wrote:

> Hi,
>
> Sorry about that, I should have written it, but I thought it was obvious.
> Yes, the balancer is active and running on the namenode.
>
> -Tapas
>
> On Mar 18, 2013, at 4:43 PM, Bertrand Dechoux <de...@gmail.com> wrote:
>
> Hi,
>
> It is not explicitly stated, but did you use the balancer?
> http://hadoop.apache.org/docs/r1.0.4/commands_manual.html#balancer
>
> Regards
>
> Bertrand
>
> On Mon, Mar 18, 2013 at 10:01 PM, Tapas Sarangi <ta...@gmail.com> wrote:
>
>> Hello,
>>
>> I am using an old legacy version (0.20) of Hadoop for our cluster. We
>> have scheduled an upgrade to a newer version within a couple of months,
>> but I would like to understand a couple of things before moving ahead
>> with the upgrade plan.
>>
>> We have about 200 datanodes, and some of them have larger storage than
>> others. The storage per datanode varies from 12 TB to 72 TB.
>>
>> We found that the disk-used percentage is not uniform across the
>> datanodes. On the larger storage nodes the percentage of disk space used
>> is much lower than on nodes with smaller storage. On the larger nodes the
>> percentage of used disk space varies, but averages about 30-50%. On the
>> smaller nodes this number is as high as 99.9%. Is this expected? If so,
>> then we are not using a lot of the disk space effectively. Is this solved
>> in a future release?
>>
>> If not, I would like to know if there are any checks/debugging steps one
>> can do to find an improvement with the current version, or whether
>> upgrading Hadoop should solve this problem.
>>
>> I am happy to provide additional information if needed.
>>
>> Thanks for any help.
>>
>> -Tapas
>>
>>
>


-- 
Bertrand Dechoux

Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Tapas Sarangi <ta...@gmail.com>.
Hi,

Sorry about that; I meant to write it, but thought it was obvious.
Yes, the balancer is active and running on the namenode.

-Tapas

On Mar 18, 2013, at 4:43 PM, Bertrand Dechoux <de...@gmail.com> wrote:

> Hi,
> 
> It is not explicitly stated, but did you use the balancer?
> http://hadoop.apache.org/docs/r1.0.4/commands_manual.html#balancer
> 
> Regards
> 
> Bertrand
> 
> On Mon, Mar 18, 2013 at 10:01 PM, Tapas Sarangi <ta...@gmail.com> wrote:
> Hello,
> 
> I am using one of the old legacy versions (0.20) of hadoop for our cluster. We have scheduled an upgrade to the newer version within a couple of months, but I would like to understand a couple of things before moving towards the upgrade plan.
> 
> We have about 200 datanodes and some of them have larger storage than others. The storage for the datanodes varies from 12 TB to 72 TB.
> 
> We found that the disk-used percentage is not symmetric across all the datanodes. For larger storage nodes the percentage of disk space used is much lower than that of other nodes with smaller storage space. In larger storage nodes the percentage of used disk space varies, but is on average about 30-50%. For the smaller storage nodes this number is as high as 99.9%. Is this expected? If so, then we are not using a lot of the disk space effectively. Is this solved in a future release?
> 
> If not, I would like to know if there are any checks/debugs that one can do to find an improvement with the current version, or whether upgrading hadoop should solve this problem.
> 
> I am happy to provide additional information if needed.
> 
> Thanks for any help.
> 
> -Tapas
> 


Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Bertrand Dechoux <de...@gmail.com>.
Hi,

It is not explicitly stated, but did you use the balancer?
http://hadoop.apache.org/docs/r1.0.4/commands_manual.html#balancer

Regards

Bertrand

On Mon, Mar 18, 2013 at 10:01 PM, Tapas Sarangi <ta...@gmail.com> wrote:

> Hello,
>
> I am using one of the old legacy versions (0.20) of hadoop for our cluster.
> We have scheduled an upgrade to the newer version within a couple of
> months, but I would like to understand a couple of things before moving
> towards the upgrade plan.
>
> We have about 200 datanodes and some of them have larger storage than
> others. The storage for the datanodes varies from 12 TB to 72 TB.
>
> We found that the disk-used percentage is not symmetric across all the
> datanodes. For larger storage nodes the percentage of disk space used is
> much lower than that of other nodes with smaller storage space. In larger
> storage nodes the percentage of used disk space varies, but is on average
> about 30-50%. For the smaller storage nodes this number is as high as
> 99.9%. Is this expected? If so, then we are not using a lot of the disk
> space effectively. Is this solved in a future release?
>
> If not, I would like to know if there are any checks/debugs that one can
> do to find an improvement with the current version, or whether upgrading
> hadoop should solve this problem.
>
> I am happy to provide additional information if needed.
>
> Thanks for any help.
>
> -Tapas
>
>

Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Alexey Babutin <zo...@gmail.com>.
On Mon, Mar 25, 2013 at 4:29 AM, Tapas Sarangi <ta...@gmail.com> wrote:

> Hi,
>
> Thanks for the explanation. Where can I find the Java code for the balancer
> that uses the threshold value, so I can calculate it myself as you
> mentioned? I think I understand your calculation, but would like to see
> the code.
>

src/hdfs/org/apache/hadoop/hdfs/server/balancer/Balancer.java

See BalancerDatanode.
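For the gist before opening Balancer.java, here is a simplified Python sketch of the rule described in this thread: cluster utilization is total DFS used over total capacity, and a node counts as balanced when its own utilization lies within the threshold of that average. It is not a transcription of BalancerDatanode (the real class carries more detail, such as how much each node may move per iteration), and the function names and node sizes are hypothetical.

```python
def utilization(dfs_used, capacity):
    """Percent of a node's (or the cluster's) capacity used by DFS data."""
    return dfs_used / capacity * 100


def classify(node_util, avg_util, threshold=10.0):
    """Bucket a node using the rule quoted in this thread:
    balanced when avg_util - threshold <= node_util < avg_util + threshold.
    """
    if node_util >= avg_util + threshold:
        return "over-utilized"
    if node_util < avg_util - threshold:
        return "under-utilized"
    return "balanced"


# Hypothetical nodes as (dfs_used_tb, capacity_tb): two small, two large.
nodes = [(11, 12), (11, 12), (22, 72), (22, 72)]
avg = utilization(sum(u for u, _ in nodes), sum(c for _, c in nodes))
print(round(avg, 1))  # 39.3
for used, cap in nodes:
    util = utilization(used, cap)
    print(cap, round(util, 1), classify(util, avg))
```

Here the large nodes count as balanced at about 31% while the small ones are over 90% full; once the small nodes drop below the average plus the threshold, the balancer has nothing more to do.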


> If I set the threshold to 5 instead of 10, then the smaller nodes will
> be at most 95% full, while the larger nodes' disk usage will increase
> from 80% to 85%.
>
> Now my question to you and the experts is: when I run the balancer, is the
> following command enough to set the threshold to a different value?
>
> hadoop balancer -threshold 5
>
yes
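The reasoning about -threshold 5 can be checked with a small sketch. Assumptions (hypothetical, borrowed from the 80 x 12 TB + 20 x 72 TB example in this thread): cluster utilization is 90%, the small nodes sit exactly at the top of the accepted band, and the large nodes hold the rest of the data. The band is the cluster average plus or minus the threshold, capped at 0 and 100%.

```python
def balanced_band(avg_util, threshold):
    """Utilization band (percent) accepted around the cluster average."""
    return max(0.0, avg_util - threshold), min(100.0, avg_util + threshold)


def split_at_band_edge(avg_util, threshold, small_cap, large_cap):
    """Return (small %, large %) when the small nodes sit at the top of the
    band and the large nodes absorb the remaining data."""
    _, high = balanced_band(avg_util, threshold)
    total_used = avg_util * (small_cap + large_cap) / 100
    small_used = high * small_cap / 100
    return high, (total_used - small_used) / large_cap * 100


SMALL_CAP, LARGE_CAP = 80 * 12, 20 * 72  # TB, hypothetical mix
for t in (10, 5):
    small, large = split_at_band_edge(90.0, t, SMALL_CAP, LARGE_CAP)
    print(t, balanced_band(90.0, t), small, round(large, 1))
# 10 (80.0, 100.0) 100.0 83.3
# 5 (85.0, 95.0) 95.0 86.7
```

Under these assumptions the same 10% of raw capacity stays free in both cases; a lower threshold only changes where it sits, leaving about 48 TB of headroom on the small nodes (5% of 960 TB) instead of none.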

>
> Thanks to all for the suggestions...
>
> -------
>
>
>
> Today I thought about my advice to you and realized that I was wrong.
>
> For example, say we have 100 nodes, 80 with 12 TB and 20 with 72 TB, and
> every node holds 10 TB of data. The average cluster DFS used is
> 1000/2400*100 = 41.7%.
>
> For the 12 TB nodes, DFS used is 83.3% of capacity;
> for the 72 TB nodes it is 13.9%.
>
> A node is balanced if average cluster DFS used + threshold > node DFS
> used > average cluster DFS used - threshold.
> Data will move from the 12 TB nodes to the 72 TB nodes, and when the 12 TB
> nodes reach 51.7% of capacity the balancer will stop.
> At that point the 72 TB nodes will be at 35.0% of capacity.
>
> As the cluster grows, in the ideal case when cluster DFS used reaches 90%
> of capacity, the 72 TB nodes will be at about 80% of capacity and the 12 TB
> nodes at about 100%. After that you have about 288 TB of free space.
>
>> -----
>>
>> On Sun, Mar 24, 2013 at 11:01 PM, Tapas Sarangi <ta...@gmail.com> wrote:
>>
>>> Yes, thanks for pointing that out, but I already know that it completes
>>> the balancing when it exits; otherwise it wouldn't exit.
>>> Your answer doesn't solve the problem I mentioned earlier in my message.
>>> 'hdfs' is stalling and hadoop is not writing unless space is cleared up
>>> from the cluster, even though "df" shows the cluster has about 500 TB of
>>> free space.
>>>
>>> -------
>>>
>>>
>>> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>>> balaji@balajin.net> wrote:
>>>
>>>  -setBalancerBandwidth <bandwidth in bytes per second>
>>>
>>> So the value is bytes per second. If it is running and exiting, it means
>>> it has completed the balancing.
>>>
>>>
>>> On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:
>>>
>>>> Yes, we are running the balancer, though a balancer process runs for
>>>> almost a day or more before exiting and starting over.
>>>> The current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume
>>>> that's bytes, so about 2 gigabytes/sec. Shouldn't that be reasonable? If
>>>> it is in bits, then we have a problem.
>>>> What's the unit for "dfs.balance.bandwidthPerSec"?
>>>>
>>>> -----
>>>>
>>>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>>>> lists@balajin.net> wrote:
>>>>
>>>> Are you running the balancer? If the balancer is running and it is
>>>> slow, try increasing the balancer bandwidth.
>>>>
>>>>
>>>> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
>>>>
>>>>> Thanks for the follow-up. I don't know whether the attachment will pass
>>>>> through this mailing list, but I am attaching a PDF that contains the
>>>>> usage of all live nodes.
>>>>>
>>>>> All nodes starting with the letter "g" are the ones with smaller storage
>>>>> space, whereas nodes starting with the letter "s" have larger storage
>>>>> space. As you will see, most of the "gXX" nodes are completely full,
>>>>> whereas the "sXX" nodes have a lot of unused space.
>>>>>
>>>>> Recently we have frequently faced a crisis where 'hdfs' goes into a mode
>>>>> in which it is not able to write any further, even though the total
>>>>> space available in the cluster is about 500 TB. We believe this has
>>>>> something to do with the way it is balancing the nodes, but don't
>>>>> understand the problem yet. Maybe the attached PDF will help some of you
>>>>> (experts) to see what is going wrong here...
>>>>>
>>>>> Thanks
>>>>> ------
>>>>>
>>>>> The balancer knows about topology, but when it calculates balancing it
>>>>> operates only on nodes, not on racks.
>>>>> You can see how it works in Balancer.java, in BalancerDatanode, around
>>>>> line 509.
>>>>>
>>>>> I was wrong about the 350 TB / 35 TB figures; it is calculated like this:
>>>>>
>>>>> For example:
>>>>> cluster_capacity = 3.5 PB
>>>>> cluster_dfsused = 2 PB
>>>>>
>>>>> avgutil = cluster_dfsused / cluster_capacity * 100 = 57.14% of cluster
>>>>> capacity used.
>>>>> Then we know each node's utilization (node_dfsused / node_capacity * 100).
>>>>> The balancer considers everything fine if
>>>>> avgutil + 10 > node_utilization >= avgutil - 10.
>>>>>
>>>>> Ideally every node uses avgutil of its capacity, but for a 12 TB node
>>>>> that is only about 6.9 TB and for a 72 TB node about 41 TB.
>>>>>
>>>>> The balancer can't help you.
>>>>>
>>>>> Show me
>>>>> http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if
>>>>> you can.
>>>>>
>>>>>>
>>>>>> In the ideal case, with replication factor 2 and two nodes of 12 TB and
>>>>>> 72 TB, you will be able to store only 12 TB of replicated data.
>>>>>>
>>>>>>
>>>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB and
>>>>>> 72 TB, but not true for more than two nodes in the cluster.
>>>>>>
>>>>>>
>>>>>> The best way, in my opinion, is to use multiple racks. Nodes in a rack
>>>>>> must have identical capacity, and racks must have identical capacity.
>>>>>> For example:
>>>>>>
>>>>>> rack1: 1 node with 72Tb
>>>>>> rack2: 6 nodes with 12Tb
>>>>>> rack3: 3 nodes with 24Tb
>>>>>>
>>>>>> It helps with balancing, because a duplicated block must be placed on
>>>>>> another rack.
>>>>>>
>>>>>>
>>>>>> The same question I asked earlier in this message: do multiple racks,
>>>>>> with the default threshold for the balancer, minimize the difference
>>>>>> between racks?
>>>>>>
>>>>>> Why did you select HDFS? Maybe Lustre, CephFS or others would be a
>>>>>> better choice.
>>>>>>
>>>>>>
>>>>>> It wasn't my decision, and I probably can't change it now. I am new
>>>>>> to this cluster and trying to understand a few issues. I will explore
>>>>>> other options as you mentioned.
>>>>>>
>>>>>> --
>>>>>> http://balajin.net/blog
>>>>>> http://flic.kr/balajijegan
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> http://balajin.net/blog
>>> http://flic.kr/balajijegan
>>>
>>>
>>>
>>
>>
>
>
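The 100-node example is easy to rework in Python. Note that 80 x 12 TB + 20 x 72 TB is 2400 TB of raw capacity, so 1000 TB of data gives an average of about 41.7%. The sketch assumes, as the example does, a threshold of 10 and that the balancer only drains the 12 TB nodes down to the top of the band while the 72 TB nodes absorb the moved data evenly; real balancer runs are less tidy.

```python
# Hypothetical cluster from the example: 80 x 12 TB and 20 x 72 TB nodes,
# every node holding 10 TB, balancer threshold of 10 percentage points.
SMALL_N, SMALL_CAP = 80, 12.0
LARGE_N, LARGE_CAP = 20, 72.0
DATA_PER_NODE = 10.0
THRESHOLD = 10.0

capacity = SMALL_N * SMALL_CAP + LARGE_N * LARGE_CAP  # 2400 TB raw
used = (SMALL_N + LARGE_N) * DATA_PER_NODE           # 1000 TB of DFS data
avg = used / capacity * 100                          # cluster utilization, %

# Simplification: small nodes are drained only to the top of the band and
# the large nodes share the moved data evenly.
small_stop = avg + THRESHOLD
moved = SMALL_N * (DATA_PER_NODE - SMALL_CAP * small_stop / 100)
large_after = (LARGE_N * DATA_PER_NODE + moved) / (LARGE_N * LARGE_CAP) * 100

print(round(avg, 1), round(small_stop, 1), round(large_after, 1))  # 41.7 51.7 35.0
print(large_after >= avg - THRESHOLD)  # True: large nodes are inside the band
```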

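The rack layout suggested in the thread can be checked the same way. A minimal sketch, assuming replication 2 with the two replicas of every block on different racks (the thread's premise) and many small blocks: every block needs space outside the largest rack, so the most unique data that fits is min(total / 2, total - largest rack). The helper names are hypothetical, and the sketch ignores how HDFS actually picks nodes inside a rack.

```python
from collections import defaultdict


def rack_totals(nodes):
    """Sum node capacities (TB) per rack; nodes is a list of (rack, capacity)."""
    totals = defaultdict(float)
    for rack, cap in nodes:
        totals[rack] += cap
    return dict(totals)


def max_data_two_replicas_across_racks(rack_caps):
    """Most unique data (TB) with replication 2 when the two replicas of each
    block must sit on different racks (hypothetical helper)."""
    caps = list(rack_caps)
    total = sum(caps)
    return min(total / 2, total - max(caps))


# Layout suggested in the thread: 1 x 72 TB, 6 x 12 TB, 3 x 24 TB.
layout = [("rack1", 72)] + [("rack2", 12)] * 6 + [("rack3", 24)] * 3
totals = rack_totals(layout)
print(totals)  # {'rack1': 72.0, 'rack2': 72.0, 'rack3': 72.0}
print(max_data_two_replicas_across_racks(totals.values()))  # 108.0
print(max_data_two_replicas_across_racks([144.0, 72.0]))    # 72.0
```

Three equal 72 TB racks can hold 108 TB of unique data, half their raw capacity, whereas the same 216 TB split into a 144 TB rack and a 72 TB rack could hold only 72 TB.
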
Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Alexey Babutin <zo...@gmail.com>.
On Mon, Mar 25, 2013 at 4:29 AM, Tapas Sarangi <ta...@gmail.com>wrote:

> Hi,
>
> Thanks for the explanation. Where can I find the java code for balancer
> that utilizes the threshold value and calculate it myself as you mentioned
> ? I think I understand your calculation, but would like to see the code.
>

src/hdfs/org/apache/hadoop/hdfs/server/balancer/Balancer.java

see BalancerDatanode


> If I set the threshold to 5 instead of 10, then the smaller nodes will
> have a maximum of 95% full where the larger nodes disk-usage will increase
> from 80% to 85%.
>
> Now my question to you and the experts is when I run the balancer, is the
> following command enough to set the threshold to a different value :
>
> hadoop balancer -threshold 5
>
yes

>
> Thanks to all for the suggestions...
>
> -------
>
>
>
> today i thought about my advice for you and i have understood that i wrong.
>
> for example we have 100 nodes where 80 with 12Tb and 20 with 72 Tb.all
> node have 10 Tb data.
> averege cluster dfs used 1000/2600*100=38.5
>
> for  12Tb node dfs used it is 83.3 from capacity
> for 72Tb nodes its 13.9.
>
> node is balanced if      averege cluster dfs used +threshold > node dfs
> used >averege cluster dfs used - threshold.
> data will move from 12Tb to 72 Tb and when 12Tb nodes will have 48.5 of
> capacity balancer will stop.
> In this time 72tb node have 36.1 % of capacity.
>
> the cluster will grow up,in ideal case when cluster dfs used capacity 90 %
> .72Tb nodes will about 80% of capacity and 12Tb have  about 100 %
> capacity.After that you have about 288Tb freespace
>
>
>
>
>
>
>
>
>
>
>
>
>>
>>
>> -----
>>
>>
>>
>>
>> On Sun, Mar 24, 2013 at 11:01 PM, Tapas Sarangi <ta...@gmail.com>wrote:
>>
>>> Yes, thanks for pointing, but I already know that it is completing the
>>> balancing when exiting otherwise it shouldn't exit.
>>> Your answer doesn't solve the problem I mentioned earlier in my message.
>>> 'hdfs' is stalling and hadoop is not writing unless space is cleared up
>>> from the cluster even though "df" shows the cluster has about 500 TB of
>>> free space.
>>>
>>> -------
>>>
>>>
>>> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>>> balaji@balajin.net> wrote:
>>>
>>>  -setBalancerBandwidth <bandwidth in bytes per second>
>>>
>>> So the value is bytes per second. If it is running and exiting,it means
>>> it has completed the balancing.
>>>
>>>
>>> On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:
>>>
>>>> Yes, we are running balancer, though a balancer process runs for almost
>>>> a day or more before exiting and starting over.
>>>> Current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume
>>>> that's bytes so about 2 GigaByte/sec. Shouldn't that be reasonable ? If it
>>>> is in Bits then we have a problem.
>>>> What's the unit for "dfs.balance.bandwidthPerSec" ?
>>>>
>>>> -----
>>>>
>>>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>>>> lists@balajin.net> wrote:
>>>>
>>>> Are you running balancer? If balancer is running and if it is slow, try
>>>> increasing the balancer bandwidth
>>>>
>>>>
>>>> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
>>>>
>>>>> Thanks for the follow-up. I don't know whether attachments will pass
>>>>> through this mailing list, but I am attaching a PDF that contains the
>>>>> usage of all live nodes.
>>>>>
>>>>> All nodes whose names start with the letter "g" have smaller storage,
>>>>> whereas nodes starting with the letter "s" have larger storage. As you
>>>>> will see, most of the "gXX" nodes are completely full, whereas the "sXX"
>>>>> nodes have a lot of unused space.
>>>>>
>>>>> Recently we have frequently hit a crisis where HDFS goes into a mode in
>>>>> which it cannot write anything further, even though the total space
>>>>> available in the cluster is about 500 TB. We believe this has something
>>>>> to do with the way the nodes are balanced, but we don't understand the
>>>>> problem yet. Maybe the attached PDF will help some of you (experts) see
>>>>> what is going wrong here...
>>>>>
>>>>> Thanks
>>>>> ------
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> The balancer knows about the topology, but when it calculates balancing
>>>>> it works only with nodes, not with racks.
>>>>> You can see how it works in Balancer.java, in BalancerDatanode, around
>>>>> line 509.
>>>>>
>>>>> I was wrong about the 350 TB / 35 TB figures; it calculates this way:
>>>>>
>>>>> For example:
>>>>> cluster_capacity=3.5PB
>>>>> cluster_dfsused=2PB
>>>>>
>>>>> avgutil=cluster_dfsused/cluster_capacity*100=57.14% of cluster
>>>>> capacity used.
>>>>> Then for each node we know its utilization
>>>>> (node_dfsused/node_capacity*100). The balancer considers everything fine
>>>>> if avgutil+10 > node_utilization >= avgutil-10 (10 is the default
>>>>> threshold).
>>>>>
>>>>> Ideally every node would use avgutil of its capacity, but for a 12 TB
>>>>> node that is only about 6.9 TB and for a 72 TB node about 41 TB.
>>>>>
>>>>> The balancer can't help you.
>>>>>
>>>>> Show me
>>>>> http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if
>>>>> you can.
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> In the ideal case, with replication factor 2 and two nodes of 12 TB and
>>>>>> 72 TB, you will be able to store only 12 TB of replicated data.
>>>>>>
>>>>>>
>>>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB and
>>>>>> 72 TB, but not true for more than two nodes in the cluster.
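The two-node claim and the objection to it can be checked with an idealized model. If each block needs r replicas on r different nodes, a node can hold at most one replica of any block, so it contributes at most min(c_i, D) toward a logical dataset of size D, and D fits only if r*D <= sum of min(c_i, D). The sketch below finds the largest such D by bisection. The function name is hypothetical, and the model ignores racks, reserved space and block granularity, so treat it as an upper bound rather than what HDFS will actually achieve.

```python
# Idealized capacity model (an assumption, not HDFS's placement policy):
# every block needs `replication` copies on distinct nodes, so a node with
# capacity c can contribute at most min(c, D) to a logical dataset of size D.

def max_logical_data(capacities, replication):
    """Largest D with replication * D <= sum(min(c, D) for c in capacities)."""

    def fits(d):
        return replication * d <= sum(min(c, d) for c in capacities)

    lo, hi = 0.0, sum(capacities) / replication  # D can never exceed raw / r
    for _ in range(100):  # bisection; fits() holds exactly on [0, D_max]
        mid = (lo + hi) / 2
        if fits(mid):
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    for caps in ([12, 72], [12, 72, 72], [12, 12, 72]):
        print(caps, "TB ->", round(max_logical_data(caps, 2), 1), "TB of data")
```

With nodes of 12 TB and 72 TB this gives 12 TB, matching the claim; adding a second 72 TB node raises it to 78 TB (all 156 TB of raw space usable), while two 12 TB nodes plus one 72 TB node allow only 24 TB.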
>>>>>>
>>>>>>
>>>>>> The best way, in my opinion, is to use multiple racks. Nodes within a
>>>>>> rack should have identical capacity, and racks should have identical capacity.
>>>>>> For example:
>>>>>>
>>>>>> rack1: 1 node with 72Tb
>>>>>> rack2: 6 nodes with 12Tb
>>>>>> rack3: 3 nodes with 24Tb
>>>>>>
>>>>>> It helps with balancing, because a replica of each block must be placed
>>>>>> on another rack.
>>>>>>
>>>>>>
>>>>>> The same question I asked earlier in this message: does using multiple
>>>>>> racks with the default balancer threshold minimize the difference
>>>>>> between racks?
>>>>>>
>>>>>> Why did you choose HDFS? Maybe Lustre, CephFS, or something else would
>>>>>> be a better choice.
>>>>>>
>>>>>>
>>>>>> It wasn't my decision, and I probably can't change it now. I am new
>>>>>> to this cluster and trying to understand a few issues. I will explore
>>>>>> other options as you mentioned.
>>>>>>
>>>>>> --
>>>>>> http://balajin.net/blog
>>>>>> http://flic.kr/balajijegan
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> http://balajin.net/blog
>>> http://flic.kr/balajijegan
>>>
>>>
>>>
>>
>>
>
>
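The threshold rule described above can be sketched as a small classifier. It is a simplified model of that description, not code taken from Balancer.java; the function names are hypothetical, the handling of the exact band edges is an assumption, and the example figures (2 PB used of 3.5 PB, small nodes near 99.9% full, large nodes around 30-50%) come from the thread.

```python
# Simplified model of the balancer's threshold test as described in the thread.
# Utilizations are percentages; threshold defaults to 10 (the balancer default).

def average_utilization(total_used, total_capacity):
    """Cluster-wide DFS used as a percentage of cluster capacity."""
    return total_used / total_capacity * 100


def classify(node_util, avg_util, threshold=10.0):
    """Place a node relative to the band [avg - threshold, avg + threshold]."""
    if node_util > avg_util + threshold:
        return "over-utilized"   # candidate source of block moves
    if node_util < avg_util - threshold:
        return "under-utilized"  # candidate target of block moves
    return "within threshold"    # left alone


def ideal_used(capacity, avg_util):
    """Data a node would hold if it sat exactly at the cluster average."""
    return capacity * avg_util / 100


if __name__ == "__main__":
    avg = average_utilization(2.0, 3.5)  # 2 PB used of 3.5 PB
    print(f"average utilization: {avg:.2f}%")
    for util in (99.9, 50.0, 30.0):
        print(f"node at {util}% -> {classify(util, avg)}")
    for cap in (12, 72):
        print(f"{cap} TB node at the average holds {ideal_used(cap, avg):.1f} TB")
```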

Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Tapas Sarangi <ta...@gmail.com>.
Hi,

Thanks for the explanation. Where can I find the Java code for the balancer that uses the threshold value, so that I can do the calculation myself as you described? I think I understand your calculation, but I would like to see the code.
If I set the threshold to 5 instead of 10, then the smaller nodes will be at most 95% full, while the larger nodes' disk usage will increase from 80% to 85%.
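These figures follow from the band alone: with the cluster average held at 90%, the balancer leaves a node alone anywhere between 90 - t and 90 + t percent, capped at 100%. A minimal sketch, assuming a fixed 90% average; the function name is hypothetical.

```python
# Allowed utilization band for a given cluster average and balancer threshold,
# clipped to the physical range 0-100%. Assumes the average stays fixed.

def allowed_band(avg_util, threshold):
    low = max(0.0, avg_util - threshold)
    high = min(100.0, avg_util + threshold)
    return low, high


if __name__ == "__main__":
    for t in (10, 5):
        low, high = allowed_band(90.0, t)
        print(f"threshold {t}: large nodes may stay as low as {low:.0f}%, "
              f"small nodes may reach {high:.0f}%")
```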

Now my question to you and the experts: when I run the balancer, is the following command enough to set the threshold to a different value?

hadoop balancer -threshold 5
 
Thanks to all for the suggestions...

-------


> 
> Today I thought about my advice to you again, and I realized I was wrong.
>
> For example, say we have 100 nodes: 80 with 12 TB and 20 with 72 TB, and every node holds 10 TB of data.
> Average cluster DFS used = 1000/2400*100 = 41.7%
>
> For the 12 TB nodes, DFS used is 83.3% of capacity;
> for the 72 TB nodes it is 13.9%.
>
> A node is balanced if average cluster DFS used + threshold > node DFS used > average cluster DFS used - threshold.
> Data will move from the 12 TB nodes to the 72 TB nodes, and when the 12 TB nodes reach about 51.7% of capacity the balancer will stop.
> At that point the 72 TB nodes will be at about 35.0% of capacity.
>
> As the cluster grows, in the ideal case, when cluster DFS used reaches 90%, the 72 TB nodes will be at about 80% of capacity and the 12 TB nodes at about 100%. At that point you have about 288 TB of free space left.
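The 100-node example can be checked numerically. Note that 80 x 12 TB + 20 x 72 TB is 2,400 TB of raw capacity, so 1,000 TB of data is about 41.7% of it. The sketch below is an idealized two-class model with a hypothetical function name; it assumes data only moves from the small nodes to the large nodes and that the balancer stops once the small nodes are down to avg + threshold.

```python
# Idealized two-class model of the example: every node starts with the same
# amount of data, and data moves from small to large nodes until the small
# nodes are no longer above avg + threshold. Sizes in TB, utilizations in %.

def balance_two_classes(n_small, cap_small, n_large, cap_large,
                        used_per_node, threshold=10.0):
    total_cap = n_small * cap_small + n_large * cap_large
    total_used = (n_small + n_large) * used_per_node
    avg = total_used / total_cap * 100

    start_small = used_per_node / cap_small * 100
    end_small = min(start_small, avg + threshold)  # stop at the top of the band
    moved_off_small = (start_small - end_small) / 100 * cap_small * n_small
    end_large = (n_large * used_per_node + moved_off_small) / (n_large * cap_large) * 100
    return avg, end_small, end_large


if __name__ == "__main__":
    avg, small, large = balance_two_classes(80, 12, 20, 72, 10)
    print(f"average {avg:.1f}%, 12 TB nodes end at {small:.1f}%, "
          f"72 TB nodes end at {large:.1f}%")
```

Under these assumptions the large nodes finish at about 35%, which is still above avg - threshold (about 31.7%), so both classes end inside the band.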





Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Alexey Babutin <zo...@gmail.com>.
On Mon, Mar 25, 2013 at 1:46 AM, Alexey Babutin
<zo...@gmail.com>wrote:

>
>
> On Mon, Mar 25, 2013 at 12:48 AM, Tapas Sarangi <ta...@gmail.com>wrote:
>
>>
>> On Mar 24, 2013, at 3:40 PM, Alexey Babutin <zo...@gmail.com>
>> wrote:
>>
>> You said that threshold=10. Run the command manually: hadoop balancer
>> -threshold 9.5, then 9, and so on in steps of 0.5.
>>
>>
>> We are not setting the threshold anywhere in our configuration, so we
>> are using the default, which I believe is 10.
>> Why do you suggest testing the balancer in such steps? Please
>> explain.
>> I think we discussed this earlier in this thread and came to the
>> conclusion that the threshold will not help in this situation.
>>
>
>
> Today I thought about my advice to you again, and I realized I was wrong.
>
> For example, say we have 100 nodes: 80 with 12 TB and 20 with 72 TB, and
> every node holds 10 TB of data.
> Average cluster DFS used = 1000/2400*100 = 41.7%
>
> For the 12 TB nodes, DFS used is 83.3% of capacity;
> for the 72 TB nodes it is 13.9%.
>
> A node is balanced if average cluster DFS used + threshold > node DFS
> used > average cluster DFS used - threshold.
> Data will move from the 12 TB nodes to the 72 TB nodes, and when the 12 TB
> nodes reach about 51.7% of capacity the balancer will stop.
> At that point the 72 TB nodes will be at about 35.0% of capacity.
>
> As the cluster grows, in the ideal case, when cluster DFS used reaches 90%,
> the 72 TB nodes will be at about 80% of capacity and the 12 TB nodes at
> about 100%. At that point you have about 288 TB of free space left.
>

If threshold=0.1, all nodes will use about 90% of their capacity (when the cluster as a whole is about 90% used).
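One way to read this remark: the small nodes can only be 100% full while still inside the balancer's band once the cluster average reaches 100 - t. The sketch below, for the 80 x 12 TB + 20 x 72 TB example (960 TB + 1,440 TB), computes that average, where the large nodes would then sit, and how much raw space is still free. It is an idealized model with a hypothetical function name; because it keeps every node inside the band, it gives somewhat different figures from the rough estimate above (about 83% on the 72 TB nodes and about 240 TB free for t = 10).

```python
# Idealized model: small nodes sit exactly at avg + threshold = 100% while the
# large nodes absorb the rest of the data. All sizes are in TB.

def saturation_point(small_total, large_total, threshold):
    """Return (cluster average %, large-node utilization %, free TB) at the
    moment the small nodes become full without leaving the balancer band."""
    avg = 100.0 - threshold
    total = small_total + large_total
    used = avg / 100 * total
    large_util = (used - small_total) / large_total * 100
    if large_util < avg - threshold:
        raise ValueError("large nodes would be under-utilized; model does not apply")
    return avg, large_util, total - used


if __name__ == "__main__":
    for t in (10, 5, 0.1):
        avg, large, free = saturation_point(80 * 12, 20 * 72, t)
        print(f"t={t}: small nodes full at {avg:.1f}% average, "
              f"large nodes at {large:.1f}%, {free:.1f} TB free")
```

With t = 0.1 every node stays within a fraction of a percent of the cluster average, which is the sense in which all nodes end up at nearly the same utilization.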




Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Alexey Babutin <zo...@gmail.com>.
On Mon, Mar 25, 2013 at 1:46 AM, Alexey Babutin
<zo...@gmail.com>wrote:

>
>
> On Mon, Mar 25, 2013 at 12:48 AM, Tapas Sarangi <ta...@gmail.com>wrote:
>
>>
>> On Mar 24, 2013, at 3:40 PM, Alexey Babutin <zo...@gmail.com>
>> wrote:
>>
>> you said that threshold=10.Run mannualy command : hadoop balancer
>> threshold 9.5 ,then 9 and so with 0.5 step.
>>
>>
>> We are not setting threshold anywhere in our configuration and thus
>> considering the default which I believe is 10.
>> Why do you suggest such steps need to be tested for balancer ? Please
>> explain.
>> I guess we had a discussion earlier on this thread and came to the
>> conclusion that the threshold will not help in this situation.
>>
>
>
> today i thought about my advice for you and i have understood that i wrong.
>
> for example we have 100 nodes where 80 with 12Tb and 20 with 72 Tb.all
> node have 10 Tb data.
> averege cluster dfs used 1000/2600*100=38.5
>
> for  12Tb node dfs used it is 83.3 from capacity
> for 72Tb nodes its 13.9.
>
> node is balanced if      averege cluster dfs used +threshold > node dfs
> used >averege cluster dfs used - threshold.
> data will move from 12Tb to 72 Tb and when 12Tb nodes will have 48.5 of
> capacity balancer will stop.
> In this time 72tb node have 36.1 % of capacity.
>
> the cluster will grow up,in ideal case when cluster dfs used capacity 90 %
> .72Tb nodes will about 80% of capacity and 12Tb have  about 100 %
> capacity.After that you have about 288Tb freespace
>

if threshold=0.1 all nodes will use about 90% of capacity



>
>
>
>
>
>
>
>
>>
>>
>> -----
>>
>>
>>
>>
>> On Sun, Mar 24, 2013 at 11:01 PM, Tapas Sarangi <ta...@gmail.com>wrote:
>>
>>> Yes, thanks for pointing, but I already know that it is completing the
>>> balancing when exiting otherwise it shouldn't exit.
>>> Your answer doesn't solve the problem I mentioned earlier in my message.
>>> 'hdfs' is stalling and hadoop is not writing unless space is cleared up
>>> from the cluster even though "df" shows the cluster has about 500 TB of
>>> free space.
>>>
>>> -------
>>>
>>>
>>> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>>> balaji@balajin.net> wrote:
>>>
>>>  -setBalancerBandwidth <bandwidth in bytes per second>
>>>
>>> So the value is bytes per second. If it is running and exiting,it means
>>> it has completed the balancing.
>>>
>>>
>>> On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:
>>>
>>>> Yes, we are running balancer, though a balancer process runs for almost
>>>> a day or more before exiting and starting over.
>>>> Current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume
>>>> that's bytes so about 2 GigaByte/sec. Shouldn't that be reasonable ? If it
>>>> is in Bits then we have a problem.
>>>> What's the unit for "dfs.balance.bandwidthPerSec" ?
>>>>
>>>> -----
>>>>
>>>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>>>> lists@balajin.net> wrote:
>>>>
>>>> Are you running balancer? If balancer is running and if it is slow, try
>>>> increasing the balancer bandwidth
>>>>
>>>>
>>>> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
>>>>
>>>>> Thanks for the follow up. I don't know whether attachment will pass
>>>>> through this mailing list, but I am attaching a pdf that contains the usage
>>>>> of all live nodes.
>>>>>
>>>>> All nodes starting with letter "g" are the ones with smaller storage
>>>>> space where as nodes starting with letter "s" have larger storage space. As
>>>>> you will see, most of the "gXX" nodes are completely full whereas "sXX"
>>>>> nodes have a lot of unused space.
>>>>>
>>>>> Recently, we are facing crisis frequently as 'hdfs' goes into a mode
>>>>> where it is not able to write any further even though the total space
>>>>> available in the cluster is about 500 TB. We believe this has something to
>>>>> do with the way it is balancing the nodes, but don't understand the problem
>>>>> yet. May be the attached PDF will help some of you (experts) to see what is
>>>>> going wrong here...
>>>>>
>>>>> Thanks
>>>>> ------
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> The balancer knows about the topology, but when it calculates balancing it
>>>>> operates only on nodes, not on racks.
>>>>> You can see how it works in Balancer.java, in BalancerDatanode, around
>>>>> line 509.
>>>>>
>>>>> I was wrong about 350 TB/35 TB; it calculates this way:
>>>>>
>>>>> For example:
>>>>> cluster_capacity=3.5PB
>>>>> cluster_dfsused=2PB
>>>>>
>>>>> avgutil=cluster_dfsused/cluster_capacity*100=57.14% of cluster
>>>>> capacity used
>>>>> Then we know each node's utilization (node_dfsused/node_capacity*100).
>>>>> The balancer considers everything fine if
>>>>> avgutil+10 > node_utilization >= avgutil-10.
>>>>>
>>>>> Ideally every node uses avgutil of its capacity, but for a 12 TB node that
>>>>> is only about 6.9 TB and for a 72 TB node about 41 TB.
>>>>>
>>>>> The balancer can't help you.
>>>>>
>>>>> Show me
>>>>> http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if
>>>>> you can.
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>  In the ideal case, with replication factor 2 and two nodes of 12 TB and
>>>>>> 72 TB, you will be able to store only 12 TB of replicated data.
>>>>>>
>>>>>>
>>>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB and
>>>>>> 72 TB, but not true for more than two nodes in the cluster.
>>>>>>
>>>>>>
>>>>>> The best way, in my opinion, is to use multiple racks. Nodes within a rack
>>>>>> should have identical capacity, and racks should have identical total capacity.
>>>>>> For example:
>>>>>>
>>>>>> rack1: 1 node with 72 TB
>>>>>> rack2: 6 nodes with 12 TB
>>>>>> rack3: 3 nodes with 24 TB
>>>>>>
>>>>>> It helps with balancing, because a replica of each block must be placed on
>>>>>> another rack.
>>>>>>
>>>>>>
>>>>>> The same question I asked earlier in this message: does using multiple
>>>>>> racks with the default balancer threshold minimize the difference
>>>>>> between racks?
>>>>>>
>>>>>> Why did you select HDFS? Maybe Lustre, CephFS or something else is a
>>>>>> better choice.
>>>>>>
>>>>>>
>>>>>> It wasn't my decision, and I probably can't change it now. I am new
>>>>>> to this cluster and trying to understand a few issues. I will explore other
>>>>>> options as you mentioned.
>>>>>>
>>>>>> --
>>>>>> http://balajin.net/blog
>>>>>> http://flic.kr/balajijegan
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> http://balajin.net/blog
>>> http://flic.kr/balajijegan
>>>
>>>
>>>
>>
>>
>
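Alexey's description of the threshold test can be sketched in Python. This is a minimal illustration, not the actual Balancer.java code: the four category names follow how, to our understanding, the 0.20 balancer groups datanodes around the cluster average, while the function names are hypothetical and all sizes are in TB.

```python
def utilization(used_tb: float, capacity_tb: float) -> float:
    """Percentage of a node's (or the whole cluster's) capacity that is DFS-used."""
    return used_tb / capacity_tb * 100.0


def classify(node_util: float, avg_util: float, threshold: float = 10.0) -> str:
    """Place a node relative to the band [avg - threshold, avg + threshold]."""
    if node_util > avg_util + threshold:
        return "over-utilized"       # candidate source of block moves
    if node_util >= avg_util:
        return "above-average"       # inside the band, above the mean
    if node_util >= avg_util - threshold:
        return "below-average"       # inside the band, below the mean
    return "under-utilized"          # candidate target of block moves


# Alexey's example: 3.5 PB capacity, 2 PB used (expressed in TB).
avg = utilization(2000, 3500)

# If every node sat exactly at the average, this is how much each would hold.
for cap in (12, 72):
    print(f"{cap} TB node at the average: {cap * avg / 100:.1f} TB used")

print(classify(99.9, avg))   # a nearly full small node
print(classify(40.0, avg))   # a large node with plenty of free space
```

With the default threshold of 10, a 12 TB node at 99.9% falls in the over-utilized group and a 72 TB node at 40% in the under-utilized group, so the balancer keeps pairing them; once every node is inside the band it has nothing left to do, regardless of how little absolute free space the small nodes have.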

Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Alexey Babutin <zo...@gmail.com>.
On Mon, Mar 25, 2013 at 1:46 AM, Alexey Babutin
<zo...@gmail.com> wrote:

>
>
> On Mon, Mar 25, 2013 at 12:48 AM, Tapas Sarangi <ta...@gmail.com> wrote:
>
>>
>> On Mar 24, 2013, at 3:40 PM, Alexey Babutin <zo...@gmail.com>
>> wrote:
>>
>> You said that threshold=10. Run the command manually: hadoop balancer
>> -threshold 9.5, then 9, and so on in 0.5 steps.
>>
>>
>> We are not setting the threshold anywhere in our configuration and thus
>> are using the default, which I believe is 10.
>> Why do you suggest testing the balancer with such steps? Please
>> explain.
>> I guess we had a discussion earlier in this thread and came to the
>> conclusion that the threshold will not help in this situation.
>>
>
>
> Today I thought about my advice to you and realized that I was wrong.
>
> For example, we have 100 nodes: 80 with 12 TB and 20 with 72 TB. Every
> node holds 10 TB of data.
> Average cluster DFS used: 1000/2400*100 = 41.7%.
>
> For a 12 TB node, DFS used is 83.3% of capacity;
> for a 72 TB node it is 13.9%.
>
> A node is balanced if average cluster DFS used + threshold > node DFS
> used > average cluster DFS used - threshold.
> Data will move from the 12 TB nodes to the 72 TB nodes, and when the 12 TB
> nodes reach 51.7% of capacity the balancer will stop.
> At that point the 72 TB nodes will be at 35.0% of capacity.
>
> As the cluster grows, in the ideal case when cluster DFS used reaches 90%,
> the 72 TB nodes will be at about 80% of capacity and the 12 TB nodes at about
> 100%. After that you have about 288 TB of free space.
>

If threshold=0.1, all nodes will use about 90% of their capacity.
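The effect of the threshold can be checked with a deliberately simplified model of the 80 × 12 TB + 20 × 72 TB example. It assumes (our assumption, not something stated in the thread) that new blocks land on nodes roughly evenly regardless of capacity, so the small nodes keep drifting upward and the balancer holds them at the top of the band, avg + threshold, while the large nodes absorb the rest. The names are hypothetical.

```python
SMALL_TB = 80 * 12   # combined capacity of the 12 TB nodes
LARGE_TB = 20 * 72   # combined capacity of the 72 TB nodes
TOTAL_TB = SMALL_TB + LARGE_TB


def fill_point(threshold: float) -> dict:
    """Cluster state at the moment the small nodes reach 100 % full.

    Small nodes are pinned at avg + threshold, so they hit 100 % when the
    cluster average reaches 100 - threshold. Conservation of used space then
    fixes the large nodes' utilization.
    """
    avg = 100.0 - threshold
    small = avg + threshold                                  # 100 % full
    # The large nodes hold whatever used space the small nodes do not.
    large = (TOTAL_TB * avg - SMALL_TB * small) / LARGE_TB
    free_tb = TOTAL_TB * (100.0 - avg) / 100.0               # all on large nodes
    return {"avg": avg, "small": small, "large": large, "free_tb": free_tb}


for t in (10, 5, 0.1):
    s = fill_point(t)
    print(f"threshold={t}: cluster {s['avg']:.1f}% used, "
          f"72 TB nodes at {s['large']:.1f}%, {s['free_tb']:.1f} TB still free")
```

Under this model the space still free when the small nodes fill up equals threshold percent of the total capacity: about 240 TB at the default of 10, 120 TB at 5, and about 2.4 TB at 0.1. That is a little below the rough 288 TB estimate quoted above, because the large nodes end up nearer 83% than 80%.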




Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Tapas Sarangi <ta...@gmail.com>.
Hi,

Thanks for the explanation. Where can I find the Java code for the balancer that uses the threshold value, so I can do the calculation myself as you did? I think I understand your calculation, but would like to see the code.
If I set the threshold to 5 instead of 10, then the smaller nodes will be at most 95% full, while the larger nodes' disk usage will increase from 80% to 85%.

Now my question to you and the experts: when I run the balancer, is the following command enough to set the threshold to a different value?

hadoop balancer -threshold 5
 
Thanks to all for the suggestions...

-------


> 
> Today I thought about my advice to you and realized that I was wrong.
> 
> For example, we have 100 nodes: 80 with 12 TB and 20 with 72 TB. Every node holds 10 TB of data.
> Average cluster DFS used: 1000/2400*100 = 41.7%.
> 
> For a 12 TB node, DFS used is 83.3% of capacity;
> for a 72 TB node it is 13.9%.
> 
> A node is balanced if average cluster DFS used + threshold > node DFS used > average cluster DFS used - threshold.
> Data will move from the 12 TB nodes to the 72 TB nodes, and when the 12 TB nodes reach 51.7% of capacity the balancer will stop.
> At that point the 72 TB nodes will be at 35.0% of capacity.
> 
> As the cluster grows, in the ideal case when cluster DFS used reaches 90%, the 72 TB nodes will be at about 80% of capacity and the 12 TB nodes at about 100%. After that you have about 288 TB of free space.





Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Alexey Babutin <zo...@gmail.com>.
On Mon, Mar 25, 2013 at 12:48 AM, Tapas Sarangi <ta...@gmail.com> wrote:

>
> On Mar 24, 2013, at 3:40 PM, Alexey Babutin <zo...@gmail.com>
> wrote:
>
> You said that threshold=10. Run the command manually: hadoop balancer
> -threshold 9.5, then 9, and so on in 0.5 steps.
>
>
> We are not setting the threshold anywhere in our configuration and thus
> are using the default, which I believe is 10.
> Why do you suggest testing the balancer with such steps? Please
> explain.
> I guess we had a discussion earlier in this thread and came to the
> conclusion that the threshold will not help in this situation.
>


Today I thought about my advice to you and realized that I was wrong.

For example, we have 100 nodes: 80 with 12 TB and 20 with 72 TB. Every node
holds 10 TB of data.
Average cluster DFS used: 1000/2400*100 = 41.7%.

For a 12 TB node, DFS used is 83.3% of capacity;
for a 72 TB node it is 13.9%.

A node is balanced if average cluster DFS used + threshold > node DFS
used > average cluster DFS used - threshold.
Data will move from the 12 TB nodes to the 72 TB nodes, and when the 12 TB
nodes reach 51.7% of capacity the balancer will stop.
At that point the 72 TB nodes will be at 35.0% of capacity.

As the cluster grows, in the ideal case when cluster DFS used reaches 90%,
the 72 TB nodes will be at about 80% of capacity and the 12 TB nodes at about
100%. After that you have about 288 TB of free space.
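The numbers in this example can be reproduced with a short sketch. It assumes (our simplification) that the balancer drains the 12 TB nodes only until they re-enter the band at avg + threshold and that the moved data spreads evenly over the 72 TB nodes; real block moves are lumpier. The function name and its defaults are hypothetical.

```python
def balance_stop(n_small=80, cap_small=12.0, n_large=20, cap_large=72.0,
                 used_per_node=10.0, threshold=10.0):
    """Return utilizations (%) before and after one balancer run in the toy cluster."""
    capacity = n_small * cap_small + n_large * cap_large    # total TB
    used = (n_small + n_large) * used_per_node              # total TB
    avg = used / capacity * 100

    small_before = used_per_node / cap_small * 100
    large_before = used_per_node / cap_large * 100

    # Small nodes are drained only until they re-enter the band (avg + threshold).
    small_after = min(small_before, avg + threshold)
    moved_tb = n_small * cap_small * (small_before - small_after) / 100
    # Everything moved is assumed to land evenly on the large nodes.
    large_after = large_before + moved_tb / (n_large * cap_large) * 100

    return {
        "capacity_tb": capacity,
        "avg": avg,
        "small_before": small_before,
        "large_before": large_before,
        "small_after": small_after,
        "large_after": large_after,
    }


for key, value in balance_stop().items():
    print(f"{key}: {value:.1f}")
```

With these defaults the capacity is 80 × 12 + 20 × 72 = 2,400 TB, so the average is about 41.7%; the 12 TB nodes stop at about 51.7% after shedding about 304 TB in total, and the 72 TB nodes end at 35.0%, still inside the band.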









Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Alexey Babutin <zo...@gmail.com>.
On Mon, Mar 25, 2013 at 12:48 AM, Tapas Sarangi <ta...@gmail.com>wrote:

>
> On Mar 24, 2013, at 3:40 PM, Alexey Babutin <zo...@gmail.com>
> wrote:
>
> you said that threshold=10.Run mannualy command : hadoop balancer
> threshold 9.5 ,then 9 and so with 0.5 step.
>
>
> We are not setting threshold anywhere in our configuration and thus
> considering the default which I believe is 10.
> Why do you suggest such steps need to be tested for balancer ? Please
> explain.
> I guess we had a discussion earlier on this thread and came to the
> conclusion that the threshold will not help in this situation.
>


today i thought about my advice for you and i have understood that i wrong.

for example we have 100 nodes where 80 with 12Tb and 20 with 72 Tb.all node
have 10 Tb data.
averege cluster dfs used 1000/2600*100=38.5

for  12Tb node dfs used it is 83.3 from capacity
for 72Tb nodes its 13.9.

node is balanced if      averege cluster dfs used +threshold > node dfs
used >averege cluster dfs used - threshold.
data will move from 12Tb to 72 Tb and when 12Tb nodes will have 48.5 of
capacity balancer will stop.
In this time 72tb node have 36.1 % of capacity.

the cluster will grow up,in ideal case when cluster dfs used capacity 90 %
.72Tb nodes will about 80% of capacity and 12Tb have  about 100 %
capacity.After that you have about 288Tb freespace








>
>
> -----
>
>
>
>
> On Sun, Mar 24, 2013 at 11:01 PM, Tapas Sarangi <ta...@gmail.com>wrote:
>
>> Yes, thanks for pointing, but I already know that it is completing the
>> balancing when exiting otherwise it shouldn't exit.
>> Your answer doesn't solve the problem I mentioned earlier in my message.
>> 'hdfs' is stalling and hadoop is not writing unless space is cleared up
>> from the cluster even though "df" shows the cluster has about 500 TB of
>> free space.
>>
>> -------
>>
>>
>> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>> balaji@balajin.net> wrote:
>>
>>  -setBalancerBandwidth <bandwidth in bytes per second>
>>
>> So the value is bytes per second. If it is running and exiting,it means
>> it has completed the balancing.
>>
>>
>> On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:
>>
>>> Yes, we are running balancer, though a balancer process runs for almost
>>> a day or more before exiting and starting over.
>>> Current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume
>>> that's bytes so about 2 GigaByte/sec. Shouldn't that be reasonable ? If it
>>> is in Bits then we have a problem.
>>> What's the unit for "dfs.balance.bandwidthPerSec" ?
>>>
>>> -----
>>>
>>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>>> lists@balajin.net> wrote:
>>>
>>> Are you running balancer? If balancer is running and if it is slow, try
>>> increasing the balancer bandwidth
>>>
>>>
>>> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
>>>
>>>> Thanks for the follow up. I don't know whether attachment will pass
>>>> through this mailing list, but I am attaching a pdf that contains the usage
>>>> of all live nodes.
>>>>
>>>> All nodes starting with letter "g" are the ones with smaller storage
>>>> space where as nodes starting with letter "s" have larger storage space. As
>>>> you will see, most of the "gXX" nodes are completely full whereas "sXX"
>>>> nodes have a lot of unused space.
>>>>
>>>> Recently, we are facing crisis frequently as 'hdfs' goes into a mode
>>>> where it is not able to write any further even though the total space
>>>> available in the cluster is about 500 TB. We believe this has something to
>>>> do with the way it is balancing the nodes, but don't understand the problem
>>>> yet. May be the attached PDF will help some of you (experts) to see what is
>>>> going wrong here...
>>>>
>>>> Thanks
>>>> ------
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Balancer know about topology,but when calculate balancing it operates
>>>> only with nodes not with racks.
>>>> You can see how it work in Balancer.java in  BalancerDatanode about
>>>> string 509.
>>>>
>>>> I was wrong about 350Tb,35Tb it calculates in such way :
>>>>
>>>> For example:
>>>> cluster_capacity=3.5Pb
>>>> cluster_dfsused=2Pb
>>>>
>>>> avgutil=cluster_dfsused/cluster_capacity*100=57.14% used cluster
>>>> capacity
>>>> Then we know avg node utilization (node_dfsused/node_capacity*100)
>>>> .Balancer think that all good if  avgutil
>>>> +10>node_utilizazation>=avgutil-10.
>>>>
>>>> Ideal case that all node used avgutl of capacity.but for 12TB node its
>>>> only 6.5Tb and for 72Tb its about 40Tb.
>>>>
>>>> Balancer cant help you.
>>>>
>>>> Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVEif you can.
>>>>
>>>>
>>>>
>>>>>
>>>>>
>>>>>  In ideal case with replication factor 2 ,with two nodes 12Tb and 72Tb
>>>>> you will be able to have only 12Tb replication data.
>>>>>
>>>>>
>>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB and
>>>>> 72 TB, but not true for more than two nodes in the cluster.
>>>>>
>>>>>
>>>>> The best way, in my opinion, is to use multiple racks. Nodes in a rack
>>>>> should have identical capacity, and the racks should have identical
>>>>> capacity.
>>>>> For example:
>>>>>
>>>>> rack1: 1 node with 72 TB
>>>>> rack2: 6 nodes with 12 TB
>>>>> rack3: 3 nodes with 24 TB
>>>>>
>>>>> It helps with balancing, because a duplicated block must be on another
>>>>> rack.
>>>>>
>>>>>
>>>>> The same question I asked earlier in this message: do multiple racks
>>>>> with the default balancer threshold minimize the difference between
>>>>> racks?
>>>>>
>>>>> Why did you select HDFS? Maybe Lustre, CephFS or others would be a
>>>>> better choice.
>>>>>
>>>>>
>>>>> It wasn't my decision, and I probably can't change it now. I am new to
>>>>> this cluster and trying to understand a few issues. I will explore other
>>>>> options as you mentioned.
>>>>>
>>>>> --
>>>>> http://balajin.net/blog
>>>>> http://flic.kr/balajijegan
>>>>>
>>>>
>>>
>>
>>
>> --
>> http://balajin.net/blog
>> http://flic.kr/balajijegan
>>
>>
>>
>
>
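The two-node replication claim in the quoted exchange can be checked with a small capacity bound. This is an idealized model, not how HDFS actually places blocks: it assumes every block needs `replication` copies on distinct nodes, that a node holds at most one copy of a given block, and that placement is perfect. Under those assumptions a logical data size D fits only if the sum of min(capacity, D) over all nodes is at least replication * D. The function name max_replicated_data is hypothetical.

```python
def max_replicated_data(capacities_tb, replication):
    """Idealized upper bound (TB) on logical data that fits when every block
    needs `replication` copies on distinct nodes.

    A node can hold at most one copy of each block, so it contributes at most
    min(capacity, D) toward the replication * D TB of replicas. The condition
    sum(min(c, D)) >= replication * D holds on an interval [0, D_max], so
    bisection finds the largest feasible D.
    """
    if replication > len(capacities_tb):
        return 0.0  # not enough distinct nodes for even one block
    lo, hi = 0.0, sum(capacities_tb) / replication
    for _ in range(100):
        mid = (lo + hi) / 2
        if sum(min(c, mid) for c in capacities_tb) >= replication * mid:
            lo = mid  # D = mid still fits
        else:
            hi = mid  # D = mid needs more space than the nodes can offer
    return lo


if __name__ == "__main__":
    print(round(max_replicated_data([12, 72], 2), 3))      # two nodes: 12.0
    print(round(max_replicated_data([12, 12, 72], 2), 3))  # three nodes: 24.0
    cluster = [12] * 80 + [72] * 20
    print(round(max_replicated_data(cluster, 2), 3))       # 1200.0
```

With two nodes the bound is the smaller node's 12 TB, but adding a single extra 12 TB node already doubles it, which matches the point that the two-node limit does not carry over to larger clusters. Real placement (random node choice, rack rules, reserved space) will fall short of this bound.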

Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Alexey Babutin <zo...@gmail.com>.
On Mon, Mar 25, 2013 at 12:48 AM, Tapas Sarangi <ta...@gmail.com> wrote:

>
> On Mar 24, 2013, at 3:40 PM, Alexey Babutin <zo...@gmail.com>
> wrote:
>
> You said that the threshold is 10. Run the command manually: hadoop balancer
> -threshold 9.5, then 9, and so on in 0.5 steps.
>
>
> We are not setting the threshold anywhere in our configuration, so it uses
> the default, which I believe is 10.
> Why do you suggest testing the balancer with such steps? Please
> explain.
> I guess we had a discussion earlier in this thread and came to the
> conclusion that the threshold will not help in this situation.
>


Today I thought about my advice again and realized that I was wrong.

For example, suppose we have 100 nodes, 80 with 12 TB and 20 with 72 TB, and
every node holds 10 TB of data.
Average cluster DFS used: 1000/2400*100 = 41.7%.

For the 12 TB nodes, DFS used is 83.3% of capacity;
for the 72 TB nodes it is 13.9%.

A node is balanced if average cluster DFS used + threshold > node DFS used >
average cluster DFS used - threshold.
Data will move from the 12 TB nodes to the 72 TB nodes, and when the 12 TB
nodes reach about 51.7% of capacity the balancer will stop.
At that point the 72 TB nodes will be at about 35% of capacity.

As the cluster grows, in the ideal case, by the time the cluster is about 88%
used the 72 TB nodes will be at about 80% of capacity and the 12 TB nodes at
about 100%. At that point you still have about 288 TB of free space, all of
it on the 72 TB nodes.
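The arithmetic of this example can be reproduced with a short sketch. It uses a simplified model, which is an assumption rather than the Balancer's actual block-by-block algorithm: blocks move only from the 12 TB nodes to the 72 TB nodes, and moving stops once the 12 TB nodes fall to the average plus the threshold. With these node counts the total capacity is 80 * 12 + 20 * 72 = 2400 TB. The function names are hypothetical.

```python
SMALL_N, SMALL_CAP = 80, 12.0  # 80 nodes of 12 TB
LARGE_N, LARGE_CAP = 20, 72.0  # 20 nodes of 72 TB
START_USED = 10.0              # every node starts with 10 TB


def balance_stop_point(threshold=10.0):
    """Return (avg %, 12 TB node %, 72 TB node %) when moving stops."""
    capacity = SMALL_N * SMALL_CAP + LARGE_N * LARGE_CAP  # 2400 TB
    used = (SMALL_N + LARGE_N) * START_USED               # 1000 TB
    avg = used / capacity * 100

    # Simplified model: drain the small nodes down to avg + threshold.
    small_used = (avg + threshold) / 100 * SMALL_CAP
    moved = SMALL_N * (START_USED - small_used)
    large_used = START_USED + moved / LARGE_N
    large_pct = large_used / LARGE_CAP * 100

    # The model only holds if the large nodes also end up inside the band.
    if large_pct < avg - threshold:
        raise ValueError("large nodes still under-utilized; model does not apply")
    return avg, small_used / SMALL_CAP * 100, large_pct


def free_when_small_nodes_full(large_pct=80.0):
    """Return (cluster used %, free TB) once the 12 TB nodes are full."""
    capacity = SMALL_N * SMALL_CAP + LARGE_N * LARGE_CAP
    large_used = LARGE_N * LARGE_CAP * large_pct / 100
    used = SMALL_N * SMALL_CAP + large_used
    return used / capacity * 100, capacity - used


if __name__ == "__main__":
    avg, small_pct, large_pct = balance_stop_point()
    # avg 41.7%, 12 TB nodes 51.7%, 72 TB nodes 35.0%
    print(f"avg {avg:.1f}%, 12 TB nodes {small_pct:.1f}%, 72 TB nodes {large_pct:.1f}%")
    used_pct, free_tb = free_when_small_nodes_full()
    print(f"cluster {used_pct:.0f}% used, {free_tb:.0f} TB free")  # 88%, 288 TB
```

The second function shows why the small nodes become the bottleneck: once they are full, every new block's replicas have to land on the 72 TB nodes, even though the cluster as a whole still reports free space.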








>
>
> -----
>
>
>
>
> On Sun, Mar 24, 2013 at 11:01 PM, Tapas Sarangi <ta...@gmail.com> wrote:
>
>> Yes, thanks for pointing that out, but I already know that it completes
>> the balancing when it exits; otherwise it wouldn't exit.
>> Your answer doesn't solve the problem I mentioned earlier in my message.
>> HDFS is stalling and Hadoop does not write unless space is cleared up
>> from the cluster, even though "df" shows the cluster has about 500 TB of
>> free space.
>>
>> -------
>>
>>
>> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>> balaji@balajin.net> wrote:
>>
>>  -setBalancerBandwidth <bandwidth in bytes per second>
>>
>> So the value is in bytes per second. If it is running and exiting, it
>> means it has completed the balancing.
>>
>>
>> On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:
>>
>>> Yes, we are running the balancer, though a balancer process runs for
>>> about a day or more before exiting and starting over.
>>> The current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume
>>> that's bytes, so about 2 GB/s. Shouldn't that be reasonable? If it is in
>>> bits, then we have a problem.
>>> What's the unit for "dfs.balance.bandwidthPerSec"?
>>>
>>> -----
>>>
>>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>>> lists@balajin.net> wrote:
>>>
>>> Are you running the balancer? If the balancer is running and it is slow,
>>> try increasing the balancer bandwidth.
>>>
>>>
>>> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
>>>
>>>> Thanks for the follow-up. I don't know whether the attachment will pass
>>>> through this mailing list, but I am attaching a PDF that contains the
>>>> usage of all live nodes.
>>>>
>>>> All nodes starting with the letter "g" are the ones with smaller storage
>>>> space, whereas nodes starting with the letter "s" have larger storage
>>>> space. As you will see, most of the "gXX" nodes are completely full,
>>>> whereas the "sXX" nodes have a lot of unused space.
>>>>
>>>> Recently we have frequently faced a crisis in which HDFS goes into a mode
>>>> where it cannot write any further, even though the total space available
>>>> in the cluster is about 500 TB. We believe this has something to do with
>>>> the way it balances the nodes, but we don't understand the problem yet.
>>>> Maybe the attached PDF will help some of you (experts) see what is going
>>>> wrong here...
>>>>
>>>> Thanks
>>>> ------
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> The balancer knows about topology, but when it calculates balancing it
>>>> operates only on nodes, not on racks.
>>>> You can see how it works in Balancer.java, in BalancerDatanode, around
>>>> line 509.
>>>>
>>>> I was wrong about 350 TB/35 TB; it calculates this way:
>>>>
>>>> For example:
>>>> cluster_capacity=3.5PB
>>>> cluster_dfsused=2PB
>>>>
>>>> avgutil=cluster_dfsused/cluster_capacity*100=57.14% of cluster capacity
>>>> used
>>>> Then it computes each node's utilization (node_dfsused/node_capacity*100).
>>>> The balancer considers everything fine if
>>>> avgutil+10 > node_utilization >= avgutil-10.
>>>>
>>>> Ideally every node would use avgutil percent of its capacity, but for a
>>>> 12 TB node that is only about 6.9 TB and for a 72 TB node about 41 TB.
>>>>
>>>> The balancer can't help you.
>>>>
>>>> Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE
>>>> if you can.
>>>>
>>>>
>>>>
>>>>>
>>>>>
>>>>> In the ideal case, with replication factor 2 and two nodes of 12 TB and
>>>>> 72 TB, you will be able to store only 12 TB of replicated data.
>>>>>
>>>>>
>>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB and
>>>>> 72 TB, but not true for more than two nodes in the cluster.
>>>>>
>>>>>
>>>>> The best way, in my opinion, is to use multiple racks. Nodes in a rack
>>>>> should have identical capacity, and the racks should have identical
>>>>> capacity.
>>>>> For example:
>>>>>
>>>>> rack1: 1 node with 72 TB
>>>>> rack2: 6 nodes with 12 TB
>>>>> rack3: 3 nodes with 24 TB
>>>>>
>>>>> It helps with balancing, because a duplicated block must be on another
>>>>> rack.
>>>>>
>>>>>
>>>>> The same question I asked earlier in this message: do multiple racks
>>>>> with the default balancer threshold minimize the difference between
>>>>> racks?
>>>>>
>>>>> Why did you select HDFS? Maybe Lustre, CephFS or others would be a
>>>>> better choice.
>>>>>
>>>>>
>>>>> It wasn't my decision, and I probably can't change it now. I am new to
>>>>> this cluster and trying to understand a few issues. I will explore other
>>>>> options as you mentioned.
>>>>>
>>>>> --
>>>>> http://balajin.net/blog
>>>>> http://flic.kr/balajijegan
>>>>>
>>>>
>>>
>>
>>
>> --
>> http://balajin.net/blog
>> http://flic.kr/balajijegan
>>
>>
>>
>
>
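Balaji's answer is that dfs.balance.bandwidthPerSec is in bytes per second; the stock hdfs-default.xml value is 1048576, i.e. 1 MB/s per datanode. A quick unit check follows. The 1 Gbit/s link speed and the 10 TB amount in the usage lines are hypothetical, chosen only to show that a 2 GB/s throttle is far above what such a link can carry, so in that case the network rather than the setting would bound how fast one datanode can move data.

```python
def rate_in_units(bytes_per_sec):
    """Return (GB/s, Gbit/s) for a byte rate, using decimal units."""
    return bytes_per_sec / 1e9, bytes_per_sec * 8 / 1e9


def hours_to_move(terabytes, bytes_per_sec):
    """Hours to copy `terabytes` (decimal TB) at a sustained byte rate."""
    return terabytes * 1e12 / bytes_per_sec / 3600


if __name__ == "__main__":
    configured = 2 * 10**9              # dfs.balance.bandwidthPerSec from the thread
    print(rate_in_units(configured))    # (2.0, 16.0)
    link = 1e9 / 8                      # hypothetical 1 Gbit/s NIC = 125 MB/s
    effective = min(configured, link)   # the slower of throttle and link wins
    print(round(hours_to_move(10, effective), 1))  # hypothetical 10 TB: 22.2
```

This is per-node arithmetic only; the balancer runs transfers between many datanode pairs in parallel, so total cluster throughput is higher, but a single node draining several TB at link speed can still take most of a day.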

Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Tapas Sarangi <ta...@gmail.com>.
On Mar 24, 2013, at 3:40 PM, Alexey Babutin <zo...@gmail.com> wrote:

> You said that the threshold is 10. Run the command manually: hadoop balancer -threshold 9.5, then 9, and so on in 0.5 steps.
> 

We are not setting the threshold anywhere in our configuration, so it uses the default, which I believe is 10.
Why do you suggest testing the balancer with such steps? Please explain.
I guess we had a discussion earlier in this thread and came to the conclusion that the threshold will not help in this situation.


-----




> On Sun, Mar 24, 2013 at 11:01 PM, Tapas Sarangi <ta...@gmail.com> wrote:
> Yes, thanks for pointing that out, but I already know that it completes the balancing when it exits; otherwise it wouldn't exit.
> Your answer doesn't solve the problem I mentioned earlier in my message. HDFS is stalling and Hadoop does not write unless space is cleared up from the cluster, even though "df" shows the cluster has about 500 TB of free space.
> 
> -------
>  
> 
> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <ba...@balajin.net> wrote:
> 
>>  -setBalancerBandwidth <bandwidth in bytes per second>
>> 
>> So the value is in bytes per second. If it is running and exiting, it means it has completed the balancing.
>> 
>> 
>> On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:
>> Yes, we are running the balancer, though a balancer process runs for about a day or more before exiting and starting over.
>> The current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume that's bytes, so about 2 GB/s. Shouldn't that be reasonable? If it is in bits, then we have a problem.
>> What's the unit for "dfs.balance.bandwidthPerSec"?
>> 
>> -----
>> 
>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <li...@balajin.net> wrote:
>> 
>>> Are you running the balancer? If the balancer is running and it is slow, try increasing the balancer bandwidth.
>>> 
>>> 
>>> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
>>> Thanks for the follow-up. I don't know whether the attachment will pass through this mailing list, but I am attaching a PDF that contains the usage of all live nodes.
>>> 
>>> All nodes starting with the letter "g" are the ones with smaller storage space, whereas nodes starting with the letter "s" have larger storage space. As you will see, most of the "gXX" nodes are completely full, whereas the "sXX" nodes have a lot of unused space.
>>> 
>>> Recently we have frequently faced a crisis in which HDFS goes into a mode where it cannot write any further, even though the total space available in the cluster is about 500 TB. We believe this has something to do with the way it balances the nodes, but we don't understand the problem yet. Maybe the attached PDF will help some of you (experts) see what is going wrong here...
>>> 
>>> Thanks
>>> ------
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>>> 
>>>> The balancer knows about topology, but when it calculates balancing it operates only on nodes, not on racks.
>>>> You can see how it works in Balancer.java, in BalancerDatanode, around line 509.
>>>>
>>>> I was wrong about 350 TB/35 TB; it calculates this way:
>>>>
>>>> For example:
>>>> cluster_capacity=3.5PB
>>>> cluster_dfsused=2PB
>>>>
>>>> avgutil=cluster_dfsused/cluster_capacity*100=57.14% of cluster capacity used
>>>> Then it computes each node's utilization (node_dfsused/node_capacity*100). The balancer considers everything fine if avgutil+10 > node_utilization >= avgutil-10.
>>>>
>>>> Ideally every node would use avgutil percent of its capacity, but for a 12 TB node that is only about 6.9 TB and for a 72 TB node about 41 TB.
>>>>
>>>> The balancer can't help you.
>>>> 
>>>> Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can.
>>>> 
>>>>  
>>>> 
>>>> 
>>>>> In the ideal case, with replication factor 2 and two nodes of 12 TB and 72 TB, you will be able to store only 12 TB of replicated data.
>>>> 
>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB and 72 TB, but not true for more than two nodes in the cluster.
>>>> 
>>>>> 
>>>>> The best way, in my opinion, is to use multiple racks. Nodes in a rack should have identical capacity, and the racks should have identical capacity.
>>>>> For example:
>>>>>
>>>>> rack1: 1 node with 72 TB
>>>>> rack2: 6 nodes with 12 TB
>>>>> rack3: 3 nodes with 24 TB
>>>>>
>>>>> It helps with balancing, because a duplicated block must be on another rack.
>>>>> 
>>>> 
>>>> The same question I asked earlier in this message: do multiple racks with the default balancer threshold minimize the difference between racks?
>>>> 
>>>>> Why did you select HDFS? Maybe Lustre, CephFS or others would be a better choice.
>>>> 
>>>> It wasn't my decision, and I probably can't change it now. I am new to this cluster and trying to understand a few issues. I will explore other options as you mentioned.
>>>> 
>>>> -- 
>>>> http://balajin.net/blog
>>>> http://flic.kr/balajijegan
>> 
>> 
>> 
>> 
>> -- 
>> http://balajin.net/blog
>> http://flic.kr/balajijegan
> 
> 
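The threshold rule quoted in this thread can be written as a small classifier. This is a sketch of the rule as described in the messages (a node counts as over-utilized above avgutil + threshold and under-utilized below avgutil - threshold); boundary handling and the finer categories used inside the real Balancer.java may differ. The function names and the two node figures in the usage lines are hypothetical; the 3.5 PB / 2 PB cluster figures come from the quoted example.

```python
def utilization(used_tb, capacity_tb):
    """Percent of capacity in use."""
    return used_tb / capacity_tb * 100


def classify_node(node_used_tb, node_cap_tb, cluster_used_tb, cluster_cap_tb,
                  threshold=10.0):
    """Place a datanode relative to the cluster-average utilization band."""
    avg = utilization(cluster_used_tb, cluster_cap_tb)
    util = utilization(node_used_tb, node_cap_tb)
    if util > avg + threshold:
        return "over-utilized"    # needs to shed blocks
    if util < avg - threshold:
        return "under-utilized"   # needs to receive blocks
    return "within threshold"     # counts as balanced under this rule


def ideal_share_tb(node_cap_tb, cluster_used_tb, cluster_cap_tb):
    """Data a node would hold if it sat exactly at the cluster average."""
    return node_cap_tb * cluster_used_tb / cluster_cap_tb


if __name__ == "__main__":
    cap, used = 3500.0, 2000.0                       # 3.5 PB capacity, 2 PB used
    print(round(utilization(used, cap), 2))          # 57.14
    print(round(ideal_share_tb(12, used, cap), 2))   # 6.86
    print(round(ideal_share_tb(72, used, cap), 2))   # 41.14
    print(classify_node(11.9, 12, used, cap))        # over-utilized
    print(classify_node(25.0, 72, used, cap))        # under-utilized
```

Because the band is expressed as a percentage of each node's own capacity, a nearly full 12 TB node and a lightly used 72 TB node can both be flagged, yet the balancer only ever aims to bring every node inside the band, not to equalize free space.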


Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Tapas Sarangi <ta...@gmail.com>.
On Mar 24, 2013, at 3:40 PM, Alexey Babutin <zo...@gmail.com> wrote:

> you said that threshold=10.Run mannualy command : hadoop balancer threshold 9.5 ,then 9 and so with 0.5 step.
> 

We are not setting threshold anywhere in our configuration and thus considering the default which I believe is 10. 
Why do you suggest such steps need to be tested for balancer ? Please explain.
I guess we had a discussion earlier on this thread and came to the conclusion that the threshold will not help in this situation.


-----




> On Sun, Mar 24, 2013 at 11:01 PM, Tapas Sarangi <ta...@gmail.com> wrote:
> Yes, thanks for pointing, but I already know that it is completing the balancing when exiting otherwise it shouldn't exit. 
> Your answer doesn't solve the problem I mentioned earlier in my message. 'hdfs' is stalling and hadoop is not writing unless space is cleared up from the cluster even though "df" shows the cluster has about 500 TB of free space. 
> 
> -------
>  
> 
> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <ba...@balajin.net> wrote:
> 
>>  -setBalancerBandwidth <bandwidth in bytes per second>
>> 
>> So the value is bytes per second. If it is running and exiting,it means it has completed the balancing. 
>> 
>> 
>> On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:
>> Yes, we are running balancer, though a balancer process runs for almost a day or more before exiting and starting over.
>> Current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume that's bytes so about 2 GigaByte/sec. Shouldn't that be reasonable ? If it is in Bits then we have a problem.
>> What's the unit for "dfs.balance.bandwidthPerSec" ?
>> 
>> -----
>> 
>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <li...@balajin.net> wrote:
>> 
>>> Are you running balancer? If balancer is running and if it is slow, try increasing the balancer bandwidth
>>> 
>>> 
>>> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
>>> Thanks for the follow up. I don't know whether attachment will pass through this mailing list, but I am attaching a pdf that contains the usage of all live nodes.
>>> 
>>> All nodes starting with letter "g" are the ones with smaller storage space where as nodes starting with letter "s" have larger storage space. As you will see, most of the "gXX" nodes are completely full whereas "sXX" nodes have a lot of unused space. 
>>> 
>>> Recently, we are facing crisis frequently as 'hdfs' goes into a mode where it is not able to write any further even though the total space available in the cluster is about 500 TB. We believe this has something to do with the way it is balancing the nodes, but don't understand the problem yet. May be the attached PDF will help some of you (experts) to see what is going wrong here...
>>> 
>>> Thanks
>>> ------
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>>> 
>>>> Balancer know about topology,but when calculate balancing it operates only with nodes not with racks.
>>>> You can see how it work in Balancer.java in  BalancerDatanode about string 509.
>>>> 
>>>> I was wrong about 350Tb,35Tb it calculates in such way :
>>>> 
>>>> For example:
>>>> cluster_capacity=3.5Pb
>>>> cluster_dfsused=2Pb
>>>> 
>>>> avgutil=cluster_dfsused/cluster_capacity*100=57.14% used cluster capacity
>>>> Then we know avg node utilization (node_dfsused/node_capacity*100) .Balancer think that all good if  avgutil +10>node_utilizazation>=avgutil-10.
>>>> 
>>>> Ideal case that all node used avgutl of capacity.but for 12TB node its only 6.5Tb and for 72Tb its about 40Tb.
>>>> 
>>>> Balancer cant help you.
>>>> 
>>>> Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can.
>>>> 
>>>>  
>>>> 
>>>> 
>>>>> In ideal case with replication factor 2 ,with two nodes 12Tb and 72Tb you will be able to have only 12Tb replication data.
>>>> 
>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB and 72 TB, but not true for more than two nodes in the cluster.
>>>> 
>>>>> 
>>>>> Best way,on my opinion,it is using multiple racks.Nodes in rack must be with identical capacity.Racks must be identical capacity.
>>>>> For example:
>>>>> 
>>>>> rack1: 1 node with 72Tb
>>>>> rack2: 6 nodes with 12Tb
>>>>> rack3: 3 nodes with 24Tb
>>>>> 
>>>>> It helps with balancing,because dublicated  block must be another rack.
>>>>> 
>>>> 
>>>> The same question I asked earlier in this message, does multiple racks with default threshold for the balancer minimizes the difference between racks ?
>>>> 
>>>>> Why did you select hdfs?May be lustre,cephfs and other is better choise.  
>>>> 
>>>> It wasn't my decision, and I probably can't change it now. I am new to this cluster and trying to understand few issues. I will explore other options as you mentioned.
>>>> 
>>>> -- 
>>>> http://balajin.net/blog
>>>> http://flic.kr/balajijegan
>> 
>> 
>> 
>> 
>> -- 
>> http://balajin.net/blog
>> http://flic.kr/balajijegan
> 
> 


Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Tapas Sarangi <ta...@gmail.com>.
On Mar 24, 2013, at 3:40 PM, Alexey Babutin <zo...@gmail.com> wrote:

> you said that threshold=10.Run mannualy command : hadoop balancer threshold 9.5 ,then 9 and so with 0.5 step.
> 

We are not setting threshold anywhere in our configuration and thus considering the default which I believe is 10. 
Why do you suggest such steps need to be tested for balancer ? Please explain.
I guess we had a discussion earlier on this thread and came to the conclusion that the threshold will not help in this situation.


-----




> On Sun, Mar 24, 2013 at 11:01 PM, Tapas Sarangi <ta...@gmail.com> wrote:
> Yes, thanks for pointing, but I already know that it is completing the balancing when exiting otherwise it shouldn't exit. 
> Your answer doesn't solve the problem I mentioned earlier in my message. 'hdfs' is stalling and hadoop is not writing unless space is cleared up from the cluster even though "df" shows the cluster has about 500 TB of free space. 
> 
> -------
>  
> 
> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <ba...@balajin.net> wrote:
> 
>>  -setBalancerBandwidth <bandwidth in bytes per second>
>> 
>> So the value is bytes per second. If it is running and exiting,it means it has completed the balancing. 
>> 
>> 
>> On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:
>> Yes, we are running balancer, though a balancer process runs for almost a day or more before exiting and starting over.
>> Current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume that's bytes so about 2 GigaByte/sec. Shouldn't that be reasonable ? If it is in Bits then we have a problem.
>> What's the unit for "dfs.balance.bandwidthPerSec" ?
>> 
>> -----
>> 
>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <li...@balajin.net> wrote:
>> 
>>> Are you running balancer? If balancer is running and if it is slow, try increasing the balancer bandwidth
>>> 
>>> 
>>> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
>>> Thanks for the follow up. I don't know whether attachment will pass through this mailing list, but I am attaching a pdf that contains the usage of all live nodes.
>>> 
>>> All nodes starting with letter "g" are the ones with smaller storage space where as nodes starting with letter "s" have larger storage space. As you will see, most of the "gXX" nodes are completely full whereas "sXX" nodes have a lot of unused space. 
>>> 
>>> Recently, we are facing crisis frequently as 'hdfs' goes into a mode where it is not able to write any further even though the total space available in the cluster is about 500 TB. We believe this has something to do with the way it is balancing the nodes, but don't understand the problem yet. May be the attached PDF will help some of you (experts) to see what is going wrong here...
>>> 
>>> Thanks
>>> ------
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>>> 
>>>> Balancer know about topology,but when calculate balancing it operates only with nodes not with racks.
>>>> You can see how it work in Balancer.java in  BalancerDatanode about string 509.
>>>> 
>>>> I was wrong about 350Tb,35Tb it calculates in such way :
>>>> 
>>>> For example:
>>>> cluster_capacity=3.5Pb
>>>> cluster_dfsused=2Pb
>>>> 
>>>> avgutil=cluster_dfsused/cluster_capacity*100=57.14% used cluster capacity
>>>> Then we know avg node utilization (node_dfsused/node_capacity*100) .Balancer think that all good if  avgutil +10>node_utilizazation>=avgutil-10.
>>>> 
>>>> Ideal case that all node used avgutl of capacity.but for 12TB node its only 6.5Tb and for 72Tb its about 40Tb.
>>>> 
>>>> Balancer cant help you.
>>>> 
>>>> Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can.
>>>> 
>>>>  
>>>> 
>>>> 
>>>>> In ideal case with replication factor 2 ,with two nodes 12Tb and 72Tb you will be able to have only 12Tb replication data.
>>>> 
>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB and 72 TB, but not true for more than two nodes in the cluster.
>>>> 
>>>>> 
>>>>> Best way,on my opinion,it is using multiple racks.Nodes in rack must be with identical capacity.Racks must be identical capacity.
>>>>> For example:
>>>>> 
>>>>> rack1: 1 node with 72Tb
>>>>> rack2: 6 nodes with 12Tb
>>>>> rack3: 3 nodes with 24Tb
>>>>> 
>>>>> It helps with balancing,because dublicated  block must be another rack.
>>>>> 
>>>> 
>>>> The same question I asked earlier in this message, does multiple racks with default threshold for the balancer minimizes the difference between racks ?
>>>> 
>>>>> Why did you select hdfs?May be lustre,cephfs and other is better choise.  
>>>> 
>>>> It wasn't my decision, and I probably can't change it now. I am new to this cluster and trying to understand few issues. I will explore other options as you mentioned.
>>>> 
>>>> -- 
>>>> http://balajin.net/blog
>>>> http://flic.kr/balajijegan
>> 
>> 
>> 
>> 
>> -- 
>> http://balajin.net/blog
>> http://flic.kr/balajijegan
> 
> 


Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Tapas Sarangi <ta...@gmail.com>.
On Mar 24, 2013, at 3:40 PM, Alexey Babutin <zo...@gmail.com> wrote:

> you said that threshold=10.Run mannualy command : hadoop balancer threshold 9.5 ,then 9 and so with 0.5 step.
> 

We are not setting threshold anywhere in our configuration and thus considering the default which I believe is 10. 
Why do you suggest such steps need to be tested for balancer ? Please explain.
I guess we had a discussion earlier on this thread and came to the conclusion that the threshold will not help in this situation.


-----




> On Sun, Mar 24, 2013 at 11:01 PM, Tapas Sarangi <ta...@gmail.com> wrote:
> Yes, thanks for pointing, but I already know that it is completing the balancing when exiting otherwise it shouldn't exit. 
> Your answer doesn't solve the problem I mentioned earlier in my message. 'hdfs' is stalling and hadoop is not writing unless space is cleared up from the cluster even though "df" shows the cluster has about 500 TB of free space. 
> 
> -------
>  
> 
> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <ba...@balajin.net> wrote:
> 
>>  -setBalancerBandwidth <bandwidth in bytes per second>
>> 
>> So the value is bytes per second. If it is running and exiting,it means it has completed the balancing. 
>> 
>> 
>> On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:
>> Yes, we are running balancer, though a balancer process runs for almost a day or more before exiting and starting over.
>> Current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume that's bytes so about 2 GigaByte/sec. Shouldn't that be reasonable ? If it is in Bits then we have a problem.
>> What's the unit for "dfs.balance.bandwidthPerSec" ?
>> 
>> -----
>> 
>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <li...@balajin.net> wrote:
>> 
>>> Are you running balancer? If balancer is running and if it is slow, try increasing the balancer bandwidth
>>> 
>>> 
>>> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
>>> Thanks for the follow up. I don't know whether attachment will pass through this mailing list, but I am attaching a pdf that contains the usage of all live nodes.
>>> 
>>> All nodes starting with letter "g" are the ones with smaller storage space where as nodes starting with letter "s" have larger storage space. As you will see, most of the "gXX" nodes are completely full whereas "sXX" nodes have a lot of unused space. 
>>> 
>>> Recently, we are facing crisis frequently as 'hdfs' goes into a mode where it is not able to write any further even though the total space available in the cluster is about 500 TB. We believe this has something to do with the way it is balancing the nodes, but don't understand the problem yet. May be the attached PDF will help some of you (experts) to see what is going wrong here...
>>> 
>>> Thanks
>>> ------
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>>> 
>>>> The balancer knows about topology, but when it calculates balancing it operates only on nodes, not on racks.
>>>> You can see how it works in Balancer.java, in BalancerDatanode, around line 509.
>>>> 
>>>> I was wrong about 350 TB / 35 TB. It calculates it this way:
>>>> 
>>>> For example:
>>>> cluster_capacity = 3.5 PB
>>>> cluster_dfsused = 2 PB
>>>> 
>>>> avgutil = cluster_dfsused / cluster_capacity * 100 = 57.14% of cluster capacity used
>>>> Each node's utilization is node_dfsused / node_capacity * 100. The balancer considers a node balanced if avgutil - 10 <= node_utilization < avgutil + 10.
>>>> 
>>>> In the ideal case every node uses avgutil of its capacity, but for a 12 TB node that is only about 6.9 TB, and for a 72 TB node about 41 TB.
>>>> 
>>>> The balancer can't help you.
>>>> 
>>>> Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can.
>>>> 
>>>>  
>>>> 
>>>> 
>>>>> In the ideal case, with replication factor 2 and two nodes of 12 TB and 72 TB, you will be able to store only 12 TB of replicated data.
>>>> 
>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB and 72 TB, but not true for more than two nodes in the cluster.
>>>> 
>>>>> 
>>>>> The best way, in my opinion, is to use multiple racks. Nodes within a rack should have identical capacity, and racks should have identical total capacity.
>>>>> For example:
>>>>> 
>>>>> rack1: 1 node with 72 TB
>>>>> rack2: 6 nodes with 12 TB
>>>>> rack3: 3 nodes with 24 TB
>>>>> 
>>>>> It helps with balancing, because a duplicate of a block must be placed on another rack.
>>>>> 
>>>> 
>>>> The same question I asked earlier in this message: do multiple racks with the default balancer threshold minimize the difference between racks?
>>>> 
>>>>> Why did you choose HDFS? Maybe Lustre, CephFS or something else would be a better choice.
>>>> 
>>>> It wasn't my decision, and I probably can't change it now. I am new to this cluster and trying to understand a few issues. I will explore other options as you mentioned.
>>>> 
>>>> -- 
>>>> http://balajin.net/blog
>>>> http://flic.kr/balajijegan
>> 
>> 
>> 
>> 
>> -- 
>> http://balajin.net/blog
>> http://flic.kr/balajijegan
> 
> 


Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Alexey Babutin <zo...@gmail.com>.
You said that threshold=10. Run the command manually: hadoop balancer -threshold
9.5, then 9, and so on in steps of 0.5.
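A minimal Python sketch of that manual procedure. It assumes the `hadoop` launcher is on the PATH and that your balancer release accepts fractional values for `-threshold`. The stopping point of 5 is hypothetical. With `dry_run=True`, the default here, it only prints the commands. Each pass runs to completion before the next, tighter pass starts, and the loop stops at the first non-zero exit status (also an assumption about how failures are reported).

```python
import subprocess


def threshold_steps(start=9.5, stop=5.0, step=0.5):
    """Descending balancer thresholds: start, start - step, ... down to stop."""
    steps = []
    t = start
    while t >= stop:
        steps.append(t)
        t -= step
    return steps


def run_balancer(threshold, dry_run=True):
    """Run one balancer pass at the given threshold (percent)."""
    cmd = ["hadoop", "balancer", "-threshold", str(threshold)]
    if dry_run:
        print(" ".join(cmd))  # show what would be executed
        return 0
    return subprocess.call(cmd)  # blocks until this balancer run exits


if __name__ == "__main__":
    # Hypothetical stopping point of 5; tighten gradually, one pass at a time.
    for threshold in threshold_steps():
        if run_balancer(threshold) != 0:
            break
```

Stepping down gradually keeps each pass's amount of work small, rather than asking one run to close a large gap at once.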

On Sun, Mar 24, 2013 at 11:01 PM, Tapas Sarangi <ta...@gmail.com> wrote:

> Yes, thanks for pointing that out, but I already know that it completes the
> balancing when it exits; otherwise it wouldn't exit.
> Your answer doesn't solve the problem I mentioned earlier in my message.
> 'hdfs' is stalling, and Hadoop is not writing unless space is freed up
> on the cluster, even though "df" shows the cluster has about 500 TB of
> free space.
>
> -------
>
>
> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
> balaji@balajin.net> wrote:
>
>  -setBalancerBandwidth <bandwidth in bytes per second>
>
> So the value is bytes per second. If it runs and then exits, it means it
> has completed the balancing.
>
>
> On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:
>
>> Yes, we are running the balancer, though each balancer process runs for a
>> day or more before exiting and starting over.
>> The current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume
>> that's bytes, so about 2 GB/sec. Shouldn't that be reasonable? If it
>> is in bits, then we have a problem.
>> What's the unit for "dfs.balance.bandwidthPerSec"?
>>
>> -----
>>
>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>> lists@balajin.net> wrote:
>>
>> Are you running the balancer? If it is running and slow, try
>> increasing the balancer bandwidth.
>>
>>
>> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
>>
>>> Thanks for the follow-up. I don't know whether an attachment will pass
>>> through this mailing list, but I am attaching a PDF that contains the usage
>>> of all live nodes.
>>>
>>> All nodes starting with the letter "g" are the ones with smaller storage
>>> space, whereas nodes starting with the letter "s" have larger storage space. As
>>> you will see, most of the "gXX" nodes are completely full, whereas the "sXX"
>>> nodes have a lot of unused space.
>>>
>>> Recently we have frequently faced a crisis in which 'hdfs' goes into a mode
>>> where it is not able to write anything further, even though the total space
>>> available in the cluster is about 500 TB. We believe this has something to
>>> do with the way it is balancing the nodes, but we don't understand the problem
>>> yet. Maybe the attached PDF will help some of you (experts) see what is
>>> going wrong here...
>>>
>>> Thanks
>>> ------
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> The balancer knows about topology, but when it calculates balancing it
>>> operates only on nodes, not on racks.
>>> You can see how it works in Balancer.java, in BalancerDatanode, around
>>> line 509.
>>>
>>> I was wrong about 350 TB / 35 TB. It calculates it this way:
>>>
>>> For example:
>>> cluster_capacity = 3.5 PB
>>> cluster_dfsused = 2 PB
>>>
>>> avgutil = cluster_dfsused / cluster_capacity * 100 = 57.14% of cluster capacity used
>>> Each node's utilization is node_dfsused / node_capacity * 100. The
>>> balancer considers a node balanced if
>>> avgutil - 10 <= node_utilization < avgutil + 10.
>>>
>>> In the ideal case every node uses avgutil of its capacity, but for a 12 TB
>>> node that is only about 6.9 TB, and for a 72 TB node about 41 TB.
>>>
>>> The balancer can't help you.
>>>
>>> Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can.
>>>
>>>
>>>
>>>>
>>>>
>>>> In the ideal case, with replication factor 2 and two nodes of 12 TB and 72 TB,
>>>> you will be able to store only 12 TB of replicated data.
>>>>
>>>>
>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB and
>>>> 72 TB, but not true for more than two nodes in the cluster.
>>>>
>>>>
>>>> The best way, in my opinion, is to use multiple racks. Nodes within a rack
>>>> should have identical capacity, and racks should have identical total capacity.
>>>> For example:
>>>>
>>>> rack1: 1 node with 72 TB
>>>> rack2: 6 nodes with 12 TB
>>>> rack3: 3 nodes with 24 TB
>>>>
>>>> It helps with balancing, because a duplicate of a block must be placed on another rack.
>>>>
>>>>
>>>> The same question I asked earlier in this message: do multiple racks
>>>> with the default balancer threshold minimize the difference between
>>>> racks?
>>>>
>>>> Why did you choose HDFS? Maybe Lustre, CephFS or something else would be
>>>> a better choice.
>>>>
>>>>
>>>> It wasn't my decision, and I probably can't change it now. I am new to
>>>> this cluster and trying to understand a few issues. I will explore other
>>>> options as you mentioned.
>>>>
>>>> --
>>>> http://balajin.net/blog
>>>> http://flic.kr/balajijegan
>>>>
>>>
>>
>
>
> --
> http://balajin.net/blog
> http://flic.kr/balajijegan
>
>
>

Re:Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by see1230 <se...@163.com>.
If the balancer is not running, or is running with low bandwidth and reacting slowly, I think there may be a significant asymmetry between datanodes.






At 2013-03-25 04:37:05,"Jamal B" <jm...@gmail.com> wrote:

Then I think the only way around this would be to decommission the smaller nodes one at a time and ensure that their blocks are moved to the larger nodes. Once that is complete, bring the smaller nodes back in, but maybe only after you tweak the rack topology to match your disk layout rather than your network layout, to compensate for the unbalanced nodes.


Just my 2 cents



On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi <ta...@gmail.com> wrote:

Thanks. We have a 1-1 mapping between drives and folders on all the datanodes.


-Tapas


On Mar 24, 2013, at 3:29 PM, Jamal B <jm...@gmail.com> wrote:


On both types of nodes, what is your dfs.data.dir set to? Does it specify multiple folders on the same sets of drives, or is it 1-1 between folder and drive? If it's set to multiple folders on the same drives, it is probably overstating the "available capacity", because it assumes a 1-1 relationship between each folder and the total capacity of its drive.
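One rough way to look for the double counting described above is to compare two totals. The first adds each directory's filesystem size once per directory. The second counts each underlying device only once. This is a hypothetical Python helper, not how the datanode computes capacity internally. It takes the comma-separated `dfs.data.dir` value as its only argument and identifies devices by `os.stat().st_dev`.

```python
import os
import shutil
import sys


def capacity_totals(data_dirs):
    """Return (naive_total, per_device_total) in bytes for the given dirs.

    naive_total adds the filesystem size once per directory, which overcounts
    when several directories live on the same drive; per_device_total counts
    each underlying device (os.stat().st_dev) only once.
    """
    naive_total = 0
    per_device = {}
    for d in data_dirs:
        size = shutil.disk_usage(d).total
        naive_total += size
        per_device[os.stat(d).st_dev] = size
    return naive_total, sum(per_device.values())


if __name__ == "__main__" and len(sys.argv) > 1:
    dirs = [p.strip() for p in sys.argv[1].split(",") if p.strip()]
    naive, real = capacity_totals(dirs)
    print(f"naive: {naive / 1e12:.2f} TB, per-device: {real / 1e12:.2f} TB")
    if naive != real:
        print("some directories share a device; capacity may be overcounted")
```

With a 1-1 layout, as reported in the reply above, the two totals should match.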



On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi <ta...@gmail.com> wrote:

Yes, thanks for pointing that out, but I already know that it completes the balancing when it exits; otherwise it wouldn't exit.
Your answer doesn't solve the problem I mentioned earlier in my message. 'hdfs' is stalling, and Hadoop is not writing unless space is freed up on the cluster, even though "df" shows the cluster has about 500 TB of free space.


-------
 


On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <ba...@balajin.net> wrote:


 -setBalancerBandwidth <bandwidth in bytes per second>

So the value is bytes per second. If it runs and then exits, it means it has completed the balancing.
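This short Python calculation puts the configured value in perspective and is illustrative only. It assumes `dfs.balance.bandwidthPerSec` is a per-datanode limit in bytes per second, as stated above. It compares that value with a 1 Gbit/s link and with 1048576 B/s (1 MiB/s), which I believe is the shipped default for this release line. Treat that default as an assumption and check your hdfs-default.xml.

```python
BYTES_PER_GIB = 1024 ** 3
DEFAULT_BYTES_PER_SEC = 1048576     # 1 MiB/s; assumed shipped default
ONE_GBE_BYTES_PER_SEC = 10**9 // 8  # line rate of a 1 Gbit/s link, in bytes/s


def describe_bandwidth(bytes_per_sec):
    """Express a per-datanode balancer bandwidth cap in other units."""
    return {
        "gib_per_sec": bytes_per_sec / BYTES_PER_GIB,
        "gbit_per_sec": bytes_per_sec * 8 / 1e9,
        "x_default": bytes_per_sec / DEFAULT_BYTES_PER_SEC,
        "x_1gbe_link": bytes_per_sec / ONE_GBE_BYTES_PER_SEC,
    }


if __name__ == "__main__":
    configured = 2 * 10**9  # the value reported in the thread
    for key, value in describe_bandwidth(configured).items():
        print(f"{key}: {value:,.2f}")
```

If these assumptions hold, 2x10^9 B/s is about 16 Gbit/s per datanode, well above what a 1 Gbit/s link can carry. The configured cap is then probably not what keeps the balancer running for a day. The network, the disks, or the volume of data to move are more likely limits.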




On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:

Yes, we are running the balancer, though each balancer process runs for a day or more before exiting and starting over.
The current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume that's bytes, so about 2 GB/sec. Shouldn't that be reasonable? If it is in bits, then we have a problem.
What's the unit for "dfs.balance.bandwidthPerSec"?


-----


On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <li...@balajin.net> wrote:


Are you running the balancer? If it is running and slow, try increasing the balancer bandwidth.




On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:

Thanks for the follow-up. I don't know whether an attachment will pass through this mailing list, but I am attaching a PDF that contains the usage of all live nodes.


All nodes starting with the letter "g" are the ones with smaller storage space, whereas nodes starting with the letter "s" have larger storage space. As you will see, most of the "gXX" nodes are completely full, whereas the "sXX" nodes have a lot of unused space.


Recently we have frequently faced a crisis in which 'hdfs' goes into a mode where it is not able to write anything further, even though the total space available in the cluster is about 500 TB. We believe this has something to do with the way it is balancing the nodes, but we don't understand the problem yet. Maybe the attached PDF will help some of you (experts) see what is going wrong here...


Thanks
------













The balancer knows about topology, but when it calculates balancing it operates only on nodes, not on racks.
You can see how it works in Balancer.java, in BalancerDatanode, around line 509.

I was wrong about 350 TB / 35 TB. It calculates it this way:

For example:
cluster_capacity = 3.5 PB
cluster_dfsused = 2 PB

avgutil = cluster_dfsused / cluster_capacity * 100 = 57.14% of cluster capacity used
Each node's utilization is node_dfsused / node_capacity * 100. The balancer considers a node balanced if avgutil - 10 <= node_utilization < avgutil + 10.

In the ideal case every node uses avgutil of its capacity, but for a 12 TB node that is only about 6.9 TB, and for a 72 TB node about 41 TB.

The balancer can't help you.
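The rule described above can be sketched in Python as follows. This is a simplified model built on the example's numbers (3.5 PB capacity, 2 PB used, threshold 10), not a copy of Balancer.java. The function names are hypothetical.

```python
def avg_utilization(used_total, capacity_total):
    """Cluster-wide utilization in percent (same units for both arguments)."""
    return used_total / capacity_total * 100.0


def classify(node_used, node_capacity, avg, threshold=10.0):
    """Place a node relative to the band avg - threshold <= util < avg + threshold."""
    util = node_used / node_capacity * 100.0
    if util >= avg + threshold:
        return "over-utilized"
    if util < avg - threshold:
        return "under-utilized"
    return "within threshold"


def balanced_range(node_capacity, avg, threshold=10.0):
    """Used-space interval (in node_capacity's units) that counts as balanced."""
    return (node_capacity * (avg - threshold) / 100.0,
            node_capacity * (avg + threshold) / 100.0)


if __name__ == "__main__":
    avg = avg_utilization(2.0, 3.5)  # 2 PB used of 3.5 PB, as in the example
    print(f"average utilization: {avg:.2f}%")
    for capacity_tb in (12, 72):
        low, high = balanced_range(capacity_tb, avg)
        ideal = capacity_tb * avg / 100.0
        print(f"{capacity_tb} TB node: ideal {ideal:.1f} TB, "
              f"balanced between {low:.1f} and {high:.1f} TB")
    print("nearly full 12 TB node:", classify(11.99, 12, avg))
```

With these numbers, a 12 TB node counts as balanced between roughly 5.7 and 8.1 TB used. A 72 TB node counts as balanced between roughly 33.9 and 48.3 TB. A nearly full small node is therefore well outside the band.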

Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can.

 





In the ideal case, with replication factor 2 and two nodes of 12 TB and 72 TB, you will be able to store only 12 TB of replicated data.



Yes, this is true for exactly two nodes in the cluster with 12 TB and 72 TB, but not true for more than two nodes in the cluster.
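Both statements can be checked with an idealized model, which is an assumption and not something stated in the thread. If every block needs r replicas on r distinct nodes and blocks are arbitrarily small, an amount D of data fits exactly when sum_i min(c_i, D) >= r*D. Equivalently, for every k, the k smallest capacities must sum to at least (r - n + k) * D. That gives the closed form below. The model ignores rack placement, reserved space and non-DFS usage.

```python
def max_storable(capacities, replication):
    """Largest amount of data D (same unit as capacities) that fits when each
    block needs `replication` replicas on distinct nodes.

    Idealized: blocks are arbitrarily small; racks, reserved space and
    non-DFS usage are ignored. D fits iff sum(min(c, D)) >= replication * D,
    i.e. iff for every k the k smallest capacities sum to at least
    (replication - n + k) * D.
    """
    caps = sorted(capacities)
    n = len(caps)
    best = float("inf")
    prefix = 0
    for k in range(n + 1):  # k = number of smallest nodes considered full
        if k > 0:
            prefix += caps[k - 1]
        # Replicas per block that cannot go to the other n - k nodes.
        need = replication - (n - k)
        if need > 0:
            best = min(best, prefix / need)
    return best


if __name__ == "__main__":
    print(max_storable([12, 72], 2))      # the two-node case, in TB
    print(max_storable([12, 12, 72], 2))  # one more small node
```

The function returns 12 for the two-node case, matching the statement above. When a second 12 TB node is added, it returns 24 rather than half the 96 TB total. More nodes help, but a single large node cannot be fully used unless the others together are large enough to hold the matching replicas.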



The best way, in my opinion, is to use multiple racks. Nodes within a rack should have identical capacity, and racks should have identical total capacity.
For example:

rack1: 1 node with 72 TB
rack2: 6 nodes with 12 TB
rack3: 3 nodes with 24 TB

It helps with balancing, because a duplicate of a block must be placed on another rack.
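A hypothetical rack-topology script along these lines is sketched below in Python. To my knowledge, Hadoop of this era calls the script named by `topology.script.file.name`, passes one or more host names or IP addresses as arguments, and reads one rack path per argument from stdout. Treat those details as assumptions to verify for your release. The host names and the table are invented to mirror the rack1/rack2/rack3 example. A real table would also need to cover IP addresses if that is what the namenode passes.

```python
#!/usr/bin/env python3
import sys

DEFAULT_RACK = "/default-rack"

# Hypothetical host -> (rack, capacity in TB) table mirroring the example:
# one 72 TB node, six 12 TB nodes and three 24 TB nodes, 72 TB per rack.
NODES = {"big01": ("/rack1", 72)}
NODES.update({f"small{i:02d}": ("/rack2", 12) for i in range(1, 7)})
NODES.update({f"mid{i:02d}": ("/rack3", 24) for i in range(1, 4)})


def rack_for(host):
    """Rack path for a host; unknown hosts fall back to the default rack."""
    entry = NODES.get(host)
    return entry[0] if entry else DEFAULT_RACK


def rack_totals():
    """Total capacity per rack, to check that racks are evenly sized."""
    totals = {}
    for rack, capacity in NODES.values():
        totals[rack] = totals.get(rack, 0) + capacity
    return totals


if __name__ == "__main__":
    # Hadoop may pass several hosts per call; print one rack per argument.
    print(" ".join(rack_for(host) for host in sys.argv[1:]))
```

Keeping capacities in the same table as the rack assignment makes it easy to confirm, via `rack_totals()`, that every rack ends up the same size.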




The same question I asked earlier in this message: do multiple racks with the default balancer threshold minimize the difference between racks?


Why did you choose HDFS? Maybe Lustre, CephFS or something else would be a better choice.



It wasn't my decision, and I probably can't change it now. I am new to this cluster and trying to understand a few issues. I will explore other options as you mentioned.

--
http://balajin.net/blog
http://flic.kr/balajijegan





--
http://balajin.net/blog
http://flic.kr/balajijegan








Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Jamal B <jm...@gmail.com>.
Yes
On Mar 24, 2013 9:25 PM, "Tapas Sarangi" <ta...@gmail.com> wrote:

> Thanks. Does this need a restart of Hadoop on the nodes where this
> modification is made?
>
> -----
>
> On Mar 24, 2013, at 8:06 PM, Jamal B <jm...@gmail.com> wrote:
>
> dfs.datanode.du.reserved
>
> You could tweak that param on the smaller nodes to "force" the flow of
> blocks to other nodes. It's a short-term hack at best, but it should help the
> situation a bit.
> On Mar 24, 2013 7:09 PM, "Tapas Sarangi" <ta...@gmail.com> wrote:
>
>>
>> On Mar 24, 2013, at 4:34 PM, Jamal B <jm...@gmail.com> wrote:
>>
>> It shouldn't cause further problems, since most of your small nodes are
>> already at their capacity. You could set or increase the dfs reserved
>> property on your smaller nodes to force the flow of blocks onto the larger
>> nodes.
>>
>>
>> Thanks. Can you please specify which dfs properties we can set or
>> modify to direct the flow of blocks towards the larger nodes rather
>> than the smaller nodes?
>>
>> -----
>>
>>
>>
>>
>>
>>
>> On Mar 24, 2013 4:45 PM, "Tapas Sarangi" <ta...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Thanks for the idea, I will give this a try and report back.
>>>
>>> My worry is this: if we decommission a small node (one at a time), will it
>>> move the data to larger nodes or choke other smaller nodes? In principle
>>> it should distribute the blocks, but it is not distributing them the
>>> way we expect, so do you think this may cause further problems?
>>>
>>> ---------
>>>
>>> On Mar 24, 2013, at 3:37 PM, Jamal B <jm...@gmail.com> wrote:
>>>
>>> Then I think the only way around this would be to decommission the
>>> smaller nodes one at a time and ensure that their blocks are moved to the
>>> larger nodes.
>>>
>>> Once that is complete, bring the smaller nodes back in, but maybe only after
>>> you tweak the rack topology to match your disk layout rather than your network
>>> layout, to compensate for the unbalanced nodes.
>>>
>>>
>>> Just my 2 cents
>>>
>>>
>>> On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi <ta...@gmail.com> wrote:
>>>
>>>> Thanks. We have a 1-1 mapping between drives and folders on all the
>>>> datanodes.
>>>>
>>>> -Tapas
>>>>
>>>> On Mar 24, 2013, at 3:29 PM, Jamal B <jm...@gmail.com> wrote:
>>>>
>>>> On both types of nodes, what is your dfs.data.dir set to? Does it
>>>> specify multiple folders on the same sets of drives, or is it 1-1 between
>>>> folder and drive? If it's set to multiple folders on the same drives, it
>>>> is probably overstating the "available capacity", because it assumes a
>>>> 1-1 relationship between each folder and the total capacity of its drive.
>>>>
>>>>
>>>> On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi <tapas.sarangi@gmail.com> wrote:
>>>>
>>>>> Yes, thanks for pointing that out, but I already know that it completes the
>>>>> balancing when it exits; otherwise it wouldn't exit.
>>>>> Your answer doesn't solve the problem I mentioned earlier in my
>>>>> message. 'hdfs' is stalling, and Hadoop is not writing unless space is
>>>>> freed up on the cluster, even though "df" shows the cluster has about
>>>>> 500 TB of free space.
>>>>>
>>>>> -------
>>>>>
>>>>>
>>>>> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>>>>> balaji@balajin.net> wrote:
>>>>>
>>>>>  -setBalancerBandwidth <bandwidth in bytes per second>
>>>>>
>>>>> So the value is bytes per second. If it runs and then exits, it
>>>>> means it has completed the balancing.
>>>>>
>>>>>
>>>>> On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:
>>>>>
>>>>>> Yes, we are running the balancer, though each balancer process runs for
>>>>>> a day or more before exiting and starting over.
>>>>>> The current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume
>>>>>> that's bytes, so about 2 GB/sec. Shouldn't that be reasonable? If it
>>>>>> is in bits, then we have a problem.
>>>>>> What's the unit for "dfs.balance.bandwidthPerSec"?
>>>>>>
>>>>>> -----
>>>>>>
>>>>>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>>>>>> lists@balajin.net> wrote:
>>>>>>
>>>>>> Are you running the balancer? If it is running and slow,
>>>>>> try increasing the balancer bandwidth.
>>>>>>
>>>>>>
>>>>>> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks for the follow-up. I don't know whether an attachment will pass
>>>>>>> through this mailing list, but I am attaching a PDF that contains the usage
>>>>>>> of all live nodes.
>>>>>>>
>>>>>>> All nodes starting with the letter "g" are the ones with smaller storage
>>>>>>> space, whereas nodes starting with the letter "s" have larger storage space. As
>>>>>>> you will see, most of the "gXX" nodes are completely full, whereas the "sXX"
>>>>>>> nodes have a lot of unused space.
>>>>>>>
>>>>>>> Recently we have frequently faced a crisis in which 'hdfs' goes into a mode
>>>>>>> where it is not able to write anything further, even though the total space
>>>>>>> available in the cluster is about 500 TB. We believe this has something to
>>>>>>> do with the way it is balancing the nodes, but we don't understand the problem
>>>>>>> yet. Maybe the attached PDF will help some of you (experts) see what is
>>>>>>> going wrong here...
>>>>>>>
>>>>>>> Thanks
>>>>>>> ------
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> The balancer knows about topology, but when it calculates balancing it
>>>>>>> operates only on nodes, not on racks.
>>>>>>> You can see how it works in Balancer.java, in BalancerDatanode, around
>>>>>>> line 509.
>>>>>>>
>>>>>>> I was wrong about 350 TB / 35 TB. It calculates it this way:
>>>>>>>
>>>>>>> For example:
>>>>>>> cluster_capacity = 3.5 PB
>>>>>>> cluster_dfsused = 2 PB
>>>>>>>
>>>>>>> avgutil = cluster_dfsused / cluster_capacity * 100 = 57.14% of cluster
>>>>>>> capacity used
>>>>>>> Each node's utilization is node_dfsused / node_capacity * 100. The
>>>>>>> balancer considers a node balanced if
>>>>>>> avgutil - 10 <= node_utilization < avgutil + 10.
>>>>>>>
>>>>>>> In the ideal case every node uses avgutil of its capacity, but for a 12 TB
>>>>>>> node that is only about 6.9 TB, and for a 72 TB node about 41 TB.
>>>>>>>
>>>>>>> The balancer can't help you.
>>>>>>>
>>>>>>> Show me
>>>>>>> http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if
>>>>>>> you can.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> In the ideal case, with replication factor 2 and two nodes of 12 TB and
>>>>>>>> 72 TB, you will be able to store only 12 TB of replicated data.
>>>>>>>>
>>>>>>>>
>>>>>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB
>>>>>>>> and 72 TB, but not true for more than two nodes in the cluster.
>>>>>>>>
>>>>>>>>
>>>>>>>> The best way, in my opinion, is to use multiple racks. Nodes within a rack
>>>>>>>> should have identical capacity, and racks should have identical total capacity.
>>>>>>>> For example:
>>>>>>>>
>>>>>>>> rack1: 1 node with 72 TB
>>>>>>>> rack2: 6 nodes with 12 TB
>>>>>>>> rack3: 3 nodes with 24 TB
>>>>>>>>
>>>>>>>> It helps with balancing, because a duplicate of a block must be placed on
>>>>>>>> another rack.
>>>>>>>>
>>>>>>>>
>>>>>>>> The same question I asked earlier in this message: do multiple
>>>>>>>> racks with the default balancer threshold minimize the difference
>>>>>>>> between racks?
>>>>>>>>
>>>>>>>> Why did you choose HDFS? Maybe Lustre, CephFS or something else would be
>>>>>>>> a better choice.
>>>>>>>>
>>>>>>>>
>>>>>>>> It wasn't my decision, and I probably can't change it now. I am new
>>>>>>>> to this cluster and trying to understand a few issues. I will explore other
>>>>>>>> options as you mentioned.
>>>>>>>>
>>>>>>>> --
>>>>>>>> http://balajin.net/blog
>>>>>>>> http://flic.kr/balajijegan
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> http://balajin.net/blog
>>>>> http://flic.kr/balajijegan
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
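A small Python helper for the `dfs.datanode.du.reserved` suggestion quoted above. It assumes the property is a per-volume value in bytes: space on each `dfs.data.dir` volume that HDFS leaves for non-DFS use. It prints an hdfs-site.xml snippet. The 2 TB drive size and the 10% figure are hypothetical. Raising the value on a small node lowers the remaining capacity it reports, so new blocks should be steered elsewhere. It does not move blocks that are already there. Per the thread, the datanode needs a restart to pick up the change.

```python
TB = 10**12  # decimal terabyte, in bytes


def reserved_bytes_per_volume(volume_bytes, keep_free_percent):
    """Bytes to reserve on one volume so HDFS leaves keep_free_percent unused."""
    if not 0 <= keep_free_percent < 100:
        raise ValueError("keep_free_percent must be in [0, 100)")
    return volume_bytes * keep_free_percent // 100


def property_xml(name, value):
    """Render one <property> element for hdfs-site.xml."""
    return (
        "<property>\n"
        f"  <name>{name}</name>\n"
        f"  <value>{value}</value>\n"
        "</property>"
    )


if __name__ == "__main__":
    # Hypothetical small node with 2 TB drives; hold back 10% of each drive.
    value = reserved_bytes_per_volume(2 * TB, 10)
    print(property_xml("dfs.datanode.du.reserved", value))
```

Integer arithmetic is used so that the byte count written into the configuration is exact.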

Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Jamal B <jm...@gmail.com>.
Yes
On Mar 24, 2013 9:25 PM, "Tapas Sarangi" <ta...@gmail.com> wrote:

> Thanks. Does this need a restart of hadoop in the nodes where this
> modification is made ?
>
> -----
>
> On Mar 24, 2013, at 8:06 PM, Jamal B <jm...@gmail.com> wrote:
>
> dfs.datanode.du.reserved
>
> You could tweak that param on the smaller nodes to "force" the flow of
> blocks to other nodes.   A short term hack at best, but should help the
> situation a bit.
> On Mar 24, 2013 7:09 PM, "Tapas Sarangi" <ta...@gmail.com> wrote:
>
>>
>> On Mar 24, 2013, at 4:34 PM, Jamal B <jm...@gmail.com> wrote:
>>
>> It shouldn't cause further problems since most of your small nodes are
>> already their capacity.  You could set or increase the dfs reserved
>> property on your smaller nodes to force the flow of blocks onto the larger
>> nodes.
>>
>>
>> Thanks.  Can you please specify which are the dfs properties that we can
>> set or modify to force the flow of blocks directed towards the larger nodes
>> than the smaller nodes ?
>>
>> -----
>>
>>
>>
>>
>>
>>
>> On Mar 24, 2013 4:45 PM, "Tapas Sarangi" <ta...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Thanks for the idea, I will give this a try and report back.
>>>
>>> My worry is if we decommission a small node (one at a time), will it
>>> move the data to larger nodes or choke another smaller nodes ? In principle
>>> it should distribute the blocks, the point is it is not distributing the
>>> way we expect it to, so do you think this may cause further problems ?
>>>
>>> ---------
>>>
>>> On Mar 24, 2013, at 3:37 PM, Jamal B <jm...@gmail.com> wrote:
>>>
>>> Then I think the only way around this would be to decommission  1 at a
>>> time, the smaller nodes, and ensure that the blocks are moved to the larger
>>> nodes.
>>>
>>> And once complete, bring back in the smaller nodes, but maybe only after
>>> you tweak the rack topology to match your disk layout more than network
>>> layout to compensate for the unbalanced nodes.
>>>
>>>
>>> Just my 2 cents
>>>
>>>
>>> On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi <ta...@gmail.com>wrote:
>>>
>>>> Thanks. We have a 1-1 configuration of drives and folder in all the
>>>> datanodes.
>>>>
>>>> -Tapas
>>>>
>>>> On Mar 24, 2013, at 3:29 PM, Jamal B <jm...@gmail.com> wrote:
>>>>
>>>> On both types of nodes, what is your dfs.data.dir set to? Does it
>>>> specify multiple folders on the same set's of drives or is it 1-1 between
>>>> folder and drive?  If it's set to multiple folders on the same drives, it
>>>> is probably multiplying the amount of "available capacity" incorrectly in
>>>> that it assumes a 1-1 relationship between folder and total capacity of the
>>>> drive.
>>>>
>>>>
>>>> On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi <tapas.sarangi@gmail.com
>>>> > wrote:
>>>>
>>>>> Yes, thanks for pointing, but I already know that it is completing the
>>>>> balancing when exiting otherwise it shouldn't exit.
>>>>> Your answer doesn't solve the problem I mentioned earlier in my
>>>>> message. 'hdfs' is stalling and hadoop is not writing unless space is
>>>>> cleared up from the cluster even though "df" shows the cluster has about
>>>>> 500 TB of free space.
>>>>>
>>>>> -------
>>>>>
>>>>>
>>>>> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>>>>> balaji@balajin.net> wrote:
>>>>>
>>>>>  -setBalancerBandwidth <bandwidth in bytes per second>
>>>>>
>>>>> So the value is bytes per second. If it is running and exiting,it
>>>>> means it has completed the balancing.
>>>>>
>>>>>
>>>>> On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:
>>>>>
>>>>>> Yes, we are running balancer, though a balancer process runs for
>>>>>> almost a day or more before exiting and starting over.
>>>>>> Current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume
>>>>>> that's bytes so about 2 GigaByte/sec. Shouldn't that be reasonable ? If it
>>>>>> is in Bits then we have a problem.
>>>>>> What's the unit for "dfs.balance.bandwidthPerSec" ?
>>>>>>
>>>>>> -----
>>>>>>
>>>>>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>>>>>> lists@balajin.net> wrote:
>>>>>>
>>>>>> Are you running balancer? If balancer is running and if it is slow,
>>>>>> try increasing the balancer bandwidth
>>>>>>
>>>>>>
>>>>>> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com>wrote:
>>>>>>
>>>>>>> Thanks for the follow up. I don't know whether attachment will pass
>>>>>>> through this mailing list, but I am attaching a pdf that contains the usage
>>>>>>> of all live nodes.
>>>>>>>
>>>>>>> All nodes starting with letter "g" are the ones with smaller storage
>>>>>>> space where as nodes starting with letter "s" have larger storage space. As
>>>>>>> you will see, most of the "gXX" nodes are completely full whereas "sXX"
>>>>>>> nodes have a lot of unused space.
>>>>>>>
>>>>>>> Recently, we are facing crisis frequently as 'hdfs' goes into a mode
>>>>>>> where it is not able to write any further even though the total space
>>>>>>> available in the cluster is about 500 TB. We believe this has something to
>>>>>>> do with the way it is balancing the nodes, but don't understand the problem
>>>>>>> yet. May be the attached PDF will help some of you (experts) to see what is
>>>>>>> going wrong here...
>>>>>>>
>>>>>>> Thanks
>>>>>>> ------
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Balancer know about topology,but when calculate balancing it
>>>>>>> operates only with nodes not with racks.
>>>>>>> You can see how it work in Balancer.java in  BalancerDatanode about
>>>>>>> string 509.
>>>>>>>
>>>>>>> I was wrong about 350Tb,35Tb it calculates in such way :
>>>>>>>
>>>>>>> For example:
>>>>>>> cluster_capacity=3.5Pb
>>>>>>> cluster_dfsused=2Pb
>>>>>>>
>>>>>>> avgutil=cluster_dfsused/cluster_capacity*100=57.14% used cluster
>>>>>>> capacity
>>>>>>> Then we know avg node utilization (node_dfsused/node_capacity*100)
>>>>>>> .Balancer think that all good if  avgutil
>>>>>>> +10>node_utilizazation>=avgutil-10.
>>>>>>>
>>>>>>> Ideal case that all node used avgutl of capacity.but for 12TB node
>>>>>>> its only 6.5Tb and for 72Tb its about 40Tb.
>>>>>>>
>>>>>>> Balancer cant help you.
>>>>>>>
>>>>>>> Show me
>>>>>>> http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if
>>>>>>> you can.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>  In ideal case with replication factor 2 ,with two nodes 12Tb and
>>>>>>>> 72Tb you will be able to have only 12Tb replication data.
>>>>>>>>
>>>>>>>>
>>>>>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB
>>>>>>>> and 72 TB, but not true for more than two nodes in the cluster.
>>>>>>>>
>>>>>>>>
>>>>>>>> Best way,on my opinion,it is using multiple racks.Nodes in rack
>>>>>>>> must be with identical capacity.Racks must be identical capacity.
>>>>>>>> For example:
>>>>>>>>
>>>>>>>> rack1: 1 node with 72Tb
>>>>>>>> rack2: 6 nodes with 12Tb
>>>>>>>> rack3: 3 nodes with 24Tb
>>>>>>>>
>>>>>>>> It helps with balancing,because dublicated  block must be another
>>>>>>>> rack.
>>>>>>>>
>>>>>>>>
>>>>>>>> The same question I asked earlier in this message, does multiple
>>>>>>>> racks with default threshold for the balancer minimizes the difference
>>>>>>>> between racks ?
>>>>>>>>
>>>>>>>> Why did you select HDFS? Maybe Lustre, CephFS or another file system
>>>>>>>> would be a better choice.
>>>>>>>>
>>>>>>>>
>>>>>>>> It wasn't my decision, and I probably can't change it now. I am new
>>>>>>>> to this cluster and trying to understand a few issues. I will explore
>>>>>>>> other options as you mentioned.
>>>>>>>>
>>>>>>>> --
>>>>>>>> http://balajin.net/blog
>>>>>>>> http://flic.kr/balajijegan
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> http://balajin.net/blog
>>>>> http://flic.kr/balajijegan
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
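
The balancer check described in the quoted reply above (cluster-wide average utilization plus or minus a threshold) can be sketched in a few lines of Python. This is a simplified model that follows the quoted inequality, not the actual Balancer.java code: the `Node` class and the `average_utilization`, `classify` and `target_used_tb` helpers are hypothetical, the threshold defaults to 10, and the node names and sizes are made-up examples patterned on the "g" (small) and "s" (large) nodes mentioned in the thread.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A DataNode as seen by this toy model (sizes in TB)."""
    name: str
    capacity_tb: float
    used_tb: float

    @property
    def utilization(self) -> float:
        # node_dfsused / node_capacity * 100
        return 100.0 * self.used_tb / self.capacity_tb


def average_utilization(nodes: list[Node]) -> float:
    # avgutil = cluster_dfsused / cluster_capacity * 100
    total_used = sum(n.used_tb for n in nodes)
    total_capacity = sum(n.capacity_tb for n in nodes)
    return 100.0 * total_used / total_capacity


def classify(nodes: list[Node], threshold: float = 10.0) -> dict[str, str]:
    """Label nodes using avgutil+threshold > util >= avgutil-threshold."""
    avg = average_utilization(nodes)
    labels = {}
    for n in nodes:
        u = n.utilization
        if u >= avg + threshold:
            labels[n.name] = "over-utilized"     # candidate source of moves
        elif u < avg - threshold:
            labels[n.name] = "under-utilized"    # candidate target of moves
        else:
            labels[n.name] = "within threshold"  # not flagged by this check
    return labels


def target_used_tb(capacity_tb: float, avg_util: float) -> float:
    """Space a node would use if it sat exactly at the cluster average."""
    return capacity_tb * avg_util / 100.0


if __name__ == "__main__":
    # Cluster-wide figures from the quoted example: 2 PB used of 3.5 PB.
    avg = 100.0 * 2.0 / 3.5
    print(f"avgutil = {avg:.2f}%")                                   # 57.14%
    print(f"12 TB node target: {target_used_tb(12, avg):.2f} TB")   # 6.86 TB
    print(f"72 TB node target: {target_used_tb(72, avg):.2f} TB")   # 41.14 TB

    # Hypothetical two-node cluster: a full small node, a 40%-used large node.
    nodes = [Node("g01", 12.0, 11.99), Node("s01", 72.0, 28.8)]
    print(classify(nodes))
```

In this model the large node is not flagged at all even at 40% use, because it still falls inside the threshold band around the average; only the nearly full small node is.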

Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Jamal B <jm...@gmail.com>.
Yes
On Mar 24, 2013 9:25 PM, "Tapas Sarangi" <ta...@gmail.com> wrote:

> Thanks. Does this need a restart of Hadoop on the nodes where this
> modification is made?
>
> -----
>
> On Mar 24, 2013, at 8:06 PM, Jamal B <jm...@gmail.com> wrote:
>
> dfs.datanode.du.reserved
>
> You could tweak that parameter on the smaller nodes to "force" the flow of
> blocks to other nodes. A short-term hack at best, but it should help the
> situation a bit.
> On Mar 24, 2013 7:09 PM, "Tapas Sarangi" <ta...@gmail.com> wrote:
>
>>
>> On Mar 24, 2013, at 4:34 PM, Jamal B <jm...@gmail.com> wrote:
>>
>> It shouldn't cause further problems, since most of your small nodes are
>> already at their capacity. You could set or increase the dfs reserved
>> property on your smaller nodes to force the flow of blocks onto the larger
>> nodes.
>>
>>
>> Thanks. Can you please specify which dfs properties we can set or modify
>> to direct the flow of blocks towards the larger nodes rather than the
>> smaller nodes?
>>
>> -----
>>
>>
>>
>>
>>
>>
>> On Mar 24, 2013 4:45 PM, "Tapas Sarangi" <ta...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Thanks for the idea, I will give this a try and report back.
>>>
>>> My worry is this: if we decommission a small node (one at a time), will
>>> it move the data to larger nodes or choke other small nodes? In principle
>>> it should distribute the blocks, but it is not distributing them the
>>> way we expect, so do you think this may cause further problems?
>>>
>>> ---------
>>>
>>> On Mar 24, 2013, at 3:37 PM, Jamal B <jm...@gmail.com> wrote:
>>>
>>> Then I think the only way around this would be to decommission the
>>> smaller nodes one at a time, and ensure that the blocks are moved to the
>>> larger nodes.
>>>
>>> And once complete, bring the smaller nodes back in, but maybe only after
>>> you tweak the rack topology to match your disk layout rather than your
>>> network layout, to compensate for the unbalanced nodes.
>>>
>>>
>>> Just my 2 cents
>>>
>>>
>>> On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi <ta...@gmail.com> wrote:
>>>
>>>> Thanks. We have a 1-1 configuration of drives and folders on all the
>>>> datanodes.
>>>>
>>>> -Tapas
>>>>
>>>> On Mar 24, 2013, at 3:29 PM, Jamal B <jm...@gmail.com> wrote:
>>>>
>>>> On both types of nodes, what is your dfs.data.dir set to? Does it
>>>> specify multiple folders on the same set of drives, or is it 1-1 between
>>>> folder and drive? If it's set to multiple folders on the same drives, it
>>>> is probably multiplying the amount of "available capacity" incorrectly, in
>>>> that it assumes a 1-1 relationship between a folder and the total
>>>> capacity of the drive.
>>>>
>>>>
>>>> On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi <tapas.sarangi@gmail.com> wrote:
>>>>
>>>>> Yes, thanks for pointing that out, but I already know that it completes
>>>>> the balancing when it exits; otherwise it shouldn't exit.
>>>>> Your answer doesn't solve the problem I mentioned earlier in my
>>>>> message. 'hdfs' is stalling and hadoop is not writing unless space is
>>>>> cleared up on the cluster, even though "df" shows the cluster has about
>>>>> 500 TB of free space.
>>>>>
>>>>> -------
>>>>>
>>>>>
>>>>> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <balaji@balajin.net> wrote:
>>>>>
>>>>>  -setBalancerBandwidth <bandwidth in bytes per second>
>>>>>
>>>>> So the value is in bytes per second. If it is running and exiting, it
>>>>> means it has completed the balancing.
>>>>>
>>>>>
>>>>> On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:
>>>>>
>>>>>> Yes, we are running the balancer, though a balancer process runs for
>>>>>> almost a day or more before exiting and starting over.
>>>>>> The current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume
>>>>>> that's bytes, so about 2 GB/s. Shouldn't that be reasonable? If it
>>>>>> is in bits, then we have a problem.
>>>>>> What's the unit for "dfs.balance.bandwidthPerSec"?
>>>>>>
>>>>>> -----
>>>>>>
>>>>>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <lists@balajin.net> wrote:
>>>>>>
>>>>>> Are you running the balancer? If the balancer is running but slow,
>>>>>> try increasing the balancer bandwidth.
>>>>>>
>>>>>>
>>>>>> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks for the follow-up. I don't know whether attachments will pass
>>>>>>> through this mailing list, but I am attaching a PDF that contains the
>>>>>>> usage of all live nodes.
>>>>>>>
>>>>>>> All nodes starting with the letter "g" are the ones with smaller
>>>>>>> storage space, whereas nodes starting with the letter "s" have larger
>>>>>>> storage space. As you will see, most of the "gXX" nodes are completely
>>>>>>> full, whereas the "sXX" nodes have a lot of unused space.
>>>>>>>
>>>>>>> Recently we have frequently faced a crisis in which 'hdfs' goes into a
>>>>>>> mode where it is not able to write any further, even though the total
>>>>>>> space available in the cluster is about 500 TB. We believe this has
>>>>>>> something to do with the way it is balancing the nodes, but don't
>>>>>>> understand the problem yet. Maybe the attached PDF will help some of
>>>>>>> you (experts) see what is going wrong here...
>>>>>>>
>>>>>>> Thanks
>>>>>>> ------
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> The balancer knows about topology, but when it calculates balancing
>>>>>>> it operates only on nodes, not on racks.
>>>>>>> You can see how it works in Balancer.java, in BalancerDatanode, around
>>>>>>> line 509.
>>>>>>>
>>>>>>> I was wrong about 350 TB and 35 TB. It is calculated this way:
>>>>>>>
>>>>>>> For example:
>>>>>>> cluster_capacity=3.5PB
>>>>>>> cluster_dfsused=2PB
>>>>>>>
>>>>>>> avgutil=cluster_dfsused/cluster_capacity*100=57.14% of cluster
>>>>>>> capacity used.
>>>>>>> Then it computes each node's utilization
>>>>>>> (node_dfsused/node_capacity*100). The balancer considers everything
>>>>>>> fine if avgutil+10 > node_utilization >= avgutil-10 (10 is the
>>>>>>> default balancer threshold).
>>>>>>>
>>>>>>> Ideally every node uses avgutil of its own capacity, but for a 12 TB
>>>>>>> node that is only about 6.9 TB and for a 72 TB node about 41 TB.
>>>>>>>
>>>>>>> The balancer can't help you.
>>>>>>>
>>>>>>> Show me
>>>>>>> http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if
>>>>>>> you can.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> In the ideal case, with replication factor 2 and two nodes of 12 TB
>>>>>>>> and 72 TB, you will be able to store only 12 TB of replicated data.
>>>>>>>>
>>>>>>>>
>>>>>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB
>>>>>>>> and 72 TB, but not true for more than two nodes in the cluster.
>>>>>>>>
>>>>>>>>
>>>>>>>> The best way, in my opinion, is to use multiple racks. Nodes in a
>>>>>>>> rack should have identical capacity, and racks should have identical
>>>>>>>> total capacity.
>>>>>>>> For example:
>>>>>>>>
>>>>>>>> rack1: 1 node with 72Tb
>>>>>>>> rack2: 6 nodes with 12Tb
>>>>>>>> rack3: 3 nodes with 24Tb
>>>>>>>>
>>>>>>>> It helps with balancing, because a replica of a block must be placed
>>>>>>>> on another rack.
>>>>>>>>
>>>>>>>>
>>>>>>>> The same question I asked earlier in this message: do multiple racks
>>>>>>>> with the default balancer threshold minimize the difference between
>>>>>>>> racks?
>>>>>>>>
>>>>>>>> Why did you select HDFS? Maybe Lustre, CephFS or another file system
>>>>>>>> would be a better choice.
>>>>>>>>
>>>>>>>>
>>>>>>>> It wasn't my decision, and I probably can't change it now. I am new
>>>>>>>> to this cluster and trying to understand a few issues. I will explore
>>>>>>>> other options as you mentioned.
>>>>>>>>
>>>>>>>> --
>>>>>>>> http://balajin.net/blog
>>>>>>>> http://flic.kr/balajijegan
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> http://balajin.net/blog
>>>>> http://flic.kr/balajijegan
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
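
To see why raising dfs.datanode.du.reserved on the small nodes steers new blocks elsewhere, here is a simplified Python model of how a DataNode volume might report its remaining space. It assumes, as the property's description in hdfs-default.xml states, that the reserved value is in bytes and applies to each volume; the `volume_remaining` and `node_remaining` helpers, the `GB` constant and the drive sizes are hypothetical, and the real DataNode code handles more cases than this. As answered above, the new value only takes effect after the DataNode is restarted.

```python
GB = 1024 ** 3  # bytes per GiB; only used to write the example sizes


def volume_remaining(raw_capacity: int, dfs_used: int,
                     os_available: int, reserved: int) -> int:
    """Bytes a volume would advertise as free for new HDFS blocks (simplified).

    The reserved amount is subtracted from the volume's raw capacity, and the
    result is capped by what the OS reports as actually free on the disk.
    """
    capacity = max(raw_capacity - reserved, 0)
    remaining = min(capacity - dfs_used, os_available)
    return max(remaining, 0)


def node_remaining(volumes: list[tuple[int, int, int]], reserved: int) -> int:
    """Sum per-volume remaining space; `reserved` applies to every volume."""
    return sum(volume_remaining(cap, used, avail, reserved)
               for cap, used, avail in volumes)


if __name__ == "__main__":
    # Hypothetical small node: six 2000 GiB drives, 1800 GiB used on each.
    small_node = [(2000 * GB, 1800 * GB, 190 * GB)] * 6
    for reserved_gb in (10, 250):
        free = node_remaining(small_node, reserved_gb * GB)
        print(f"reserved={reserved_gb} GiB per volume -> {free // GB} GiB advertised")
```

With 10 GiB reserved per volume, this node advertises 1140 GiB; with 250 GiB reserved it advertises 0, so in this model it no longer looks like a place to put new blocks.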

Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Tapas Sarangi <ta...@gmail.com>.
Thanks. Does this need a restart of Hadoop on the nodes where this modification is made?

-----

On Mar 24, 2013, at 8:06 PM, Jamal B <jm...@gmail.com> wrote:

> dfs.datanode.du.reserved
> 
> You could tweak that parameter on the smaller nodes to "force" the flow of blocks to other nodes. A short-term hack at best, but it should help the situation a bit.
> 
> On Mar 24, 2013 7:09 PM, "Tapas Sarangi" <ta...@gmail.com> wrote:
> 
> On Mar 24, 2013, at 4:34 PM, Jamal B <jm...@gmail.com> wrote:
> 
>> It shouldn't cause further problems, since most of your small nodes are already at their capacity. You could set or increase the dfs reserved property on your smaller nodes to force the flow of blocks onto the larger nodes.
>> 
>> 
> 
> Thanks. Can you please specify which dfs properties we can set or modify to direct the flow of blocks towards the larger nodes rather than the smaller nodes?
> 
> -----
> 
> 
> 
>> 
> 
> 
>> On Mar 24, 2013 4:45 PM, "Tapas Sarangi" <ta...@gmail.com> wrote:
>> Hi,
>> 
>> Thanks for the idea, I will give this a try and report back. 
>> 
>> My worry is this: if we decommission a small node (one at a time), will it move the data to larger nodes or choke other small nodes? In principle it should distribute the blocks, but it is not distributing them the way we expect, so do you think this may cause further problems?
>> 
>> ---------
>> 
>> On Mar 24, 2013, at 3:37 PM, Jamal B <jm...@gmail.com> wrote:
>> 
>>> Then I think the only way around this would be to decommission the smaller nodes one at a time, and ensure that the blocks are moved to the larger nodes.
>>> And once complete, bring the smaller nodes back in, but maybe only after you tweak the rack topology to match your disk layout rather than your network layout, to compensate for the unbalanced nodes.
>>> 
>>> Just my 2 cents
>>> 
>>> 
>>> On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi <ta...@gmail.com> wrote:
>>> Thanks. We have a 1-1 configuration of drives and folders on all the datanodes.
>>> 
>>> -Tapas
>>> 
>>> On Mar 24, 2013, at 3:29 PM, Jamal B <jm...@gmail.com> wrote:
>>> 
>>>> On both types of nodes, what is your dfs.data.dir set to? Does it specify multiple folders on the same set of drives, or is it 1-1 between folder and drive? If it's set to multiple folders on the same drives, it is probably multiplying the amount of "available capacity" incorrectly, in that it assumes a 1-1 relationship between a folder and the total capacity of the drive.
>>>> 
>>>> 
>>>> On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi <ta...@gmail.com> wrote:
>>>> Yes, thanks for pointing that out, but I already know that it completes the balancing when it exits; otherwise it shouldn't exit.
>>>> Your answer doesn't solve the problem I mentioned earlier in my message. 'hdfs' is stalling and hadoop is not writing unless space is cleared up on the cluster, even though "df" shows the cluster has about 500 TB of free space.
>>>> 
>>>> -------
>>>>  
>>>> 
>>>> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <ba...@balajin.net> wrote:
>>>> 
>>>>>  -setBalancerBandwidth <bandwidth in bytes per second>
>>>>> 
>>>>> So the value is in bytes per second. If it is running and exiting, it means it has completed the balancing.
>>>>> 
>>>>> 
>>>>> On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:
>>>>> Yes, we are running the balancer, though a balancer process runs for almost a day or more before exiting and starting over.
>>>>> The current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume that's bytes, so about 2 GB/s. Shouldn't that be reasonable? If it is in bits, then we have a problem.
>>>>> What's the unit for "dfs.balance.bandwidthPerSec"?
>>>>> 
>>>>> -----
>>>>> 
>>>>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <li...@balajin.net> wrote:
>>>>> 
>>>>>> Are you running the balancer? If the balancer is running but slow, try increasing the balancer bandwidth.
>>>>>> 
>>>>>> 
>>>>>> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
>>>>>> Thanks for the follow-up. I don't know whether attachments will pass through this mailing list, but I am attaching a PDF that contains the usage of all live nodes.
>>>>>> 
>>>>>> All nodes starting with the letter "g" are the ones with smaller storage space, whereas nodes starting with the letter "s" have larger storage space. As you will see, most of the "gXX" nodes are completely full, whereas the "sXX" nodes have a lot of unused space.
>>>>>> 
>>>>>> Recently we have frequently faced a crisis in which 'hdfs' goes into a mode where it is not able to write any further, even though the total space available in the cluster is about 500 TB. We believe this has something to do with the way it is balancing the nodes, but don't understand the problem yet. Maybe the attached PDF will help some of you (experts) see what is going wrong here...
>>>>>> 
>>>>>> Thanks
>>>>>> ------
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> The balancer knows about topology, but when it calculates balancing it operates only on nodes, not on racks.
>>>>>>> You can see how it works in Balancer.java, in BalancerDatanode, around line 509.
>>>>>>> 
>>>>>>> I was wrong about 350 TB and 35 TB. It is calculated this way:
>>>>>>> 
>>>>>>> For example:
>>>>>>> cluster_capacity=3.5PB
>>>>>>> cluster_dfsused=2PB
>>>>>>> 
>>>>>>> avgutil=cluster_dfsused/cluster_capacity*100=57.14% of cluster capacity used.
>>>>>>> Then it computes each node's utilization (node_dfsused/node_capacity*100). The balancer considers everything fine if avgutil+10 > node_utilization >= avgutil-10 (10 is the default balancer threshold).
>>>>>>> 
>>>>>>> Ideally every node uses avgutil of its own capacity, but for a 12 TB node that is only about 6.9 TB and for a 72 TB node about 41 TB.
>>>>>>> 
>>>>>>> The balancer can't help you.
>>>>>>> 
>>>>>>> Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can.
>>>>>>> 
>>>>>>>  
>>>>>>> 
>>>>>>> 
>>>>>>>> In the ideal case, with replication factor 2 and two nodes of 12 TB and 72 TB, you will be able to store only 12 TB of replicated data.
>>>>>>> 
>>>>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB and 72 TB, but not true for more than two nodes in the cluster.
>>>>>>> 
>>>>>>>> 
>>>>>>>> The best way, in my opinion, is to use multiple racks. Nodes in a rack should have identical capacity, and racks should have identical total capacity.
>>>>>>>> For example:
>>>>>>>> 
>>>>>>>> rack1: 1 node with 72Tb
>>>>>>>> rack2: 6 nodes with 12Tb
>>>>>>>> rack3: 3 nodes with 24Tb
>>>>>>>> 
>>>>>>>> It helps with balancing, because a replica of a block must be placed on another rack.
>>>>>>>> 
>>>>>>> 
>>>>>>> The same question I asked earlier in this message: do multiple racks with the default balancer threshold minimize the difference between racks?
>>>>>>> 
>>>>>>>> Why did you select HDFS? Maybe Lustre, CephFS or another file system would be a better choice.
>>>>>>> 
>>>>>>> It wasn't my decision, and I probably can't change it now. I am new to this cluster and trying to understand a few issues. I will explore other options as you mentioned.
>>>>>>> 
>>>>>>> -- 
>>>>>>> http://balajin.net/blog
>>>>>>> http://flic.kr/balajijegan
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> -- 
>>>>> http://balajin.net/blog
>>>>> http://flic.kr/balajijegan
>>>> 
>>>> 
>>> 
>>> 
>> 
> 
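
The exchange quoted above about replication factor 2 (only 12 TB of replicated data with one 12 TB and one 72 TB node, but more once there are additional nodes) can be checked with a small calculation. This Python sketch assumes an idealized model: data is split into tiny blocks, every replica of a block must sit on a different node, and rack placement, reserved space and non-DFS usage are ignored. A node can hold at most one copy of each block, so it contributes at most min(c, D) towards storing D units of unique data, and in this model D is achievable when that sum covers r * D. The function name and capacities are illustrative only.

```python
def max_unique_data(capacities: list[float], replication: int) -> float:
    """Largest amount of unique data storable with `replication` copies per block.

    Idealized model: each node contributes min(capacity, D) towards D units of
    unique data, and D is achievable when those contributions cover
    replication * D.
    """
    if replication <= 0 or len(capacities) < replication:
        return 0.0

    def feasible(d: float) -> bool:
        # Small tolerance absorbs floating-point rounding in the sum.
        return sum(min(c, d) for c in capacities) >= replication * d - 1e-9

    lo, hi = 0.0, sum(capacities) / replication  # hi is an upper bound on D
    if feasible(hi):
        return hi  # every byte of raw capacity is usable
    for _ in range(200):  # bisection on the monotone feasibility test
        mid = (lo + hi) / 2
        if feasible(mid):
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    print(round(max_unique_data([12, 72], 2), 6))      # two nodes: 12.0
    print(round(max_unique_data([12, 72, 72], 2), 6))  # three nodes: 78.0
```

The three-node case uses all 156 TB of raw capacity (78 TB of unique data times 2), which is consistent with the reply that the two-node limit does not hold for larger clusters. Real block placement is not this ideal, so treat the result as an upper bound.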


Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Tapas Sarangi <ta...@gmail.com>.
Thanks. Does this need a restart of hadoop in the nodes where this modification is made ?

-----

On Mar 24, 2013, at 8:06 PM, Jamal B <jm...@gmail.com> wrote:

> dfs.datanode.du.reserved
> 
> You could tweak that param on the smaller nodes to "force" the flow of blocks to other nodes.   A short term hack at best, but should help the situation a bit.
> 
> On Mar 24, 2013 7:09 PM, "Tapas Sarangi" <ta...@gmail.com> wrote:
> 
> On Mar 24, 2013, at 4:34 PM, Jamal B <jm...@gmail.com> wrote:
> 
>> It shouldn't cause further problems since most of your small nodes are already their capacity.  You could set or increase the dfs reserved property on your smaller nodes to force the flow of blocks onto the larger nodes.
>> 
>> 
> 
> Thanks.  Can you please specify which are the dfs properties that we can set or modify to force the flow of blocks directed towards the larger nodes than the smaller nodes ?
> 
> -----
> 
> 
> 
>> 
> 
> 
>> On Mar 24, 2013 4:45 PM, "Tapas Sarangi" <ta...@gmail.com> wrote:
>> Hi,
>> 
>> Thanks for the idea, I will give this a try and report back. 
>> 
>> My worry is: if we decommission a small node (one at a time), will it move the data to the larger nodes or choke other smaller nodes? In principle it should distribute the blocks; the point is that it is not distributing them the way we expect, so do you think this may cause further problems?
>> 
>> ---------
>> 
>> On Mar 24, 2013, at 3:37 PM, Jamal B <jm...@gmail.com> wrote:
>> 
>>> Then I think the only way around this would be to decommission the smaller nodes one at a time and ensure that the blocks are moved to the larger nodes.
>>> Once that is complete, bring the smaller nodes back in, but maybe only after you tweak the rack topology to match your disk layout rather than your network layout, to compensate for the unbalanced nodes.
>>> 
>>> Just my 2 cents
>>> 
>>> 
>>> On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi <ta...@gmail.com> wrote:
>>> Thanks. We have a 1-1 configuration of drives and folders in all the datanodes.
>>> 
>>> -Tapas
>>> 
>>> On Mar 24, 2013, at 3:29 PM, Jamal B <jm...@gmail.com> wrote:
>>> 
>>>> On both types of nodes, what is your dfs.data.dir set to? Does it specify multiple folders on the same sets of drives, or is it 1-1 between folder and drive? If it's set to multiple folders on the same drives, it is probably overcounting the "available capacity", because it assumes a 1-1 relationship between each folder and the total capacity of its drive.
>>>> 
>>>> 
>>>> On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi <ta...@gmail.com> wrote:
>>>> Yes, thanks for pointing that out, but I already know that it completes the balancing when it exits; otherwise it shouldn't exit.
>>>> Your answer doesn't solve the problem I mentioned earlier in my message. 'hdfs' is stalling and hadoop is not writing unless space is cleared up from the cluster even though "df" shows the cluster has about 500 TB of free space. 
>>>> 
>>>> -------
>>>>  
>>>> 
>>>> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <ba...@balajin.net> wrote:
>>>> 
>>>>>  -setBalancerBandwidth <bandwidth in bytes per second>
>>>>> 
>>>>> So the value is bytes per second. If it is running and exiting, it means it has completed the balancing.
>>>>> 
>>>>> 
>>>>> On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:
>>>>> Yes, we are running the balancer, though a balancer process runs for almost a day or more before exiting and starting over.
>>>>> The current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume that's bytes, so about 2 GB/sec. Shouldn't that be reasonable? If it is in bits then we have a problem.
>>>>> What's the unit for "dfs.balance.bandwidthPerSec"?
>>>>> 
>>>>> -----
>>>>> 
>>>>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <li...@balajin.net> wrote:
>>>>> 
>>>>>> Are you running the balancer? If the balancer is running but slow, try increasing the balancer bandwidth.
>>>>>> 
>>>>>> 
>>>>>> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
>>>>>> Thanks for the follow up. I don't know whether an attachment will pass through this mailing list, but I am attaching a pdf that contains the usage of all live nodes.
>>>>>> 
>>>>>> All nodes starting with letter "g" are the ones with smaller storage space, whereas nodes starting with letter "s" have larger storage space. As you will see, most of the "gXX" nodes are completely full whereas "sXX" nodes have a lot of unused space.
>>>>>> 
>>>>>> Recently we have frequently faced a crisis in which 'hdfs' goes into a mode where it cannot write any further, even though the total space available in the cluster is about 500 TB. We believe this has something to do with the way it is balancing the nodes, but we don't understand the problem yet. Maybe the attached PDF will help some of you (experts) see what is going wrong here...
>>>>>> 
>>>>>> Thanks
>>>>>> ------
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> The balancer knows about topology, but when it calculates balancing it operates only on nodes, not on racks.
>>>>>>> You can see how it works in Balancer.java, in BalancerDatanode, around line 509.
>>>>>>> 
>>>>>>> I was wrong about 350Tb/35Tb; it calculates this way:
>>>>>>> 
>>>>>>> For example:
>>>>>>> cluster_capacity=3.5Pb
>>>>>>> cluster_dfsused=2Pb
>>>>>>> 
>>>>>>> avgutil=cluster_dfsused/cluster_capacity*100=57.14% used cluster capacity
>>>>>>> Then we know each node's utilization (node_dfsused/node_capacity*100). The balancer considers everything fine if avgutil+10 > node_utilization >= avgutil-10.
>>>>>>> 
>>>>>>> In the ideal case every node uses avgutil of its capacity, but for a 12TB node that is only about 6.9Tb and for a 72Tb node about 41Tb.
>>>>>>> 
>>>>>>> The balancer can't help you.
>>>>>>> 
>>>>>>> Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can.
>>>>>>> 
>>>>>>>  
>>>>>>> 
>>>>>>> 
>>>>>>>> In the ideal case, with replication factor 2 and two nodes of 12Tb and 72Tb, you will only be able to store 12Tb of replicated data.
>>>>>>> 
>>>>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB and 72 TB, but not true for more than two nodes in the cluster.
>>>>>>> 
>>>>>>>> 
>>>>>>>> The best way, in my opinion, is to use multiple racks. Nodes in a rack must have identical capacity, and racks must have identical capacity.
>>>>>>>> For example:
>>>>>>>> 
>>>>>>>> rack1: 1 node with 72Tb
>>>>>>>> rack2: 6 nodes with 12Tb
>>>>>>>> rack3: 3 nodes with 24Tb
>>>>>>>> 
>>>>>>>> It helps with balancing, because a duplicated block must be placed on another rack.
>>>>>>>> 
>>>>>>> 
>>>>>>> The same question I asked earlier in this message: do multiple racks with the default balancer threshold minimize the difference between racks?
>>>>>>> 
>>>>>>>> Why did you select HDFS? Maybe Lustre, CephFS or something else is a better choice.
>>>>>>> 
>>>>>>> It wasn't my decision, and I probably can't change it now. I am new to this cluster and trying to understand a few issues. I will explore other options as you mentioned.
>>>>>>> 
>>>>>>> -- 
>>>>>>> http://balajin.net/blog
>>>>>>> http://flic.kr/balajijegan
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> -- 
>>>>> http://balajin.net/blog
>>>>> http://flic.kr/balajijegan
>>>> 
>>>> 
>>> 
>>> 
>> 
> 
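The threshold rule described in this thread can be written out directly. The sketch below mirrors that description (each node's DFS utilization compared against the cluster-wide average, with a tolerance in percentage points); it is an illustration, not the Balancer source. The threshold corresponds to the balancer's -threshold option, whose default is 10. The node names and sizes in the usage note are hypothetical.

```python
"""Sketch of the balancer's per-node check against the cluster average."""


def utilization(used, capacity):
    """Percentage of a node's (or the cluster's) capacity used by DFS."""
    return 100.0 * used / capacity


def classify(nodes, threshold=10.0):
    """Label each node relative to the cluster-wide average utilization.

    nodes: dict mapping node name -> (dfs_used, capacity), same units.
    Returns (avg_util, labels) where labels maps name -> "over",
    "under" or "balanced".
    """
    total_used = sum(used for used, _ in nodes.values())
    total_cap = sum(cap for _, cap in nodes.values())
    avg = utilization(total_used, total_cap)
    labels = {}
    for name, (used, cap) in nodes.items():
        u = utilization(used, cap)
        if u > avg + threshold:
            labels[name] = "over"
        elif u < avg - threshold:
            labels[name] = "under"
        else:
            labels[name] = "balanced"
    return avg, labels


def target_used(capacity, avg_util):
    """DFS usage a node would have if it sat exactly at the average."""
    return capacity * avg_util / 100.0
```

With the figures quoted above (3.5 PB capacity, 2 PB used) the average is about 57.14%, and target_used(12, 57.14) is about 6.86, so a 12 TB node would ideally hold roughly 6.9 TB. A hypothetical pair {"g01": (11.9, 12.0), "s01": (20.0, 72.0)} classifies g01 as "over" and s01 as "under" at the default threshold, so in this simplified model the balancer would try to move blocks from g01 to s01.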


Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Jamal B <jm...@gmail.com>.
dfs.datanode.du.reserved

You could tweak that param on the smaller nodes to "force" the flow of
blocks to other nodes. A short-term hack at best, but it should help the
situation a bit.
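dfs.datanode.du.reserved is the number of bytes per volume that the datanode leaves free for non-DFS use, and the same value applies to every volume listed in dfs.data.dir on that node. Raising it on the small nodes therefore shrinks the capacity they offer to HDFS. A minimal sketch for picking a value, assuming the goal is to let HDFS use at most a given percentage of each volume; the 2 TB volume size is hypothetical, and any space genuinely needed for other data on the same volume would have to be added on top.

```python
"""Pick a dfs.datanode.du.reserved value (bytes reserved per volume)."""


def reserved_bytes_per_volume(volume_bytes, max_dfs_percent):
    """Bytes to reserve so HDFS sees at most `max_dfs_percent` of the volume.

    Integer arithmetic avoids float rounding in the byte count.
    """
    if not 0 < max_dfs_percent <= 100:
        raise ValueError("max_dfs_percent must be in (0, 100]")
    return volume_bytes * (100 - max_dfs_percent) // 100


def property_xml(reserved_bytes):
    """Render the setting as an hdfs-site.xml <property> element."""
    return (
        "<property>\n"
        "  <name>dfs.datanode.du.reserved</name>\n"
        "  <value>{}</value>\n"
        "</property>".format(reserved_bytes)
    )


if __name__ == "__main__":
    # Hypothetical 2 TB (2 * 10**12 byte) volume, HDFS capped at 80% of it.
    print(property_xml(reserved_bytes_per_volume(2 * 10**12, 80)))
```

For the hypothetical 2 TB volume capped at 80%, this reserves 400000000000 bytes. Because the value is per volume, a node whose volumes differ in size gets a different effective cap on each.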
On Mar 24, 2013 7:09 PM, "Tapas Sarangi" <ta...@gmail.com> wrote:

>
> On Mar 24, 2013, at 4:34 PM, Jamal B <jm...@gmail.com> wrote:
>
> It shouldn't cause further problems since most of your small nodes are
> already at their capacity. You could set or increase the dfs reserved
> property on your smaller nodes to force the flow of blocks onto the larger
> nodes.
>
>
> Thanks. Can you please specify which dfs properties we can set or modify
> to direct the flow of blocks towards the larger nodes rather than the
> smaller nodes?
>
> -----
>
>
>
>
>
>
> On Mar 24, 2013 4:45 PM, "Tapas Sarangi" <ta...@gmail.com> wrote:
>
>> Hi,
>>
>> Thanks for the idea, I will give this a try and report back.
>>
>> My worry is: if we decommission a small node (one at a time), will it move
>> the data to the larger nodes or choke other smaller nodes? In principle it
>> should distribute the blocks; the point is that it is not distributing them
>> the way we expect, so do you think this may cause further problems?
>>
>> ---------
>>
>> On Mar 24, 2013, at 3:37 PM, Jamal B <jm...@gmail.com> wrote:
>>
>> Then I think the only way around this would be to decommission the smaller
>> nodes one at a time and ensure that the blocks are moved to the larger
>> nodes.
>>
>> Once that is complete, bring the smaller nodes back in, but maybe only after
>> you tweak the rack topology to match your disk layout rather than your
>> network layout, to compensate for the unbalanced nodes.
>>
>>
>> Just my 2 cents
>>
>>
>> On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi <ta...@gmail.com>wrote:
>>
>>> Thanks. We have a 1-1 configuration of drives and folders in all the
>>> datanodes.
>>>
>>> -Tapas
>>>
>>> On Mar 24, 2013, at 3:29 PM, Jamal B <jm...@gmail.com> wrote:
>>>
>>> On both types of nodes, what is your dfs.data.dir set to? Does it
>>> specify multiple folders on the same sets of drives, or is it 1-1 between
>>> folder and drive? If it's set to multiple folders on the same drives, it
>>> is probably overcounting the "available capacity", because it assumes a
>>> 1-1 relationship between each folder and the total capacity of its
>>> drive.
>>>
>>>
>>> On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi <ta...@gmail.com>wrote:
>>>
>>>> Yes, thanks for pointing that out, but I already know that it completes
>>>> the balancing when it exits; otherwise it shouldn't exit.
>>>> Your answer doesn't solve the problem I mentioned earlier in my
>>>> message. 'hdfs' is stalling and hadoop is not writing unless space is
>>>> cleared up from the cluster even though "df" shows the cluster has about
>>>> 500 TB of free space.
>>>>
>>>> -------
>>>>
>>>>
>>>> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>>>> balaji@balajin.net> wrote:
>>>>
>>>>  -setBalancerBandwidth <bandwidth in bytes per second>
>>>>
>>>> So the value is bytes per second. If it is running and exiting, it means
>>>> it has completed the balancing.
>>>>
>>>>
>>>> On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:
>>>>
>>>>> Yes, we are running the balancer, though a balancer process runs for
>>>>> almost a day or more before exiting and starting over.
>>>>> The current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume
>>>>> that's bytes, so about 2 GB/sec. Shouldn't that be reasonable? If it
>>>>> is in bits then we have a problem.
>>>>> What's the unit for "dfs.balance.bandwidthPerSec"?
>>>>>
>>>>> -----
>>>>>
>>>>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>>>>> lists@balajin.net> wrote:
>>>>>
>>>>> Are you running the balancer? If the balancer is running but slow,
>>>>> try increasing the balancer bandwidth.
>>>>>
>>>>>
>>>>> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
>>>>>
>>>>>> Thanks for the follow up. I don't know whether an attachment will pass
>>>>>> through this mailing list, but I am attaching a pdf that contains the usage
>>>>>> of all live nodes.
>>>>>>
>>>>>> All nodes starting with letter "g" are the ones with smaller storage
>>>>>> space, whereas nodes starting with letter "s" have larger storage space. As
>>>>>> you will see, most of the "gXX" nodes are completely full whereas "sXX"
>>>>>> nodes have a lot of unused space.
>>>>>>
>>>>>> Recently we have frequently faced a crisis in which 'hdfs' goes into a mode
>>>>>> where it cannot write any further, even though the total space
>>>>>> available in the cluster is about 500 TB. We believe this has something to
>>>>>> do with the way it is balancing the nodes, but we don't understand the
>>>>>> problem yet. Maybe the attached PDF will help some of you (experts) see
>>>>>> what is going wrong here...
>>>>>>
>>>>>> Thanks
>>>>>> ------
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> The balancer knows about topology, but when it calculates balancing it
>>>>>> operates only on nodes, not on racks.
>>>>>> You can see how it works in Balancer.java, in BalancerDatanode, around
>>>>>> line 509.
>>>>>>
>>>>>> I was wrong about 350Tb/35Tb; it calculates this way:
>>>>>>
>>>>>> For example:
>>>>>> cluster_capacity=3.5Pb
>>>>>> cluster_dfsused=2Pb
>>>>>>
>>>>>> avgutil=cluster_dfsused/cluster_capacity*100=57.14% used cluster
>>>>>> capacity
>>>>>> Then we know each node's utilization (node_dfsused/node_capacity*100).
>>>>>> The balancer considers everything fine if
>>>>>> avgutil+10 > node_utilization >= avgutil-10.
>>>>>>
>>>>>> In the ideal case every node uses avgutil of its capacity, but for a 12TB
>>>>>> node that is only about 6.9Tb and for a 72Tb node about 41Tb.
>>>>>>
>>>>>> The balancer can't help you.
>>>>>>
>>>>>> Show me
>>>>>> http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if
>>>>>> you can.
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> In the ideal case, with replication factor 2 and two nodes of 12Tb and
>>>>>>> 72Tb, you will only be able to store 12Tb of replicated data.
>>>>>>>
>>>>>>>
>>>>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB
>>>>>>> and 72 TB, but not true for more than two nodes in the cluster.
>>>>>>>
>>>>>>>
>>>>>>> The best way, in my opinion, is to use multiple racks. Nodes in a rack
>>>>>>> must have identical capacity, and racks must have identical capacity.
>>>>>>> For example:
>>>>>>>
>>>>>>> rack1: 1 node with 72Tb
>>>>>>> rack2: 6 nodes with 12Tb
>>>>>>> rack3: 3 nodes with 24Tb
>>>>>>>
>>>>>>> It helps with balancing, because a duplicated block must be placed on
>>>>>>> another rack.
>>>>>>>
>>>>>>>
>>>>>>> The same question I asked earlier in this message: do multiple racks
>>>>>>> with the default balancer threshold minimize the difference between
>>>>>>> racks?
>>>>>>>
>>>>>>> Why did you select HDFS? Maybe Lustre, CephFS or something else is a
>>>>>>> better choice.
>>>>>>>
>>>>>>>
>>>>>>> It wasn't my decision, and I probably can't change it now. I am new
>>>>>>> to this cluster and trying to understand a few issues. I will explore other
>>>>>>> options as you mentioned.
>>>>>>>
>>>>>>> --
>>>>>>> http://balajin.net/blog
>>>>>>> http://flic.kr/balajijegan
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> http://balajin.net/blog
>>>> http://flic.kr/balajijegan
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
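Jamal's question about dfs.data.dir can be checked mechanically: directories on the same filesystem report the same device number from os.stat, so more than one entry per device points to the capacity double-counting he describes. A minimal sketch, assuming it is run on the datanode itself with the comma-separated dfs.data.dir value copied by hand from hdfs-site.xml; it does not read the Hadoop configuration.

```python
"""Check whether several dfs.data.dir entries share one filesystem."""
import os
from collections import defaultdict


def parse_data_dirs(value):
    """Split a comma-separated dfs.data.dir value into paths."""
    return [p.strip() for p in value.split(",") if p.strip()]


def dirs_by_device(data_dirs):
    """Group directories by the device ID (st_dev) of their filesystem."""
    groups = defaultdict(list)
    for path in data_dirs:
        groups[os.stat(path).st_dev].append(path)
    return dict(groups)


def shared_devices(data_dirs):
    """Return only the devices that back more than one data directory."""
    return {dev: paths for dev, paths in dirs_by_device(data_dirs).items()
            if len(paths) > 1}
```

An empty result from shared_devices(parse_data_dirs("/data/1/dfs,/data/2/dfs")) would confirm the 1-1 drive-to-folder layout reported in this thread; those paths are placeholders, and os.stat raises an error for any path that does not exist.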


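The two-node replication claim in this thread, and the reply that it does not carry over to larger clusters, can be made concrete with an idealised model. Assume every block needs `replication` copies on distinct nodes, blocks are infinitely divisible, and racks and reserved space are ignored. A node can then hold at most one copy of each block, so a data size D fits only if the sum of min(capacity, D) over all nodes is at least replication * D, and in this fractional model that condition is also sufficient. The sketch finds the largest such D by bisection; it illustrates the capacity limit and is not how HDFS actually places blocks.

```python
"""Idealised upper bound on replicated data for nodes of unequal size."""


def max_replicated_data(capacities, replication):
    """Largest logical data size D storable with `replication` copies per block.

    Each copy must sit on a different node, so a node contributes at most
    min(capacity, D); D is feasible iff sum(min(c, D)) >= replication * D.
    The feasible set is an interval [0, D*], so bisection finds D*.
    """
    if replication < 1:
        raise ValueError("replication must be >= 1")
    if replication > len(capacities):
        return 0.0

    def feasible(d):
        return sum(min(c, d) for c in capacities) >= replication * d

    lo, hi = 0.0, sum(capacities) / replication
    for _ in range(200):
        mid = (lo + hi) / 2
        if feasible(mid):
            lo = mid
        else:
            hi = mid
    return lo
```

max_replicated_data([12, 72], 2) comes out at about 12 (TB), matching the two-node statement, while max_replicated_data([12, 72, 72], 2) is about 78, i.e. half of the 156 TB raw total, because the extra large node can pair with the other large one.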
Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Jamal B <jm...@gmail.com>.
dfs.datanode.du.reserved

You could tweak that param on the smaller nodes to "force" the flow of
blocks to other nodes.   A short term hack at best, but should help the
situation a bit.
On Mar 24, 2013 7:09 PM, "Tapas Sarangi" <ta...@gmail.com> wrote:

>
> On Mar 24, 2013, at 4:34 PM, Jamal B <jm...@gmail.com> wrote:
>
> It shouldn't cause further problems since most of your small nodes are
> already their capacity.  You could set or increase the dfs reserved
> property on your smaller nodes to force the flow of blocks onto the larger
> nodes.
>
>
> Thanks.  Can you please specify which are the dfs properties that we can
> set or modify to force the flow of blocks directed towards the larger nodes
> than the smaller nodes ?
>
> -----
>
>
>
>
>
>
> On Mar 24, 2013 4:45 PM, "Tapas Sarangi" <ta...@gmail.com> wrote:
>
>> Hi,
>>
>> Thanks for the idea, I will give this a try and report back.
>>
>> My worry is if we decommission a small node (one at a time), will it move
>> the data to larger nodes or choke another smaller nodes ? In principle it
>> should distribute the blocks, the point is it is not distributing the way
>> we expect it to, so do you think this may cause further problems ?
>>
>> ---------
>>
>> On Mar 24, 2013, at 3:37 PM, Jamal B <jm...@gmail.com> wrote:
>>
>> Then I think the only way around this would be to decommission  1 at a
>> time, the smaller nodes, and ensure that the blocks are moved to the larger
>> nodes.
>>
>> And once complete, bring back in the smaller nodes, but maybe only after
>> you tweak the rack topology to match your disk layout more than network
>> layout to compensate for the unbalanced nodes.
>>
>>
>> Just my 2 cents
>>
>>
>> On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi <ta...@gmail.com>wrote:
>>
>>> Thanks. We have a 1-1 configuration of drives and folder in all the
>>> datanodes.
>>>
>>> -Tapas
>>>
>>> On Mar 24, 2013, at 3:29 PM, Jamal B <jm...@gmail.com> wrote:
>>>
>>> On both types of nodes, what is your dfs.data.dir set to? Does it
>>> specify multiple folders on the same set's of drives or is it 1-1 between
>>> folder and drive?  If it's set to multiple folders on the same drives, it
>>> is probably multiplying the amount of "available capacity" incorrectly in
>>> that it assumes a 1-1 relationship between folder and total capacity of the
>>> drive.
>>>
>>>
>>> On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi <ta...@gmail.com>wrote:
>>>
>>>> Yes, thanks for pointing, but I already know that it is completing the
>>>> balancing when exiting otherwise it shouldn't exit.
>>>> Your answer doesn't solve the problem I mentioned earlier in my
>>>> message. 'hdfs' is stalling and hadoop is not writing unless space is
>>>> cleared up from the cluster even though "df" shows the cluster has about
>>>> 500 TB of free space.
>>>>
>>>> -------
>>>>
>>>>
>>>> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>>>> balaji@balajin.net> wrote:
>>>>
>>>>  -setBalancerBandwidth <bandwidth in bytes per second>
>>>>
>>>> So the value is bytes per second. If it is running and exiting,it means
>>>> it has completed the balancing.
>>>>
>>>>
>>>> On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:
>>>>
>>>>> Yes, we are running balancer, though a balancer process runs for
>>>>> almost a day or more before exiting and starting over.
>>>>> Current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume
>>>>> that's bytes so about 2 GigaByte/sec. Shouldn't that be reasonable ? If it
>>>>> is in Bits then we have a problem.
>>>>> What's the unit for "dfs.balance.bandwidthPerSec" ?
>>>>>
>>>>> -----
>>>>>
>>>>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>>>>> lists@balajin.net> wrote:
>>>>>
>>>>> Are you running the balancer? If the balancer is running and slow,
>>>>> try increasing the balancer bandwidth.
>>>>>
>>>>>
>>>>> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
>>>>>
>>>>>> Thanks for the follow-up. I don't know whether the attachment will pass
>>>>>> through this mailing list, but I am attaching a PDF that contains the
>>>>>> usage of all live nodes.
>>>>>>
>>>>>> All nodes whose names start with the letter "g" have smaller storage
>>>>>> space, whereas nodes starting with the letter "s" have larger storage
>>>>>> space. As you will see, most of the "gXX" nodes are completely full,
>>>>>> whereas the "sXX" nodes have a lot of unused space.
>>>>>>
>>>>>> Recently, we have frequently faced a crisis where 'hdfs' goes into a
>>>>>> mode in which it cannot write any further, even though the total space
>>>>>> available in the cluster is about 500 TB. We believe this has something
>>>>>> to do with the way it is balancing the nodes, but we don't understand
>>>>>> the problem yet. Maybe the attached PDF will help some of you (experts)
>>>>>> see what is going wrong here...
>>>>>>
>>>>>> Thanks
>>>>>> ------
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> The balancer knows about the topology, but when it calculates balancing
>>>>>> it operates only on nodes, not on racks.
>>>>>> You can see how it works in Balancer.java, in BalancerDatanode, around
>>>>>> line 509.
>>>>>>
>>>>>> I was wrong about 350 TB and 35 TB; it calculates this way:
>>>>>>
>>>>>> For example:
>>>>>> cluster_capacity=3.5PB
>>>>>> cluster_dfsused=2PB
>>>>>>
>>>>>> avgutil=cluster_dfsused/cluster_capacity*100=57.14% of cluster
>>>>>> capacity used.
>>>>>> Then we compute each node's utilization
>>>>>> (node_dfsused/node_capacity*100). The balancer considers everything
>>>>>> fine if avgutil+10 > node_utilization >= avgutil-10.
>>>>>>
>>>>>> In the ideal case every node uses avgutil of its capacity, but for a
>>>>>> 12 TB node that is only about 6.9 TB and for a 72 TB node about 41 TB.
>>>>>>
>>>>>> The balancer can't help you.
>>>>>>
>>>>>> Show me
>>>>>> http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if
>>>>>> you can.
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> In the ideal case, with replication factor 2 and two nodes of 12 TB
>>>>>>> and 72 TB, you will only be able to store 12 TB of replicated data.
>>>>>>>
>>>>>>>
>>>>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB
>>>>>>> and 72 TB, but not true for more than two nodes in the cluster.
>>>>>>>
>>>>>>>
>>>>>>> The best way, in my opinion, is to use multiple racks. Nodes in a
>>>>>>> rack should have identical capacity, and racks should have identical
>>>>>>> total capacity.
>>>>>>> For example:
>>>>>>>
>>>>>>> rack1: 1 node with 72Tb
>>>>>>> rack2: 6 nodes with 12Tb
>>>>>>> rack3: 3 nodes with 24Tb
>>>>>>>
>>>>>>> It helps with balancing, because a block's replica must be placed on
>>>>>>> another rack.
>>>>>>>
>>>>>>>
>>>>>>> Same question I asked earlier in this message: do multiple racks,
>>>>>>> with the default balancer threshold, minimize the difference between
>>>>>>> racks?
>>>>>>>
>>>>>>> Why did you select HDFS? Maybe Lustre, CephFS or others would be a
>>>>>>> better choice.
>>>>>>>
>>>>>>>
>>>>>>> It wasn't my decision, and I probably can't change it now. I am new
>>>>>>> to this cluster and trying to understand a few issues. I will explore other
>>>>>>> options as you mentioned.
>>>>>>>
>>>>>>> --
>>>>>>> http://balajin.net/blog
>>>>>>> http://flic.kr/balajijegan
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> http://balajin.net/blog
>>>> http://flic.kr/balajijegan
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
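
Jamal's dfs.data.dir question (several directories on one drive inflating the reported capacity) can be checked directly on a datanode. The sketch below is a minimal Python 3 illustration, not Hadoop's own code: it assumes a Unix host (it relies on os.statvfs) and that you pass in the directories from your dfs.data.dir value by hand. It sums each directory's filesystem size naively, then again counting each device only once; a gap between the two totals means some directories share a drive.

```python
import os
from collections import defaultdict


def capacity_by_device(data_dirs):
    """Compare a naive per-directory capacity sum with a per-device sum.

    Returns (naive_total, deduped_total, shared), sizes in bytes, where
    `shared` maps a device id to its directories whenever more than one
    configured directory sits on the same filesystem.
    """
    size_of_dev = {}                  # device id -> filesystem size in bytes
    dirs_on_dev = defaultdict(list)   # device id -> directories on it
    naive_total = 0
    for d in data_dirs:
        vfs = os.statvfs(d)
        size = vfs.f_blocks * vfs.f_frsize   # total size of d's filesystem
        dev = os.stat(d).st_dev              # same value => same filesystem
        naive_total += size                  # what a 1-1 assumption would count
        size_of_dev[dev] = size
        dirs_on_dev[dev].append(d)
    deduped_total = sum(size_of_dev.values())
    shared = {dev: ds for dev, ds in dirs_on_dev.items() if len(ds) > 1}
    return naive_total, deduped_total, shared
```

For example, `capacity_by_device("/data/1/dfs,/data/2/dfs".split(","))` (hypothetical paths). With a true 1-1 layout, as Tapas reports, the two totals should match and `shared` should be empty.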

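Balaji's answer settles the unit: like the -setBalancerBandwidth usage string quoted above, dfs.balance.bandwidthPerSec is in bytes per second, and it caps the balancing traffic of each datanode (the 0.20-era default is 1048576, i.e. 1 MB/s). A small Python sketch of the arithmetic (the helper names are ours, not Hadoop's) shows that 2x10^9 bytes/s is 16 Gbit/s, more than a single 1 GbE or 10 GbE link can carry, so on such links the throttle is probably not what keeps the balancer busy for a day. The time estimate is a best case that ignores network, disk and the balancer's own scheduling limits.

```python
# Default of dfs.balance.bandwidthPerSec in 0.20/1.x configs: 1 MB/s.
DEFAULT_BALANCE_BYTES_PER_SEC = 1048576


def describe_bandwidth(bytes_per_sec):
    """Express a per-datanode balancer throttle in MB/s and Gbit/s."""
    return {
        "MB_per_s": bytes_per_sec / 1e6,
        "Gbit_per_s": bytes_per_sec * 8 / 1e9,
    }


def best_case_hours(terabytes, bytes_per_sec):
    """Hours to move `terabytes` (10**12 bytes each) through one datanode
    if the throttle were the only limit; real transfers will be slower."""
    return terabytes * 1e12 / bytes_per_sec / 3600


print(describe_bandwidth(2e9))  # the value configured on this cluster
# About 264.9 hours (roughly 11 days) per TB at the default throttle.
print(round(best_case_hours(1, DEFAULT_BALANCE_BYTES_PER_SEC), 1))
```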
Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Jamal B <jm...@gmail.com>.
dfs.datanode.du.reserved

You could tweak that parameter on the smaller nodes to "force" the flow of
blocks to other nodes. It is a short-term hack at best, but it should help
the situation a bit.
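
The dfs.datanode.du.reserved suggestion comes down to arithmetic. The property is given in bytes per volume and is set in hdfs-site.xml on the datanodes you want to shield. The sketch below assumes (our reading, not confirmed in this thread) that a datanode subtracts the reserved amount from each volume's size when it reports capacity, and that remaining space is that capacity minus DFS used. node_report is a hypothetical helper, and the 6 x 2 TB disk layout is made up. Under those assumptions, reserving 100 GB per volume on a nearly full 12 TB node drops its reported remaining space to zero and pushes its utilization to 100%. We would expect that to steer new writes away from the node and make the balancer treat it as over-utilized.

```python
TB = 10**12
GB = 10**9


def node_report(volume_sizes, dfs_used, reserved_per_volume):
    """Hypothetical helper: approximate (capacity, remaining) in bytes for
    one datanode, assuming `reserved_per_volume` is subtracted from every
    volume before HDFS counts it."""
    capacity = remaining = 0
    for size, used in zip(volume_sizes, dfs_used):
        usable = max(size - reserved_per_volume, 0)  # space HDFS may use
        capacity += usable
        remaining += max(usable - used, 0)           # never negative
    return capacity, remaining


# A made-up 12 TB "g" node: six 2 TB disks, each holding 1.9 TB of blocks.
sizes = [2 * TB] * 6
used = [1900 * GB] * 6
for reserved in (0, 100 * GB):
    cap, rem = node_report(sizes, used, reserved)
    print(f"reserved={reserved // GB} GB  capacity={cap / TB:.1f} TB  "
          f"remaining={rem / TB:.1f} TB  util={100 * sum(used) / cap:.1f}%")
```
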
On Mar 24, 2013 7:09 PM, "Tapas Sarangi" <ta...@gmail.com> wrote:

>
> On Mar 24, 2013, at 4:34 PM, Jamal B <jm...@gmail.com> wrote:
>
> It shouldn't cause further problems, since most of your small nodes are
> already at their capacity.  You could set or increase the dfs reserved
> property on your smaller nodes to force the flow of blocks onto the larger
> nodes.
>
>
> Thanks. Can you please specify which dfs properties we can set or modify
> to force the flow of blocks towards the larger nodes rather than the
> smaller nodes?
>
> -----
>
>
>
>
>
>
> On Mar 24, 2013 4:45 PM, "Tapas Sarangi" <ta...@gmail.com> wrote:
>
>> Hi,
>>
>> Thanks for the idea, I will give this a try and report back.
>>
>> My worry is: if we decommission a small node (one at a time), will it move
>> the data to the larger nodes or choke other small nodes? In principle it
>> should distribute the blocks, but it is not distributing them the way we
>> expect, so do you think this may cause further problems?
>>
>> ---------
>>
>> On Mar 24, 2013, at 3:37 PM, Jamal B <jm...@gmail.com> wrote:
>>
>> Then I think the only way around this would be to decommission the
>> smaller nodes one at a time and ensure that their blocks are moved to the
>> larger nodes.
>>
>> Once that is complete, bring the smaller nodes back in, but maybe only
>> after you tweak the rack topology to match your disk layout rather than
>> your network layout, to compensate for the unbalanced nodes.
>>
>>
>> Just my 2 cents
>>
>>
>> On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi <ta...@gmail.com>wrote:
>>
>>> Thanks. We have a 1-1 configuration of drives and folders in all the
>>> datanodes.
>>>
>>> -Tapas
>>>
>>> On Mar 24, 2013, at 3:29 PM, Jamal B <jm...@gmail.com> wrote:
>>>
>>> On both types of nodes, what is your dfs.data.dir set to? Does it
>>> specify multiple folders on the same sets of drives or is it 1-1 between
>>> folder and drive?  If it's set to multiple folders on the same drives, it
>>> is probably multiplying the amount of "available capacity" incorrectly in
>>> that it assumes a 1-1 relationship between folder and total capacity of the
>>> drive.
>>>
>>>
>>> On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi <ta...@gmail.com>wrote:
>>>
>>>> Yes, thanks for pointing that out, but I already know that it completes
>>>> the balancing when it exits; otherwise it wouldn't exit.
>>>> Your answer doesn't solve the problem I mentioned earlier in my
>>>> message. 'hdfs' is stalling and hadoop is not writing unless space is
>>>> cleared up from the cluster even though "df" shows the cluster has about
>>>> 500 TB of free space.
>>>>
>>>> -------
>>>>
>>>>
>>>> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>>>> balaji@balajin.net> wrote:
>>>>
>>>>  -setBalancerBandwidth <bandwidth in bytes per second>
>>>>
>>>> So the value is bytes per second. If it is running and exiting, it means
>>>> it has completed the balancing.
>>>>
>>>>
>>>> On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:
>>>>
>>>>> Yes, we are running the balancer, though a balancer process runs for
>>>>> almost a day or more before exiting and starting over.
>>>>> The current dfs.balance.bandwidthPerSec value is 2x10^9. I assume
>>>>> that's bytes, so about 2 GB/sec. Shouldn't that be reasonable? If it
>>>>> is in bits, then we have a problem.
>>>>> What's the unit for "dfs.balance.bandwidthPerSec"?
>>>>>
>>>>> -----
>>>>>
>>>>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>>>>> lists@balajin.net> wrote:
>>>>>
>>>>> Are you running the balancer? If the balancer is running and slow,
>>>>> try increasing the balancer bandwidth.
>>>>>
>>>>>
>>>>> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
>>>>>
>>>>>> Thanks for the follow-up. I don't know whether the attachment will pass
>>>>>> through this mailing list, but I am attaching a PDF that contains the
>>>>>> usage of all live nodes.
>>>>>>
>>>>>> All nodes whose names start with the letter "g" have smaller storage
>>>>>> space, whereas nodes starting with the letter "s" have larger storage
>>>>>> space. As you will see, most of the "gXX" nodes are completely full,
>>>>>> whereas the "sXX" nodes have a lot of unused space.
>>>>>>
>>>>>> Recently, we have frequently faced a crisis where 'hdfs' goes into a
>>>>>> mode in which it cannot write any further, even though the total space
>>>>>> available in the cluster is about 500 TB. We believe this has something
>>>>>> to do with the way it is balancing the nodes, but we don't understand
>>>>>> the problem yet. Maybe the attached PDF will help some of you (experts)
>>>>>> see what is going wrong here...
>>>>>>
>>>>>> Thanks
>>>>>> ------
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> The balancer knows about the topology, but when it calculates balancing
>>>>>> it operates only on nodes, not on racks.
>>>>>> You can see how it works in Balancer.java, in BalancerDatanode, around
>>>>>> line 509.
>>>>>>
>>>>>> I was wrong about 350 TB and 35 TB; it calculates this way:
>>>>>>
>>>>>> For example:
>>>>>> cluster_capacity=3.5PB
>>>>>> cluster_dfsused=2PB
>>>>>>
>>>>>> avgutil=cluster_dfsused/cluster_capacity*100=57.14% of cluster
>>>>>> capacity used.
>>>>>> Then we compute each node's utilization
>>>>>> (node_dfsused/node_capacity*100). The balancer considers everything
>>>>>> fine if avgutil+10 > node_utilization >= avgutil-10.
>>>>>>
>>>>>> In the ideal case every node uses avgutil of its capacity, but for a
>>>>>> 12 TB node that is only about 6.9 TB and for a 72 TB node about 41 TB.
>>>>>>
>>>>>> The balancer can't help you.
>>>>>>
>>>>>> Show me
>>>>>> http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if
>>>>>> you can.
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> In the ideal case, with replication factor 2 and two nodes of 12 TB
>>>>>>> and 72 TB, you will only be able to store 12 TB of replicated data.
>>>>>>>
>>>>>>>
>>>>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB
>>>>>>> and 72 TB, but not true for more than two nodes in the cluster.
>>>>>>>
>>>>>>>
>>>>>>> The best way, in my opinion, is to use multiple racks. Nodes in a
>>>>>>> rack should have identical capacity, and racks should have identical
>>>>>>> total capacity.
>>>>>>> For example:
>>>>>>>
>>>>>>> rack1: 1 node with 72Tb
>>>>>>> rack2: 6 nodes with 12Tb
>>>>>>> rack3: 3 nodes with 24Tb
>>>>>>>
>>>>>>> It helps with balancing, because a block's replica must be placed on
>>>>>>> another rack.
>>>>>>>
>>>>>>>
>>>>>>> Same question I asked earlier in this message: do multiple racks,
>>>>>>> with the default balancer threshold, minimize the difference between
>>>>>>> racks?
>>>>>>>
>>>>>>> Why did you select HDFS? Maybe Lustre, CephFS or others would be a
>>>>>>> better choice.
>>>>>>>
>>>>>>>
>>>>>>> It wasn't my decision, and I probably can't change it now. I am new
>>>>>>> to this cluster and trying to understand a few issues. I will explore other
>>>>>>> options as you mentioned.
>>>>>>>
>>>>>>> --
>>>>>>> http://balajin.net/blog
>>>>>>> http://flic.kr/balajijegan
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> http://balajin.net/blog
>>>> http://flic.kr/balajijegan
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
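
The balancer rule quoted above can be tried on your own numbers. This is a minimal Python sketch that follows the quoted description rather than the actual Balancer.java code: it computes the cluster-wide average utilization and flags nodes outside avgutil plus or minus the threshold. The threshold defaults to 10 percentage points and can be changed with the balancer's -threshold option. The node names and sizes below are made up. The point it illustrates is that the balancer aims for equal percentages, not equal free space.

```python
def classify_nodes(nodes, threshold=10.0):
    """Label nodes relative to the cluster average utilization.

    nodes: dict of name -> (dfs_used, capacity) in any consistent unit.
    Returns (avg_util_percent, {name: label}). A node counts as balanced
    when avg - threshold <= util < avg + threshold, as quoted above.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_cap = sum(cap for _, cap in nodes.values())
    avg = 100.0 * total_used / total_cap
    labels = {}
    for name, (used, cap) in nodes.items():
        util = 100.0 * used / cap
        if util >= avg + threshold:
            labels[name] = "over-utilized"
        elif util < avg - threshold:
            labels[name] = "under-utilized"
        else:
            labels[name] = "balanced"
    return avg, labels


# Made-up cluster in TB: one full 12 TB "g" node and three 72 TB "s" nodes.
avg, labels = classify_nodes({
    "g01": (12, 12),
    "s01": (24, 72),
    "s02": (36, 72),
    "s03": (42, 72),
})
print(f"average utilization {avg:.1f}%")
for name, label in sorted(labels.items()):
    print(name, label)
```

Fed the quoted cluster-wide example (2 PB used of 3.5 PB) as a single entry, the function returns the same 57.14% average.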

Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Tapas Sarangi <ta...@gmail.com>.
On Mar 24, 2013, at 4:34 PM, Jamal B <jm...@gmail.com> wrote:

> It shouldn't cause further problems, since most of your small nodes are already at their capacity.  You could set or increase the dfs reserved property on your smaller nodes to force the flow of blocks onto the larger nodes.
> 
> 

Thanks. Can you please specify which dfs properties we can set or modify to force the flow of blocks towards the larger nodes rather than the smaller nodes?

-----



> 


> On Mar 24, 2013 4:45 PM, "Tapas Sarangi" <ta...@gmail.com> wrote:
> Hi,
> 
> Thanks for the idea, I will give this a try and report back. 
> 
> My worry is: if we decommission a small node (one at a time), will it move the data to the larger nodes or choke other small nodes? In principle it should distribute the blocks, but it is not distributing them the way we expect, so do you think this may cause further problems?
> 
> ---------
> 
> On Mar 24, 2013, at 3:37 PM, Jamal B <jm...@gmail.com> wrote:
> 
>> Then I think the only way around this would be to decommission the smaller nodes one at a time and ensure that their blocks are moved to the larger nodes.
>> Once that is complete, bring the smaller nodes back in, but maybe only after you tweak the rack topology to match your disk layout rather than your network layout, to compensate for the unbalanced nodes.
>> 
>> Just my 2 cents
>> 
>> 
>> On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi <ta...@gmail.com> wrote:
>> Thanks. We have a 1-1 configuration of drives and folders in all the datanodes.
>> 
>> -Tapas
>> 
>> On Mar 24, 2013, at 3:29 PM, Jamal B <jm...@gmail.com> wrote:
>> 
>>> On both types of nodes, what is your dfs.data.dir set to? Does it specify multiple folders on the same sets of drives or is it 1-1 between folder and drive?  If it's set to multiple folders on the same drives, it is probably multiplying the amount of "available capacity" incorrectly in that it assumes a 1-1 relationship between folder and total capacity of the drive.
>>> 
>>> 
>>> On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi <ta...@gmail.com> wrote:
>>> Yes, thanks for pointing that out, but I already know that it completes the balancing when it exits; otherwise it wouldn't exit. 
>>> Your answer doesn't solve the problem I mentioned earlier in my message. 'hdfs' is stalling and hadoop is not writing unless space is cleared up from the cluster even though "df" shows the cluster has about 500 TB of free space. 
>>> 
>>> -------
>>>  
>>> 
>>> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <ba...@balajin.net> wrote:
>>> 
>>>>  -setBalancerBandwidth <bandwidth in bytes per second>
>>>> 
>>>> So the value is bytes per second. If it is running and exiting, it means it has completed the balancing. 
>>>> 
>>>> 
>>>> On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:
>>>> Yes, we are running the balancer, though a balancer process runs for almost a day or more before exiting and starting over.
>>>> The current dfs.balance.bandwidthPerSec value is 2x10^9. I assume that's bytes, so about 2 GB/sec. Shouldn't that be reasonable? If it is in bits, then we have a problem.
>>>> What's the unit for "dfs.balance.bandwidthPerSec"?
>>>> 
>>>> -----
>>>> 
>>>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <li...@balajin.net> wrote:
>>>> 
>>>>> Are you running the balancer? If the balancer is running and slow, try increasing the balancer bandwidth.
>>>>> 
>>>>> 
>>>>> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
>>>>> Thanks for the follow-up. I don't know whether the attachment will pass through this mailing list, but I am attaching a PDF that contains the usage of all live nodes.
>>>>> 
>>>>> All nodes whose names start with the letter "g" have smaller storage space, whereas nodes starting with the letter "s" have larger storage space. As you will see, most of the "gXX" nodes are completely full, whereas the "sXX" nodes have a lot of unused space. 
>>>>> 
>>>>> Recently, we have frequently faced a crisis where 'hdfs' goes into a mode in which it cannot write any further, even though the total space available in the cluster is about 500 TB. We believe this has something to do with the way it is balancing the nodes, but we don't understand the problem yet. Maybe the attached PDF will help some of you (experts) see what is going wrong here...
>>>>> 
>>>>> Thanks
>>>>> ------
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> 
>>>>>> The balancer knows about the topology, but when it calculates balancing it operates only on nodes, not on racks.
>>>>>> You can see how it works in Balancer.java, in BalancerDatanode, around line 509.
>>>>>> 
>>>>>> I was wrong about 350 TB and 35 TB; it calculates this way:
>>>>>> 
>>>>>> For example:
>>>>>> cluster_capacity=3.5PB
>>>>>> cluster_dfsused=2PB
>>>>>> 
>>>>>> avgutil=cluster_dfsused/cluster_capacity*100=57.14% of cluster capacity used.
>>>>>> Then we compute each node's utilization (node_dfsused/node_capacity*100). The balancer considers everything fine if avgutil+10 > node_utilization >= avgutil-10.
>>>>>> 
>>>>>> In the ideal case every node uses avgutil of its capacity, but for a 12 TB node that is only about 6.9 TB and for a 72 TB node about 41 TB.
>>>>>> 
>>>>>> The balancer can't help you.
>>>>>> 
>>>>>> Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can.
>>>>>> 
>>>>>>  
>>>>>> 
>>>>>> 
>>>>>>> In the ideal case, with replication factor 2 and two nodes of 12 TB and 72 TB, you will only be able to store 12 TB of replicated data.
>>>>>> 
>>>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB and 72 TB, but not true for more than two nodes in the cluster.
>>>>>> 
>>>>>>> 
>>>>>>> The best way, in my opinion, is to use multiple racks. Nodes in a rack should have identical capacity, and racks should have identical total capacity.
>>>>>>> For example:
>>>>>>> 
>>>>>>> rack1: 1 node with 72Tb
>>>>>>> rack2: 6 nodes with 12Tb
>>>>>>> rack3: 3 nodes with 24Tb
>>>>>>> 
>>>>>>> It helps with balancing, because a block's replica must be placed on another rack.
>>>>>>> 
>>>>>> 
>>>>>> Same question I asked earlier in this message: do multiple racks, with the default balancer threshold, minimize the difference between racks?
>>>>>> 
>>>>>>> Why did you select HDFS? Maybe Lustre, CephFS or others would be a better choice.  
>>>>>> 
>>>>>> It wasn't my decision, and I probably can't change it now. I am new to this cluster and trying to understand a few issues. I will explore other options as you mentioned.
>>>>>> 
>>>>>> -- 
>>>>>> http://balajin.net/blog
>>>>>> http://flic.kr/balajijegan
>>>> 
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> http://balajin.net/blog
>>>> http://flic.kr/balajijegan
>>> 
>>> 
>> 
>> 
> 

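On the two-node replication point quoted above, an idealized bound shows why the 12 TB limit disappears once there are more nodes. If every block needs r replicas on r distinct nodes, no node can hold more than one copy of any block, so D bytes of logical data can fit only if sum_i min(c_i, D) >= r*D. The largest such D is the minimum, over k = 0 .. r-1, of (total capacity minus the k largest nodes) / (r - k). The Python sketch below computes that bound. It ignores racks, block granularity, reserved and non-DFS space, and says nothing about how the NameNode actually picks targets, so treat it as a ceiling, not a prediction.

```python
from fractions import Fraction


def max_logical_data(capacities, replication):
    """Idealized ceiling on pre-replication data that fits when each block
    needs `replication` copies on distinct nodes.

    Evaluates min over k in 0..r-1 of (sum of all but the k largest
    capacities) / (r - k); with fewer nodes than replicas it returns 0.
    """
    caps = sorted(capacities, reverse=True)
    total = sum(caps)
    ceiling = None
    for k in range(min(replication - 1, len(caps)) + 1):
        bound = Fraction(total - sum(caps[:k])) / (replication - k)
        ceiling = bound if ceiling is None else min(ceiling, bound)
    return ceiling


print(max_logical_data([12, 72], 2))         # two nodes, as in the quote
print(max_logical_data([12, 72, 72], 2))     # add a second 72 TB node
print(max_logical_data([72] + [12] * 6, 3))  # quoted rack1 + rack2 nodes
```

The last call combines the quoted rack1 and rack2 nodes with the 0.20 default replication of 3: since the 72 TB node can hold at most one copy of each block, every block needs two copies on the 12 TB nodes, which caps logical data at 72 / 2 = 36 TB out of 144 TB raw.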

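Jamal's idea of making the rack topology follow the disk layout, and the equal-capacity rack example quoted above, would be wired in through a topology script (configured with topology.script.file.name in 0.20-era configs). Hadoop calls the script with host names or IP addresses as arguments and reads back one rack path per argument. Below is a minimal Python 3 sketch with a hypothetical host table mirroring the quoted rack1/rack2/rack3 example. The host names (s01, g01..g06, m01..m03) are made up, and because the NameNode may pass IP addresses, the table must use whatever form your NameNode actually sends.

```python
#!/usr/bin/env python3
"""Hypothetical HDFS topology script: map datanodes to racks by capacity."""
import sys

DEFAULT_RACK = "/default-rack"

# Made-up host table mirroring the quoted example:
# rack1: 1 x 72 TB, rack2: 6 x 12 TB, rack3: 3 x 24 TB.
RACKS = {"s01": "/rack1"}
RACKS.update({f"g{i:02d}": "/rack2" for i in range(1, 7)})
RACKS.update({f"m{i:02d}": "/rack3" for i in range(1, 4)})


def resolve(hosts, table=RACKS):
    """Return one rack path per host, falling back to DEFAULT_RACK."""
    return [table.get(h, DEFAULT_RACK) for h in hosts]


def rack_capacity(capacities, table=RACKS):
    """Sum per-host capacities (any unit) into per-rack totals."""
    totals = {}
    for host, cap in capacities.items():
        rack = table.get(host, DEFAULT_RACK)
        totals[rack] = totals.get(rack, 0) + cap
    return totals


if __name__ == "__main__":
    # Hosts arrive as arguments; print whitespace-separated rack paths.
    print(" ".join(resolve(sys.argv[1:])))
```

Running rack_capacity() over your real per-node sizes is a quick way to check whether the racks come out equal, which is the property the quoted suggestion relies on.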
Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Tapas Sarangi <ta...@gmail.com>.
On Mar 24, 2013, at 4:34 PM, Jamal B <jm...@gmail.com> wrote:

> It shouldn't cause further problems since most of your small nodes are already their capacity.  You could set or increase the dfs reserved property on your smaller nodes to force the flow of blocks onto the larger nodes.
> 
> 

Thanks.  Can you please specify which are the dfs properties that we can set or modify to force the flow of blocks directed towards the larger nodes than the smaller nodes ?

-----



> 


> On Mar 24, 2013 4:45 PM, "Tapas Sarangi" <ta...@gmail.com> wrote:
> Hi,
> 
> Thanks for the idea, I will give this a try and report back. 
> 
> My worry is if we decommission a small node (one at a time), will it move the data to larger nodes or choke another smaller nodes ? In principle it should distribute the blocks, the point is it is not distributing the way we expect it to, so do you think this may cause further problems ?
> 
> ---------
> 
> On Mar 24, 2013, at 3:37 PM, Jamal B <jm...@gmail.com> wrote:
> 
>> Then I think the only way around this would be to decommission the smaller nodes one at a time, and ensure that the blocks are moved to the larger nodes.
>> Once that is complete, bring the smaller nodes back in, but maybe only after you tweak the rack topology to match your disk layout rather than your network layout, to compensate for the unbalanced nodes.
>> 
>> Just my 2 cents
>> 
>> 
>> On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi <ta...@gmail.com> wrote:
>> Thanks. We have a 1-1 configuration of drives and folders in all the datanodes.
>> 
>> -Tapas
>> 
>> On Mar 24, 2013, at 3:29 PM, Jamal B <jm...@gmail.com> wrote:
>> 
>>> On both types of nodes, what is your dfs.data.dir set to? Does it specify multiple folders on the same sets of drives, or is it 1-1 between folder and drive? If it's set to multiple folders on the same drives, it is probably multiplying the amount of "available capacity" incorrectly, in that it assumes a 1-1 relationship between a folder and the total capacity of its drive.
>>> 
>>> 
>>> On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi <ta...@gmail.com> wrote:
>>> Yes, thanks for pointing that out, but I already know that it completes the balancing when it exits; otherwise it wouldn't exit.
>>> Your answer doesn't solve the problem I mentioned earlier in my message: 'hdfs' is stalling, and Hadoop does not write unless space is cleared up on the cluster, even though "df" shows the cluster has about 500 TB of free space.
>>> 
>>> -------
>>>  
>>> 
>>> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <ba...@balajin.net> wrote:
>>> 
>>>>  -setBalancerBandwidth <bandwidth in bytes per second>
>>>> 
>>>> So the value is in bytes per second. If it is running and exiting, it means it has completed the balancing.
>>>> 
>>>> 
>>>> On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:
>>>> Yes, we are running the balancer, though a balancer process runs for a day or more before exiting and starting over.
>>>> The current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume that's bytes, so about 2 GB/sec. Shouldn't that be reasonable? If it is in bits, then we have a problem.
>>>> What's the unit for "dfs.balance.bandwidthPerSec"?
>>>> 
>>>> -----
>>>> 
>>>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <li...@balajin.net> wrote:
>>>> 
>>>>> Are you running the balancer? If the balancer is running and it is slow, try increasing the balancer bandwidth.
>>>>> 
>>>>> 
>>>>> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
>>>>> Thanks for the follow-up. I don't know whether attachments will pass through this mailing list, but I am attaching a PDF that contains the usage of all live nodes.
>>>>> 
>>>>> All nodes starting with the letter "g" are the ones with smaller storage space, whereas nodes starting with the letter "s" have larger storage space. As you will see, most of the "gXX" nodes are completely full, whereas the "sXX" nodes have a lot of unused space.
>>>>> 
>>>>> Recently, we have frequently faced a crisis in which 'hdfs' goes into a mode where it is not able to write any further, even though the total space available in the cluster is about 500 TB. We believe this has something to do with the way it is balancing the nodes, but we don't understand the problem yet. Maybe the attached PDF will help some of you (experts) see what is going wrong here...
>>>>> 
>>>>> Thanks
>>>>> ------
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> 
>>>>>> The balancer knows about topology, but when it calculates balancing it operates only on nodes, not on racks.
>>>>>> You can see how it works in Balancer.java, in BalancerDatanode, around line 509.
>>>>>> 
>>>>>> I was wrong about 350 TB/35 TB; it is calculated this way:
>>>>>> 
>>>>>> For example:
>>>>>> cluster_capacity = 3.5 PB
>>>>>> cluster_dfsused = 2 PB
>>>>>>
>>>>>> avgutil = cluster_dfsused / cluster_capacity * 100 = 57.14% of cluster capacity used
>>>>>> Then, for each node, we know its utilization (node_dfsused / node_capacity * 100). The balancer considers everything fine if avgutil - 10 <= node_utilization < avgutil + 10 (10 is the balancer's default threshold, in percent).
>>>>>>
>>>>>> In the ideal case every node uses avgutil of its capacity, but for a 12 TB node that is only about 6.9 TB, and for a 72 TB node it is about 41 TB.
>>>>>> 
>>>>>> The balancer can't help you.
>>>>>> 
>>>>>> Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can.
>>>>>> 
>>>>>>  
>>>>>> 
>>>>>> 
>>>>>>> In the ideal case, with replication factor 2 and two nodes of 12 TB and 72 TB, you will be able to store only 12 TB of replicated data.
>>>>>> 
>>>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB and 72 TB, but not true for more than two nodes in the cluster.
>>>>>> 
>>>>>>> 
>>>>>>> The best way, in my opinion, is to use multiple racks. Nodes in a rack must have identical capacity, and racks must have identical capacity.
>>>>>>> For example:
>>>>>>> 
>>>>>>> rack1: 1 node with 72Tb
>>>>>>> rack2: 6 nodes with 12Tb
>>>>>>> rack3: 3 nodes with 24Tb
>>>>>>> 
>>>>>>> It helps with balancing, because a duplicated block must be on another rack.
>>>>>>> 
>>>>>> 
>>>>>> The same question I asked earlier in this message: do multiple racks with the default balancer threshold minimize the difference between racks?
>>>>>> 
>>>>>>> Why did you select HDFS? Maybe Lustre, CephFS, or something else is a better choice.
>>>>>> 
>>>>>> It wasn't my decision, and I probably can't change it now. I am new to this cluster and trying to understand a few issues. I will explore other options as you mentioned.
>>>>>> 
>>>>>> -- 
>>>>>> http://balajin.net/blog
>>>>>> http://flic.kr/balajijegan
>>>> 
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> http://balajin.net/blog
>>>> http://flic.kr/balajijegan
>>> 
>>> 
>> 
>> 
> 


Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Jamal B <jm...@gmail.com>.
It shouldn't cause further problems, since most of your small nodes are
already at their capacity. You could set or increase the dfs reserved
property on your smaller nodes to force the flow of blocks onto the larger
nodes.
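The "dfs reserved property" is presumably `dfs.datanode.du.reserved`, which in Hadoop 0.20 is the number of bytes per volume (per `dfs.data.dir` entry) that a datanode keeps free for non-DFS use. Raising it on the small nodes lowers the remaining space they report, so the namenode should stop picking them as write targets sooner. The Python sketch below turns a hypothetical target ("DFS may fill at most this fraction of a volume") into a per-volume byte value and prints it as an `hdfs-site.xml` property. The volume size, the 80% target and the helper names are illustrative assumptions, not part of Hadoop.

```python
from xml.sax.saxutils import escape


def reserved_bytes_per_volume(volume_bytes: int, max_dfs_fraction: float) -> int:
    """Bytes to reserve on one volume so that DFS can fill at most max_dfs_fraction of it."""
    if not 0.0 < max_dfs_fraction <= 1.0:
        raise ValueError("max_dfs_fraction must be in (0, 1]")
    # Subtract the rounded usable share instead of multiplying by (1 - f) to avoid float drift.
    return volume_bytes - round(volume_bytes * max_dfs_fraction)


def hdfs_site_property(name: str, value: int) -> str:
    """Render one <property> element in the hdfs-site.xml layout."""
    return (
        "<property>\n"
        f"  <name>{escape(name)}</name>\n"
        f"  <value>{value}</value>\n"
        "</property>"
    )


if __name__ == "__main__":
    TB = 10**12
    # Hypothetical small node: six data volumes of 2 TB each (12 TB total).
    volume_size = 2 * TB
    reserve = reserved_bytes_per_volume(volume_size, max_dfs_fraction=0.80)
    print(hdfs_site_property("dfs.datanode.du.reserved", reserve))
```

For a 2 TB volume this reserves 400000000000 bytes, so a node with six such volumes keeps 2.4 TB free in total. Each datanode reads the value from its own configuration, so it would only need to change on the small nodes, presumably followed by a restart of those datanodes.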
On Mar 24, 2013 4:45 PM, "Tapas Sarangi" <ta...@gmail.com> wrote:

> Hi,
>
> Thanks for the idea, I will give this a try and report back.
>
> My worry is this: if we decommission a small node (one at a time), will it
> move the data to the larger nodes or choke the other small nodes? In
> principle it should distribute the blocks, but the point is that it is not
> distributing them the way we expect, so do you think this may cause further
> problems?
>
> ---------
>
> On Mar 24, 2013, at 3:37 PM, Jamal B <jm...@gmail.com> wrote:
>
> Then I think the only way around this would be to decommission the smaller
> nodes one at a time, and ensure that the blocks are moved to the larger
> nodes.
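Decommissioning in 0.20 works by listing the node in the file named by `dfs.hosts.exclude` and running `hadoop dfsadmin -refreshNodes`; the namenode then re-replicates that node's blocks onto other datanodes. Following the advice to start with one node, the sketch below adds a single host to the exclude file and, only when `dry_run=False`, triggers the refresh. The file path, the helper names and the `gXX`-style host names are placeholders; the host string has to match how the namenode identifies the datanode.

```python
import subprocess
from pathlib import Path

# Hypothetical path: use the file that dfs.hosts.exclude points to on your namenode.
EXCLUDE_FILE = Path("/etc/hadoop/conf/dfs.exclude")
REFRESH_CMD = ["hadoop", "dfsadmin", "-refreshNodes"]


def add_to_exclude(exclude_file: Path, host: str) -> bool:
    """Append host to the exclude file; return False if it is already listed."""
    text = exclude_file.read_text() if exclude_file.exists() else ""
    if host in {line.strip() for line in text.splitlines()}:
        return False
    if text and not text.endswith("\n"):
        text += "\n"  # keep one host per line
    exclude_file.write_text(text + host + "\n")
    return True


def decommission_one(host: str, exclude_file: Path = EXCLUDE_FILE, dry_run: bool = True) -> list:
    """Exclude a single datanode, then ask the namenode to re-read its host lists."""
    add_to_exclude(exclude_file, host)
    if not dry_run:
        subprocess.run(REFRESH_CMD, check=True)
    return REFRESH_CMD
```

Before excluding the next node, the first one should show up as decommissioned in `hadoop dfsadmin -report` or the namenode web UI; that is also the moment to look at which datanodes received its blocks.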
>
> Once that is complete, bring the smaller nodes back in, but maybe only
> after you tweak the rack topology to match your disk layout rather than your
> network layout, to compensate for the unbalanced nodes.
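In 0.20 the rack mapping comes from the executable named by `topology.script.file.name`: Hadoop calls it with one or more host names or IP addresses and reads one rack path per argument from standard output. The sketch below combines this suggestion (racks that follow disk layout) with Alexey's (racks of identical total capacity) by greedily placing the largest remaining node in the rack with the least capacity so far. The inventory mirrors Alexey's 72/12/24 TB example and is hypothetical, as is the greedy heuristic. Putting all the small nodes into one rack would likely backfire, since the default placement policy spreads a block's replicas over more than one rack and that rack would keep receiving replicas. If your namenode passes IP addresses, key the table by IP; the namenode probably caches the answers, so a change may need a namenode restart.

```python
#!/usr/bin/env python3
"""Sketch of a topology script that groups datanodes into racks of similar total capacity."""
import sys

# Hypothetical inventory (host -> capacity in TB), shaped like Alexey's example.
CAPACITY_TB = {
    "s01": 72,
    "g01": 12, "g02": 12, "g03": 12, "g04": 12, "g05": 12, "g06": 12,
    "g07": 24, "g08": 24, "g09": 24,
}
NUM_RACKS = 3
DEFAULT_RACK = "/default-rack"


def plan_racks(capacity, num_racks):
    """Greedy: place the largest remaining node into the rack with the least capacity so far."""
    totals = [0] * num_racks
    assignment = {}
    for host, cap in sorted(capacity.items(), key=lambda kv: (-kv[1], kv[0])):
        rack = min(range(num_racks), key=lambda i: (totals[i], i))
        totals[rack] += cap
        assignment[host] = f"/rack{rack + 1}"
    return assignment, totals


RACK_OF, RACK_TOTALS = plan_racks(CAPACITY_TB, NUM_RACKS)


def rack_for(host):
    """Map a short or fully qualified host name to its planned rack."""
    return RACK_OF.get(host.split(".")[0], DEFAULT_RACK)


if __name__ == "__main__":
    # Hadoop passes one or more hosts and expects one rack path per argument on stdout.
    print(" ".join(rack_for(h) for h in sys.argv[1:]))
```

With the sample inventory, each of the three racks ends up with 72 TB, the same per-rack total as in Alexey's layout, though the grouping of the 12 TB and 24 TB nodes differs.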
>
>
> Just my 2 cents
>
>
> On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi <ta...@gmail.com>wrote:
>
>> Thanks. We have a 1-1 configuration of drives and folders in all the
>> datanodes.
>>
>> -Tapas
>>
>> On Mar 24, 2013, at 3:29 PM, Jamal B <jm...@gmail.com> wrote:
>>
>> On both types of nodes, what is your dfs.data.dir set to? Does it specify
>> multiple folders on the same sets of drives, or is it 1-1 between folder
>> and drive? If it's set to multiple folders on the same drives, it
>> is probably multiplying the amount of "available capacity" incorrectly, in
>> that it assumes a 1-1 relationship between a folder and the total capacity
>> of its drive.
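To check this hypothesis directly on a datanode, the sketch below takes the comma-separated `dfs.data.dir` value, sums the filesystem size seen from each directory (roughly how a per-directory capacity report would count it), and compares that with the sum over distinct devices. If two directories share a device, the first total is inflated and the shared devices are listed. It assumes each filesystem corresponds to one drive; the function name is illustrative.

```python
import os
import shutil
import sys
from collections import defaultdict


def capacity_report(data_dirs):
    """Return (sum over directories, sum over distinct devices, devices shared by several dirs)."""
    naive_total = 0
    device_total = {}
    dirs_on_device = defaultdict(list)
    for d in data_dirs:
        total = shutil.disk_usage(d).total  # size of the filesystem holding d
        dev = os.stat(d).st_dev             # device id of that filesystem
        naive_total += total
        device_total[dev] = total
        dirs_on_device[dev].append(d)
    shared = {dev: ds for dev, ds in dirs_on_device.items() if len(ds) > 1}
    return naive_total, sum(device_total.values()), shared


if __name__ == "__main__" and len(sys.argv) > 1:
    # Argument: the comma-separated dfs.data.dir value, e.g. "/data1/dfs,/data2/dfs"
    dirs = [p.strip() for p in sys.argv[1].split(",") if p.strip()]
    naive, actual, shared = capacity_report(dirs)
    print(f"sum over directories: {naive / 1e12:.1f} TB")
    print(f"sum over devices:     {actual / 1e12:.1f} TB")
    for dev, ds in shared.items():
        print(f"device {dev} is shared by: {', '.join(ds)}")
```

With a strict 1-1 layout, as reported here, both totals should match and no shared device should be printed.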
>>
>>
>> On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi <ta...@gmail.com>wrote:
>>
>>> Yes, thanks for pointing that out, but I already know that it completes
>>> the balancing when it exits; otherwise it wouldn't exit.
>>> Your answer doesn't solve the problem I mentioned earlier in my message:
>>> 'hdfs' is stalling, and Hadoop does not write unless space is cleared up
>>> on the cluster, even though "df" shows the cluster has about 500 TB of
>>> free space.
>>>
>>> -------
>>>
>>>
>>> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>>> balaji@balajin.net> wrote:
>>>
>>>  -setBalancerBandwidth <bandwidth in bytes per second>
>>>
>>> So the value is in bytes per second. If it is running and exiting, it
>>> means it has completed the balancing.
>>>
>>>
>>> On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:
>>>
>>>> Yes, we are running the balancer, though a balancer process runs for a
>>>> day or more before exiting and starting over.
>>>> The current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume
>>>> that's bytes, so about 2 GB/sec. Shouldn't that be reasonable? If it
>>>> is in bits, then we have a problem.
>>>> What's the unit for "dfs.balance.bandwidthPerSec"?
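Since the value is in bytes per second and `dfs.balance.bandwidthPerSec` caps the balancing traffic of each datanode, the arithmetic below expresses 2x10^9 in other units and compares it with the 1048576 (1 MiB/s) default shipped in 0.20's hdfs-default.xml. At 2x10^9 B/s the cap is 16 Gbit/s per node, which (assuming ordinary 1 or 10 GbE links) is more than the network can carry. One thing worth checking, as far as I know, is that each datanode reads this value from its own configuration at startup, so it only applies if it is in the datanodes' hdfs-site.xml and they were restarted afterwards; otherwise they stay at the default. The helper names are illustrative.

```python
DEFAULT_BANDWIDTH = 1_048_576  # 0.20 default for dfs.balance.bandwidthPerSec (1 MiB/s)


def describe_bandwidth(bytes_per_sec: float) -> dict:
    """Express a per-datanode byte rate in a few common units."""
    return {
        "MB/s": bytes_per_sec / 1e6,
        "MiB/s": bytes_per_sec / 2**20,
        "Gbit/s": bytes_per_sec * 8 / 1e9,
    }


def hours_to_move(terabytes: float, bytes_per_sec: float) -> float:
    """Lower bound on the time one datanode needs to move this much data at the cap."""
    return terabytes * 1e12 / bytes_per_sec / 3600


if __name__ == "__main__":
    configured = 2e9  # value reported in this thread
    for unit, value in describe_bandwidth(configured).items():
        print(f"{unit:>7}: {value:,.1f}")
    print(f"10 TB at the default cap: {hours_to_move(10, DEFAULT_BANDWIDTH):,.0f} h")
    print(f"10 TB at 2x10^9 B/s:      {hours_to_move(10, configured):.2f} h")
```

Moving 10 TB off one node takes well over 2,000 hours at the default cap but under 1.5 hours at 2x10^9 B/s, so which value the datanodes actually use makes a large difference.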
>>>>
>>>> -----
>>>>
>>>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>>>> lists@balajin.net> wrote:
>>>>
>>>> Are you running the balancer? If the balancer is running and it is slow,
>>>> try increasing the balancer bandwidth.
>>>>
>>>>
>>>> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
>>>>
>>>>> Thanks for the follow-up. I don't know whether attachments will pass
>>>>> through this mailing list, but I am attaching a PDF that contains the
>>>>> usage of all live nodes.
>>>>>
>>>>> All nodes starting with the letter "g" are the ones with smaller storage
>>>>> space, whereas nodes starting with the letter "s" have larger storage
>>>>> space. As you will see, most of the "gXX" nodes are completely full,
>>>>> whereas the "sXX" nodes have a lot of unused space.
>>>>>
>>>>> Recently, we have frequently faced a crisis in which 'hdfs' goes into a
>>>>> mode where it is not able to write any further, even though the total
>>>>> space available in the cluster is about 500 TB. We believe this has
>>>>> something to do with the way it is balancing the nodes, but we don't
>>>>> understand the problem yet. Maybe the attached PDF will help some of you
>>>>> (experts) see what is going wrong here...
>>>>>
>>>>> Thanks
>>>>> ------
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> The balancer knows about topology, but when it calculates balancing it
>>>>> operates only on nodes, not on racks.
>>>>> You can see how it works in Balancer.java, in BalancerDatanode, around
>>>>> line 509.
>>>>>
>>>>> I was wrong about 350 TB/35 TB; it is calculated this way:
>>>>>
>>>>> For example:
>>>>> cluster_capacity = 3.5 PB
>>>>> cluster_dfsused = 2 PB
>>>>>
>>>>> avgutil = cluster_dfsused / cluster_capacity * 100 = 57.14% of cluster
>>>>> capacity used
>>>>> Then, for each node, we know its utilization
>>>>> (node_dfsused / node_capacity * 100). The balancer considers everything
>>>>> fine if avgutil - 10 <= node_utilization < avgutil + 10 (10 is the
>>>>> balancer's default threshold, in percent).
>>>>>
>>>>> In the ideal case every node uses avgutil of its capacity, but for a
>>>>> 12 TB node that is only about 6.9 TB, and for a 72 TB node it is about
>>>>> 41 TB.
>>>>>
>>>>> The balancer can't help you.
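A minimal Python model of the rule Alexey describes: compute the cluster-wide average utilization, then flag each node whose own utilization is more than `threshold` percentage points above or below it. The default of 10 matches the balancer's default threshold; the exact boundary handling in Balancer.java may differ slightly, and the model says nothing about how much data the balancer moves per iteration. `target_used` gives the amount of data a node would hold at exactly the average (about 6.86 TB for a 12 TB node and 41.14 TB for a 72 TB node at 57.14%). Node names and usage numbers are hypothetical.

```python
def classify_nodes(nodes, threshold=10.0):
    """nodes maps name -> (dfs_used, capacity) in one unit; returns (avg_util, labels)."""
    total_used = sum(used for used, _ in nodes.values())
    total_cap = sum(cap for _, cap in nodes.values())
    avg_util = total_used / total_cap * 100
    labels = {}
    for name, (used, cap) in nodes.items():
        util = used / cap * 100
        if util > avg_util + threshold:
            labels[name] = "over-utilized"
        elif util < avg_util - threshold:
            labels[name] = "under-utilized"
        else:
            labels[name] = "within threshold"
    return avg_util, labels


def target_used(capacity, avg_util):
    """Data a node would hold if it sat exactly at the cluster-average utilization."""
    return capacity * avg_util / 100


if __name__ == "__main__":
    avg = 2 / 3.5 * 100  # Alexey's example: 2 PB used of 3.5 PB
    print(f"avg utilization {avg:.2f}%")
    print(f"12 TB node target: {target_used(12, avg):.2f} TB")
    print(f"72 TB node target: {target_used(72, avg):.2f} TB")
    # Hypothetical nodes in TB, shaped like the gXX/sXX description in this thread.
    print(classify_nodes({"g01": (11.99, 12.0), "s01": (28.8, 72.0), "s02": (10.0, 72.0)}))
```

In the sample, the nearly full small node is over-utilized and the emptiest large node is under-utilized, while a large node at 40% still counts as within the threshold.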
>>>>>
>>>>> Show me
>>>>> http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if
>>>>> you can.
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>  In the ideal case, with replication factor 2 and two nodes of 12 TB and
>>>>>> 72 TB, you will be able to store only 12 TB of replicated data.
>>>>>>
>>>>>>
>>>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB and
>>>>>> 72 TB, but not true for more than two nodes in the cluster.
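This point can be made precise under simplifying assumptions: ignore racks, treat data as finely divisible into blocks, and allow at most one replica of a block per datanode (which HDFS enforces). With replication factor r and node capacities c_i, the most data D that fits is the largest D with sum_i min(c_i, D) >= r * D, because a node can contribute at most min(c_i, D) to the r * D bytes of replicas. The sketch below solves this exactly by checking, for k = 0, 1, ..., whether the k largest nodes are the ones with room for a full copy. The function name is illustrative.

```python
def max_replicated_data(capacities, replication):
    """Largest D with sum(min(c, D) for c in capacities) >= replication * D.

    Each node can hold at most one replica of any block, so a node of capacity c
    contributes at most min(c, D) towards the replication * D bytes needed.
    """
    if replication <= 0:
        raise ValueError("replication must be positive")
    caps = sorted(capacities, reverse=True)
    if len(caps) < replication:
        return 0.0
    # Assume exactly the k largest nodes have room for a replica of everything (capacity >= D).
    for k in range(replication):
        d = sum(caps[k:]) / (replication - k)
        upper = caps[k - 1] if k > 0 else float("inf")
        if caps[k] <= d <= upper:
            return d
    return 0.0  # not reached for non-negative capacities
```

`max_replicated_data([12, 72], 2)` returns 12.0, matching the two-node case, while `max_replicated_data([72] + [12] * 6, 2)` returns 72.0, so all 144 TB of raw space would be usable at replication 2.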
>>>>>>
>>>>>>
>>>>>> The best way, in my opinion, is to use multiple racks. Nodes in a rack
>>>>>> must have identical capacity, and racks must have identical capacity.
>>>>>> For example:
>>>>>>
>>>>>> rack1: 1 node with 72Tb
>>>>>> rack2: 6 nodes with 12Tb
>>>>>> rack3: 3 nodes with 24Tb
>>>>>>
>>>>>> It helps with balancing, because a duplicated block must be on another
>>>>>> rack.
>>>>>>
>>>>>>
>>>>>> The same question I asked earlier in this message: do multiple
>>>>>> racks with the default balancer threshold minimize the difference
>>>>>> between racks?
>>>>>>
>>>>>> Why did you select HDFS? Maybe Lustre, CephFS, or something else is a
>>>>>> better choice.
>>>>>>
>>>>>>
>>>>>> It wasn't my decision, and I probably can't change it now. I am new
>>>>>> to this cluster and trying to understand a few issues. I will explore
>>>>>> other options as you mentioned.
>>>>>>
>>>>>> --
>>>>>> http://balajin.net/blog
>>>>>> http://flic.kr/balajijegan
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> http://balajin.net/blog
>>> http://flic.kr/balajijegan
>>>
>>>
>>>
>>
>>
>
>

Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Alexey Babutin <zo...@gmail.com>.
I think that it will help, but start with one node and watch where the data moves.

On Mon, Mar 25, 2013 at 12:44 AM, Tapas Sarangi <ta...@gmail.com>wrote:

> Hi,
>
> Thanks for the idea, I will give this a try and report back.
>
> My worry is this: if we decommission a small node (one at a time), will it
> move the data to the larger nodes or choke the other small nodes? In
> principle it should distribute the blocks, but the point is that it is not
> distributing them the way we expect, so do you think this may cause further
> problems?
>
> ---------
>
> On Mar 24, 2013, at 3:37 PM, Jamal B <jm...@gmail.com> wrote:
>
> Then I think the only way around this would be to decommission the smaller
> nodes one at a time, and ensure that the blocks are moved to the larger
> nodes.
>
> Once that is complete, bring the smaller nodes back in, but maybe only
> after you tweak the rack topology to match your disk layout rather than your
> network layout, to compensate for the unbalanced nodes.
>
>
> Just my 2 cents
>
>
> On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi <ta...@gmail.com>wrote:
>
>> Thanks. We have a 1-1 configuration of drives and folders in all the
>> datanodes.
>>
>> -Tapas
>>
>> On Mar 24, 2013, at 3:29 PM, Jamal B <jm...@gmail.com> wrote:
>>
>> On both types of nodes, what is your dfs.data.dir set to? Does it specify
>> multiple folders on the same sets of drives, or is it 1-1 between folder
>> and drive? If it's set to multiple folders on the same drives, it
>> is probably multiplying the amount of "available capacity" incorrectly, in
>> that it assumes a 1-1 relationship between a folder and the total capacity
>> of its drive.
>>
>>
>> On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi <ta...@gmail.com>wrote:
>>
>>> Yes, thanks for pointing that out, but I already know that it completes
>>> the balancing when it exits; otherwise it wouldn't exit.
>>> Your answer doesn't solve the problem I mentioned earlier in my message:
>>> 'hdfs' is stalling, and Hadoop does not write unless space is cleared up
>>> on the cluster, even though "df" shows the cluster has about 500 TB of
>>> free space.
>>>
>>> -------
>>>
>>>
>>> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>>> balaji@balajin.net> wrote:
>>>
>>>  -setBalancerBandwidth <bandwidth in bytes per second>
>>>
>>> So the value is in bytes per second. If it is running and exiting, it
>>> means it has completed the balancing.
>>>
>>>
>>> On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:
>>>
>>>> Yes, we are running the balancer, though a balancer process runs for a
>>>> day or more before exiting and starting over.
>>>> The current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume
>>>> that's bytes, so about 2 GB/sec. Shouldn't that be reasonable? If it
>>>> is in bits, then we have a problem.
>>>> What's the unit for "dfs.balance.bandwidthPerSec"?
>>>>
>>>> -----
>>>>
>>>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>>>> lists@balajin.net> wrote:
>>>>
>>>> Are you running the balancer? If the balancer is running and it is slow,
>>>> try increasing the balancer bandwidth.
>>>>
>>>>
>>>> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
>>>>
>>>>> Thanks for the follow-up. I don't know whether attachments will pass
>>>>> through this mailing list, but I am attaching a PDF that contains the
>>>>> usage of all live nodes.
>>>>>
>>>>> All nodes starting with the letter "g" are the ones with smaller storage
>>>>> space, whereas nodes starting with the letter "s" have larger storage
>>>>> space. As you will see, most of the "gXX" nodes are completely full,
>>>>> whereas the "sXX" nodes have a lot of unused space.
>>>>>
>>>>> Recently, we have frequently faced a crisis in which 'hdfs' goes into a
>>>>> mode where it is not able to write any further, even though the total
>>>>> space available in the cluster is about 500 TB. We believe this has
>>>>> something to do with the way it is balancing the nodes, but we don't
>>>>> understand the problem yet. Maybe the attached PDF will help some of you
>>>>> (experts) see what is going wrong here...
>>>>>
>>>>> Thanks
>>>>> ------
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> The balancer knows about topology, but when it calculates balancing it
>>>>> operates only on nodes, not on racks.
>>>>> You can see how it works in Balancer.java, in BalancerDatanode, around
>>>>> line 509.
>>>>>
>>>>> I was wrong about 350 TB/35 TB; it is calculated this way:
>>>>>
>>>>> For example:
>>>>> cluster_capacity = 3.5 PB
>>>>> cluster_dfsused = 2 PB
>>>>>
>>>>> avgutil = cluster_dfsused / cluster_capacity * 100 = 57.14% of cluster
>>>>> capacity used
>>>>> Then, for each node, we know its utilization
>>>>> (node_dfsused / node_capacity * 100). The balancer considers everything
>>>>> fine if avgutil - 10 <= node_utilization < avgutil + 10 (10 is the
>>>>> balancer's default threshold, in percent).
>>>>>
>>>>> In the ideal case every node uses avgutil of its capacity, but for a
>>>>> 12 TB node that is only about 6.9 TB, and for a 72 TB node it is about
>>>>> 41 TB.
>>>>>
>>>>> The balancer can't help you.
>>>>>
>>>>> Show me
>>>>> http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if
>>>>> you can.
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>  In the ideal case, with replication factor 2 and two nodes of 12 TB and
>>>>>> 72 TB, you will be able to store only 12 TB of replicated data.
>>>>>>
>>>>>>
>>>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB and
>>>>>> 72 TB, but not true for more than two nodes in the cluster.
>>>>>>
>>>>>>
>>>>>> The best way, in my opinion, is to use multiple racks. Nodes in a rack
>>>>>> must have identical capacity, and racks must have identical capacity.
>>>>>> For example:
>>>>>>
>>>>>> rack1: 1 node with 72Tb
>>>>>> rack2: 6 nodes with 12Tb
>>>>>> rack3: 3 nodes with 24Tb
>>>>>>
>>>>>> It helps with balancing, because a duplicated block must be on another
>>>>>> rack.
>>>>>>
>>>>>>
>>>>>> The same question I asked earlier in this message: do multiple
>>>>>> racks with the default balancer threshold minimize the difference
>>>>>> between racks?
>>>>>>
>>>>>> Why did you select HDFS? Maybe Lustre, CephFS, or something else is a
>>>>>> better choice.
>>>>>>
>>>>>>
>>>>>> It wasn't my decision, and I probably can't change it now. I am new
>>>>>> to this cluster and trying to understand a few issues. I will explore
>>>>>> other options as you mentioned.
>>>>>>
>>>>>> --
>>>>>> http://balajin.net/blog
>>>>>> http://flic.kr/balajijegan
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> http://balajin.net/blog
>>> http://flic.kr/balajijegan
>>>
>>>
>>>
>>
>>
>
>

Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Alexey Babutin <zo...@gmail.com>.
I think that it makes help,but start from 1 node.watch where data have moved

On Mon, Mar 25, 2013 at 12:44 AM, Tapas Sarangi <ta...@gmail.com>wrote:

> Hi,
>
> Thanks for the idea, I will give this a try and report back.
>
> My worry is if we decommission a small node (one at a time), will it move
> the data to larger nodes or choke another smaller nodes ? In principle it
> should distribute the blocks, the point is it is not distributing the way
> we expect it to, so do you think this may cause further problems ?
>
> ---------
>
> On Mar 24, 2013, at 3:37 PM, Jamal B <jm...@gmail.com> wrote:
>
> Then I think the only way around this would be to decommission  1 at a
> time, the smaller nodes, and ensure that the blocks are moved to the larger
> nodes.
>
> And once complete, bring back in the smaller nodes, but maybe only after
> you tweak the rack topology to match your disk layout more than network
> layout to compensate for the unbalanced nodes.
>
>
> Just my 2 cents
>
>
> On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi <ta...@gmail.com>wrote:
>
>> Thanks. We have a 1-1 configuration of drives and folder in all the
>> datanodes.
>>
>> -Tapas
>>
>> On Mar 24, 2013, at 3:29 PM, Jamal B <jm...@gmail.com> wrote:
>>
>> On both types of nodes, what is your dfs.data.dir set to? Does it specify
>> multiple folders on the same set's of drives or is it 1-1 between folder
>> and drive?  If it's set to multiple folders on the same drives, it
>> is probably multiplying the amount of "available capacity" incorrectly in
>> that it assumes a 1-1 relationship between folder and total capacity of the
>> drive.
>>
>>
>> On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi <ta...@gmail.com>wrote:
>>
>>> Yes, thanks for pointing, but I already know that it is completing the
>>> balancing when exiting otherwise it shouldn't exit.
>>> Your answer doesn't solve the problem I mentioned earlier in my message.
>>> 'hdfs' is stalling and hadoop is not writing unless space is cleared up
>>> from the cluster even though "df" shows the cluster has about 500 TB of
>>> free space.
>>>
>>> -------
>>>
>>>
>>> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>>> balaji@balajin.net> wrote:
>>>
>>>  -setBalancerBandwidth <bandwidth in bytes per second>
>>>
>>> So the value is bytes per second. If it is running and exiting,it means
>>> it has completed the balancing.
>>>
>>>
>>> On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:
>>>
>>>> Yes, we are running balancer, though a balancer process runs for almost
>>>> a day or more before exiting and starting over.
>>>> Current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume
>>>> that's bytes so about 2 GigaByte/sec. Shouldn't that be reasonable ? If it
>>>> is in Bits then we have a problem.
>>>> What's the unit for "dfs.balance.bandwidthPerSec" ?
>>>>
>>>> -----
>>>>
>>>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>>>> lists@balajin.net> wrote:
>>>>
>>>> Are you running balancer? If balancer is running and if it is slow, try
>>>> increasing the balancer bandwidth
>>>>
>>>>
>>>> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
>>>>
>>>>> Thanks for the follow up. I don't know whether attachment will pass
>>>>> through this mailing list, but I am attaching a pdf that contains the usage
>>>>> of all live nodes.
>>>>>
>>>>> All nodes starting with letter "g" are the ones with smaller storage
>>>>> space where as nodes starting with letter "s" have larger storage space. As
>>>>> you will see, most of the "gXX" nodes are completely full whereas "sXX"
>>>>> nodes have a lot of unused space.
>>>>>
>>>>> Recently, we are facing crisis frequently as 'hdfs' goes into a mode
>>>>> where it is not able to write any further even though the total space
>>>>> available in the cluster is about 500 TB. We believe this has something to
>>>>> do with the way it is balancing the nodes, but don't understand the problem
>>>>> yet. May be the attached PDF will help some of you (experts) to see what is
>>>>> going wrong here...
>>>>>
>>>>> Thanks
>>>>> ------
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Balancer know about topology,but when calculate balancing it operates
>>>>> only with nodes not with racks.
>>>>> You can see how it work in Balancer.java in  BalancerDatanode about
>>>>> string 509.
>>>>>
>>>>> I was wrong about 350Tb,35Tb it calculates in such way :
>>>>>
>>>>> For example:
>>>>> cluster_capacity=3.5Pb
>>>>> cluster_dfsused=2Pb
>>>>>
>>>>> avgutil=cluster_dfsused/cluster_capacity*100=57.14% used cluster
>>>>> capacity
>>>>> Then we know avg node utilization (node_dfsused/node_capacity*100)
>>>>> .Balancer think that all good if  avgutil
>>>>> +10>node_utilizazation>=avgutil-10.
>>>>>
>>>>> Ideal case that all node used avgutl of capacity.but for 12TB node its
>>>>> only 6.5Tb and for 72Tb its about 40Tb.
>>>>>
>>>>> Balancer cant help you.
>>>>>
>>>>> Show me
>>>>> http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if
>>>>> you can.
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>  In ideal case with replication factor 2 ,with two nodes 12Tb and
>>>>>> 72Tb you will be able to have only 12Tb replication data.
>>>>>>
>>>>>>
>>>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB and
>>>>>> 72 TB, but not true for more than two nodes in the cluster.
>>>>>>
>>>>>>
>>>>>> Best way,on my opinion,it is using multiple racks.Nodes in rack must
>>>>>> be with identical capacity.Racks must be identical capacity.
>>>>>> For example:
>>>>>>
>>>>>> rack1: 1 node with 72Tb
>>>>>> rack2: 6 nodes with 12Tb
>>>>>> rack3: 3 nodes with 24Tb
>>>>>>
>>>>>> It helps with balancing,because dublicated  block must be another
>>>>>> rack.
>>>>>>
>>>>>>
>>>>>> The same question I asked earlier in this message, does multiple
>>>>>> racks with default threshold for the balancer minimizes the difference
>>>>>> between racks ?
>>>>>>
>>>>>> Why did you select hdfs?May be lustre,cephfs and other is better
>>>>>> choise.
>>>>>>
>>>>>>
>>>>>> It wasn't my decision, and I probably can't change it now. I am new
>>>>>> to this cluster and trying to understand few issues. I will explore other
>>>>>> options as you mentioned.
>>>>>>
>>>>>> --
>>>>>> http://balajin.net/blog
>>>>>> http://flic.kr/balajijegan
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> http://balajin.net/blog
>>> http://flic.kr/balajijegan
>>>
>>>
>>>
>>
>>
>
>
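
The node-versus-average rule described in the quoted reply above can be sketched in a few lines of Python. This is a simplified model of that description, not the Balancer.java code: utilization is taken as dfsUsed / capacity per node, the threshold is 10 percentage points, and racks, block sizes and per-iteration move limits are ignored. The node names and sizes are hypothetical, loosely modeled on the 12 TB and 72 TB nodes discussed in this thread.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    capacity_tb: float
    used_tb: float

    @property
    def utilization(self) -> float:
        # Percentage of this node's capacity that holds DFS data.
        return self.used_tb / self.capacity_tb * 100


def classify(nodes, threshold=10.0):
    """Compare each node's utilization with the cluster-wide average.

    Simplified model of the rule quoted above: a node is treated as
    balanced when avg - threshold <= utilization < avg + threshold.
    """
    avg = sum(n.used_tb for n in nodes) / sum(n.capacity_tb for n in nodes) * 100
    labels = {}
    for n in nodes:
        if n.utilization >= avg + threshold:
            labels[n.name] = "over-utilized"
        elif n.utilization < avg - threshold:
            labels[n.name] = "under-utilized"
        else:
            labels[n.name] = "balanced"
    return avg, labels


if __name__ == "__main__":
    # Hypothetical nodes: one nearly full 12 TB node, two 72 TB nodes.
    cluster = [Node("g01", 12, 11.9), Node("s01", 72, 30), Node("s02", 72, 36)]
    avg, labels = classify(cluster)
    print(f"average utilization: {avg:.2f}%")
    for name, label in labels.items():
        print(name, label)
```

With these made-up numbers the cluster average is about 50%, the 12 TB node (about 99% full) falls outside the band, and both 72 TB nodes (about 42% and 50%) count as balanced, which shows how a node can be nearly full while the large nodes still look fine to this kind of check.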

Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Jamal B <jm...@gmail.com>.
It shouldn't cause further problems, since most of your small nodes are
already at their capacity. You could set or increase the dfs reserved
property (dfs.datanode.du.reserved) on your smaller nodes to force the
flow of blocks onto the larger nodes.
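
Raising the reserved space works because it shrinks what a DataNode reports as usable. A minimal sketch under a simplified model, with these assumptions: the reserved amount (dfs.datanode.du.reserved, bytes per volume) is subtracted from each volume; a node is treated as a write target only while it can still hold at least five 64 MB blocks (a hypothetical rule approximating the NameNode's check, not its exact code; 64 MB is the 0.20 default dfs.block.size); and the disk sizes are made up.

```python
GB = 1024 ** 3
TB = 1024 ** 4


def node_remaining(volumes, reserved_per_volume):
    """Remaining space a node would report under a simplified model.

    volumes: list of (disk_size_bytes, dfs_used_bytes), one per data dir.
    Each volume is assumed to report (size - reserved - used), floored at 0.
    """
    return sum(max(0, size - reserved_per_volume - used) for size, used in volumes)


def accepts_new_blocks(volumes, reserved_per_volume,
                       block_size=64 * 1024 ** 2, min_blocks=5):
    # Hypothetical placement rule: a node is a write target only if it can
    # still hold at least `min_blocks` full blocks.
    return node_remaining(volumes, reserved_per_volume) >= block_size * min_blocks


if __name__ == "__main__":
    # Hypothetical small node: six 2 TB disks, each about 1.9 TB used.
    small_node = [(2 * TB, int(1.9 * TB))] * 6
    for reserved in (0, 200 * GB):
        print(reserved // GB, "GB reserved ->", accepts_new_blocks(small_node, reserved))
```

In this model, raising the reserved amount from 0 to 200 GB per volume takes the hypothetical small node from accepting new blocks to refusing them, which is the effect suggested above; the exact semantics of the property should be checked against hdfs-default.xml for the version in use.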
On Mar 24, 2013 4:45 PM, "Tapas Sarangi" <ta...@gmail.com> wrote:

> Hi,
>
> Thanks for the idea, I will give this a try and report back.
>
> My worry is: if we decommission a small node (one at a time), will it move
> the data to the larger nodes or choke the other smaller nodes? In principle
> it should distribute the blocks, but it is not distributing them the way
> we expect, so do you think this may cause further problems?
>
> ---------
>
> On Mar 24, 2013, at 3:37 PM, Jamal B <jm...@gmail.com> wrote:
>
> Then I think the only way around this would be to decommission the smaller
> nodes one at a time, and ensure that the blocks are moved to the larger
> nodes.
>
> And once complete, bring back in the smaller nodes, but maybe only after
> you tweak the rack topology to match your disk layout more than network
> layout to compensate for the unbalanced nodes.
>
>
> Just my 2 cents
>
>
> On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi <ta...@gmail.com>wrote:
>
>> Thanks. We have a 1-1 configuration of drives and folder in all the
>> datanodes.
>>
>> -Tapas
>>
>> On Mar 24, 2013, at 3:29 PM, Jamal B <jm...@gmail.com> wrote:
>>
>> On both types of nodes, what is your dfs.data.dir set to? Does it specify
>> multiple folders on the same sets of drives, or is it 1-1 between folder
>> and drive? If it's set to multiple folders on the same drives, it is
>> probably overcounting the "available capacity", because it assumes a 1-1
>> relationship between each folder and the total capacity of the drive.
>>
>>
>> On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi <ta...@gmail.com>wrote:
>>
>>> Yes, thanks for pointing that out, but I already know that it completes
>>> the balancing when it exits; otherwise it shouldn't exit.
>>> Your answer doesn't solve the problem I mentioned earlier in my message.
>>> 'hdfs' is stalling and hadoop is not writing unless space is cleared up
>>> from the cluster even though "df" shows the cluster has about 500 TB of
>>> free space.
>>>
>>> -------
>>>
>>>
>>> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>>> balaji@balajin.net> wrote:
>>>
>>>  -setBalancerBandwidth <bandwidth in bytes per second>
>>>
>>> So the value is bytes per second. If it is running and exiting, it means
>>> it has completed the balancing.
>>>
>>>
>>> On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:
>>>
>>>> Yes, we are running balancer, though a balancer process runs for almost
>>>> a day or more before exiting and starting over.
>>>> Current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume
>>>> that's bytes so about 2 GigaByte/sec. Shouldn't that be reasonable ? If it
>>>> is in Bits then we have a problem.
>>>> What's the unit for "dfs.balance.bandwidthPerSec" ?
>>>>
>>>> -----
>>>>
>>>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>>>> lists@balajin.net> wrote:
>>>>
>>>> Are you running balancer? If balancer is running and if it is slow, try
>>>> increasing the balancer bandwidth
>>>>
>>>>
>>>> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
>>>>
>>>>> Thanks for the follow-up. I don't know whether the attachment will pass
>>>>> through this mailing list, but I am attaching a PDF that contains the
>>>>> usage of all live nodes.
>>>>>
>>>>> All nodes starting with the letter "g" are the ones with smaller storage
>>>>> space, whereas nodes starting with the letter "s" have larger storage
>>>>> space. As you will see, most of the "gXX" nodes are completely full
>>>>> while the "sXX" nodes have a lot of unused space.
>>>>>
>>>>> Recently we have frequently faced a crisis in which 'hdfs' goes into a
>>>>> mode where it is not able to write any further, even though the total
>>>>> space available in the cluster is about 500 TB. We believe this has
>>>>> something to do with the way it is balancing the nodes, but we don't
>>>>> understand the problem yet. Maybe the attached PDF will help some of you
>>>>> (experts) to see what is going wrong here...
>>>>>
>>>>> Thanks
>>>>> ------
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> The balancer knows about the topology, but when it calculates balancing
>>>>> it operates only on nodes, not on racks.
>>>>> You can see how it works in Balancer.java, in BalancerDatanode, around
>>>>> line 509.
>>>>>
>>>>> I was wrong about the 350 TB / 35 TB figures; it calculates it this way:
>>>>>
>>>>> For example:
>>>>> cluster_capacity = 3.5 PB
>>>>> cluster_dfsused = 2 PB
>>>>>
>>>>> avgutil = cluster_dfsused / cluster_capacity * 100 = 57.14% of cluster
>>>>> capacity used
>>>>> Then each node's utilization is node_dfsused / node_capacity * 100.
>>>>> The balancer considers a node balanced if
>>>>> avgutil - 10 <= node_utilization < avgutil + 10 (10 is the default threshold).
>>>>>
>>>>> In the ideal case every node uses avgutil of its capacity, but for a
>>>>> 12 TB node that is only about 6.9 TB and for a 72 TB node about 41 TB.
>>>>>
>>>>> The balancer can't help you.
>>>>>
>>>>> Show me
>>>>> http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if
>>>>> you can.
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> In the ideal case, with replication factor 2 and two nodes of 12 TB and
>>>>>> 72 TB, you will be able to store only 12 TB of replicated data.
>>>>>>
>>>>>>
>>>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB and
>>>>>> 72 TB, but not true for more than two nodes in the cluster.
>>>>>>
>>>>>>
>>>>>> The best way, in my opinion, is to use multiple racks. Nodes in a rack
>>>>>> should have identical capacity, and the racks should have identical
>>>>>> capacity too. For example:
>>>>>>
>>>>>> rack1: 1 node with 72 TB
>>>>>> rack2: 6 nodes with 12 TB
>>>>>> rack3: 3 nodes with 24 TB
>>>>>>
>>>>>> It helps with balancing, because a block's second replica must be
>>>>>> placed on another rack.
>>>>>>
>>>>>>
>>>>>> The same question I asked earlier in this message: do multiple
>>>>>> racks with the default balancer threshold minimize the difference
>>>>>> between racks?
>>>>>>
>>>>>> Why did you select HDFS? Maybe Lustre, CephFS or another system is a
>>>>>> better choice.
>>>>>>
>>>>>>
>>>>>> It wasn't my decision, and I probably can't change it now. I am new
>>>>>> to this cluster and trying to understand a few issues. I will explore
>>>>>> other options as you mentioned.
>>>>>>
>>>>>> --
>>>>>> http://balajin.net/blog
>>>>>> http://flic.kr/balajijegan
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> http://balajin.net/blog
>>> http://flic.kr/balajijegan
>>>
>>>
>>>
>>
>>
>
>
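
The replication-factor point quoted above (two nodes of 12 TB and 72 TB can hold only 12 TB of data at replication factor 2) generalizes to any set of node sizes. A sketch, assuming each block's replicas must sit on distinct nodes, blocks are small relative to disks, and rack placement is ignored: an amount of data D fits only if the sum over nodes of min(capacity, D) is at least replication * D, since no node can hold more than one copy of any block. Rack-aware placement adds constraints, so it can only make this bound tighter.

```python
def max_logical_data(capacities, replication):
    """Largest amount of data (same units as capacities) that fits when every
    block needs `replication` copies on distinct nodes.

    Feasibility test: sum(min(c, d) for c in capacities) >= replication * d.
    The left side minus the right side is concave in d and zero at d = 0,
    so the feasible values form an interval [0, d_max]; bisect for d_max.
    """
    if replication > len(capacities):
        return 0.0
    lo, hi = 0.0, sum(capacities) / replication
    for _ in range(100):
        mid = (lo + hi) / 2
        if sum(min(c, mid) for c in capacities) >= replication * mid:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    # Capacities in TB, replication factor 2.
    print(round(max_logical_data([12, 72], 2), 6))
    print(round(max_logical_data([12, 12, 72], 2), 6))
```

Under these assumptions the two-node case gives 12 TB, and adding one more 12 TB node raises the limit to 24 TB, consistent with the reply that the two-node limit does not carry over to larger clusters.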

Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Alexey Babutin <zo...@gmail.com>.
I think that will help, but start with one node and watch where the data moves.
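
A rough pre-check before decommissioning each small node is whether the other nodes have room for its data. A minimal sketch, assuming per-node capacity and used figures (for example, read off the NameNode's live-node page) and a hypothetical 90% fill ceiling for receiving nodes; it only checks total space, not where HDFS will actually place the re-replicated blocks, which is why watching where the first node's data moves still matters.

```python
def can_absorb(nodes, leaving, max_fill=0.9):
    """Check whether the other nodes have room for the leaving node's data.

    nodes: dict of name -> (capacity_tb, used_tb).
    Each receiving node is only counted up to `max_fill` of its capacity.
    Returns (fits, headroom_tb).
    """
    to_move = nodes[leaving][1]
    headroom = sum(
        max(0.0, cap * max_fill - used)
        for name, (cap, used) in nodes.items()
        if name != leaving
    )
    return headroom >= to_move, headroom


if __name__ == "__main__":
    # Hypothetical figures in TB: two nearly full small nodes, one large node.
    cluster = {"g01": (12, 11.9), "g02": (12, 11.8), "s01": (72, 30)}
    print(can_absorb(cluster, "g01"))
```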

On Mon, Mar 25, 2013 at 12:44 AM, Tapas Sarangi <ta...@gmail.com>wrote:

> Hi,
>
> Thanks for the idea, I will give this a try and report back.
>
> My worry is: if we decommission a small node (one at a time), will it move
> the data to the larger nodes or choke the other smaller nodes? In principle
> it should distribute the blocks, but it is not distributing them the way
> we expect, so do you think this may cause further problems?
>
> ---------
>
> On Mar 24, 2013, at 3:37 PM, Jamal B <jm...@gmail.com> wrote:
>
> Then I think the only way around this would be to decommission the smaller
> nodes one at a time, and ensure that the blocks are moved to the larger
> nodes.
>
> And once complete, bring back in the smaller nodes, but maybe only after
> you tweak the rack topology to match your disk layout more than network
> layout to compensate for the unbalanced nodes.
>
>
> Just my 2 cents
>
>
> On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi <ta...@gmail.com>wrote:
>
>> Thanks. We have a 1-1 configuration of drives and folder in all the
>> datanodes.
>>
>> -Tapas
>>
>> On Mar 24, 2013, at 3:29 PM, Jamal B <jm...@gmail.com> wrote:
>>
>> On both types of nodes, what is your dfs.data.dir set to? Does it specify
>> multiple folders on the same sets of drives, or is it 1-1 between folder
>> and drive? If it's set to multiple folders on the same drives, it is
>> probably overcounting the "available capacity", because it assumes a 1-1
>> relationship between each folder and the total capacity of the drive.
>>
>>
>> On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi <ta...@gmail.com>wrote:
>>
>>> Yes, thanks for pointing that out, but I already know that it completes
>>> the balancing when it exits; otherwise it shouldn't exit.
>>> Your answer doesn't solve the problem I mentioned earlier in my message.
>>> 'hdfs' is stalling and hadoop is not writing unless space is cleared up
>>> from the cluster even though "df" shows the cluster has about 500 TB of
>>> free space.
>>>
>>> -------
>>>
>>>
>>> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>>> balaji@balajin.net> wrote:
>>>
>>>  -setBalancerBandwidth <bandwidth in bytes per second>
>>>
>>> So the value is bytes per second. If it is running and exiting, it means
>>> it has completed the balancing.
>>>
>>>
>>> On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:
>>>
>>>> Yes, we are running balancer, though a balancer process runs for almost
>>>> a day or more before exiting and starting over.
>>>> Current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume
>>>> that's bytes so about 2 GigaByte/sec. Shouldn't that be reasonable ? If it
>>>> is in Bits then we have a problem.
>>>> What's the unit for "dfs.balance.bandwidthPerSec" ?
>>>>
>>>> -----
>>>>
>>>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>>>> lists@balajin.net> wrote:
>>>>
>>>> Are you running balancer? If balancer is running and if it is slow, try
>>>> increasing the balancer bandwidth
>>>>
>>>>
>>>> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
>>>>
>>>>> Thanks for the follow-up. I don't know whether the attachment will pass
>>>>> through this mailing list, but I am attaching a PDF that contains the
>>>>> usage of all live nodes.
>>>>>
>>>>> All nodes starting with the letter "g" are the ones with smaller storage
>>>>> space, whereas nodes starting with the letter "s" have larger storage
>>>>> space. As you will see, most of the "gXX" nodes are completely full
>>>>> while the "sXX" nodes have a lot of unused space.
>>>>>
>>>>> Recently we have frequently faced a crisis in which 'hdfs' goes into a
>>>>> mode where it is not able to write any further, even though the total
>>>>> space available in the cluster is about 500 TB. We believe this has
>>>>> something to do with the way it is balancing the nodes, but we don't
>>>>> understand the problem yet. Maybe the attached PDF will help some of you
>>>>> (experts) to see what is going wrong here...
>>>>>
>>>>> Thanks
>>>>> ------
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> The balancer knows about the topology, but when it calculates balancing
>>>>> it operates only on nodes, not on racks.
>>>>> You can see how it works in Balancer.java, in BalancerDatanode, around
>>>>> line 509.
>>>>>
>>>>> I was wrong about the 350 TB / 35 TB figures; it calculates it this way:
>>>>>
>>>>> For example:
>>>>> cluster_capacity = 3.5 PB
>>>>> cluster_dfsused = 2 PB
>>>>>
>>>>> avgutil = cluster_dfsused / cluster_capacity * 100 = 57.14% of cluster
>>>>> capacity used
>>>>> Then each node's utilization is node_dfsused / node_capacity * 100.
>>>>> The balancer considers a node balanced if
>>>>> avgutil - 10 <= node_utilization < avgutil + 10 (10 is the default threshold).
>>>>>
>>>>> In the ideal case every node uses avgutil of its capacity, but for a
>>>>> 12 TB node that is only about 6.9 TB and for a 72 TB node about 41 TB.
>>>>>
>>>>> The balancer can't help you.
>>>>>
>>>>> Show me
>>>>> http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if
>>>>> you can.
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> In the ideal case, with replication factor 2 and two nodes of 12 TB and
>>>>>> 72 TB, you will be able to store only 12 TB of replicated data.
>>>>>>
>>>>>>
>>>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB and
>>>>>> 72 TB, but not true for more than two nodes in the cluster.
>>>>>>
>>>>>>
>>>>>> The best way, in my opinion, is to use multiple racks. Nodes in a rack
>>>>>> should have identical capacity, and the racks should have identical
>>>>>> capacity too. For example:
>>>>>>
>>>>>> rack1: 1 node with 72 TB
>>>>>> rack2: 6 nodes with 12 TB
>>>>>> rack3: 3 nodes with 24 TB
>>>>>>
>>>>>> It helps with balancing, because a block's second replica must be
>>>>>> placed on another rack.
>>>>>>
>>>>>>
>>>>>> The same question I asked earlier in this message: do multiple
>>>>>> racks with the default balancer threshold minimize the difference
>>>>>> between racks?
>>>>>>
>>>>>> Why did you select HDFS? Maybe Lustre, CephFS or another system is a
>>>>>> better choice.
>>>>>>
>>>>>>
>>>>>> It wasn't my decision, and I probably can't change it now. I am new
>>>>>> to this cluster and trying to understand a few issues. I will explore
>>>>>> other options as you mentioned.
>>>>>>
>>>>>> --
>>>>>> http://balajin.net/blog
>>>>>> http://flic.kr/balajijegan
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> http://balajin.net/blog
>>> http://flic.kr/balajijegan
>>>
>>>
>>>
>>
>>
>
>
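
The idea of shaping the rack topology around disk capacity, such as the equal-capacity racks suggested earlier in the thread, could be tried with a topology script. In Hadoop 0.20 the script configured by topology.script.file.name is called with one or more host names or IP addresses as arguments and must print one rack path per argument. The host names, rack names and table below are hypothetical; in practice the arguments are often IP addresses, so the table would need those too. As noted in the thread, the balancer's threshold check works per node rather than per rack, so this mainly affects where new replicas are placed.

```python
#!/usr/bin/env python3
"""Hypothetical Hadoop rack-topology script.

Hadoop calls the script with host names or IP addresses as arguments and
reads one rack path per argument from stdout, in the same order.
"""
import sys

# Hypothetical host-to-rack table: group nodes so that each rack has
# roughly the same total capacity (one 72 TB node vs. six 12 TB nodes).
RACKS = {
    "s01": "/rack1",
    "g01": "/rack2", "g02": "/rack2", "g03": "/rack2",
    "g04": "/rack2", "g05": "/rack2", "g06": "/rack2",
}
DEFAULT_RACK = "/default-rack"


def rack_for(host: str) -> str:
    # Try the exact argument first (Hadoop often passes IP addresses),
    # then the short host name without any domain suffix.
    if host in RACKS:
        return RACKS[host]
    return RACKS.get(host.split(".")[0], DEFAULT_RACK)


def main(argv):
    # One rack path per argument, one per line.
    print("\n".join(rack_for(h) for h in argv))


if __name__ == "__main__":
    main(sys.argv[1:])
```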

Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Tapas Sarangi <ta...@gmail.com>.
Hi,

Thanks for the idea, I will give this a try and report back. 

My worry is: if we decommission a small node (one at a time), will it move the data to the larger nodes or choke the other smaller nodes? In principle it should distribute the blocks, but it is not distributing them the way we expect, so do you think this may cause further problems?

---------

On Mar 24, 2013, at 3:37 PM, Jamal B <jm...@gmail.com> wrote:

> Then I think the only way around this would be to decommission the smaller nodes one at a time, and ensure that the blocks are moved to the larger nodes.
> And once complete, bring back in the smaller nodes, but maybe only after you tweak the rack topology to match your disk layout more than network layout to compensate for the unbalanced nodes.  
> 
> Just my 2 cents
> 
> 
> On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi <ta...@gmail.com> wrote:
> Thanks. We have a 1-1 configuration of drives and folder in all the datanodes.
> 
> -Tapas
> 
> On Mar 24, 2013, at 3:29 PM, Jamal B <jm...@gmail.com> wrote:
> 
>> On both types of nodes, what is your dfs.data.dir set to? Does it specify multiple folders on the same sets of drives, or is it 1-1 between folder and drive? If it's set to multiple folders on the same drives, it is probably overcounting the "available capacity", because it assumes a 1-1 relationship between each folder and the total capacity of the drive.
>> 
>> 
>> On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi <ta...@gmail.com> wrote:
>> Yes, thanks for pointing that out, but I already know that it completes the balancing when it exits; otherwise it shouldn't exit.
>> Your answer doesn't solve the problem I mentioned earlier in my message. 'hdfs' is stalling and hadoop is not writing unless space is cleared up from the cluster even though "df" shows the cluster has about 500 TB of free space. 
>> 
>> -------
>>  
>> 
>> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <ba...@balajin.net> wrote:
>> 
>>>  -setBalancerBandwidth <bandwidth in bytes per second>
>>> 
>>> So the value is bytes per second. If it is running and exiting, it means it has completed the balancing.
>>> 
>>> 
>>> On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:
>>> Yes, we are running balancer, though a balancer process runs for almost a day or more before exiting and starting over.
>>> Current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume that's bytes so about 2 GigaByte/sec. Shouldn't that be reasonable ? If it is in Bits then we have a problem.
>>> What's the unit for "dfs.balance.bandwidthPerSec" ?
>>> 
>>> -----
>>> 
>>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <li...@balajin.net> wrote:
>>> 
>>>> Are you running balancer? If balancer is running and if it is slow, try increasing the balancer bandwidth
>>>> 
>>>> 
>>>> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
>>>> Thanks for the follow-up. I don't know whether the attachment will pass through this mailing list, but I am attaching a PDF that contains the usage of all live nodes.
>>>> 
>>>> All nodes starting with the letter "g" are the ones with smaller storage space, whereas nodes starting with the letter "s" have larger storage space. As you will see, most of the "gXX" nodes are completely full while the "sXX" nodes have a lot of unused space.
>>>> 
>>>> Recently we have frequently faced a crisis in which 'hdfs' goes into a mode where it is not able to write any further, even though the total space available in the cluster is about 500 TB. We believe this has something to do with the way it is balancing the nodes, but we don't understand the problem yet. Maybe the attached PDF will help some of you (experts) to see what is going wrong here...
>>>> 
>>>> Thanks
>>>> ------
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> 
>>>>> The balancer knows about the topology, but when it calculates balancing it operates only on nodes, not on racks.
>>>>> You can see how it works in Balancer.java, in BalancerDatanode, around line 509.
>>>>> 
>>>>> I was wrong about the 350 TB / 35 TB figures; it calculates it this way:
>>>>> 
>>>>> For example:
>>>>> cluster_capacity = 3.5 PB
>>>>> cluster_dfsused = 2 PB
>>>>> 
>>>>> avgutil = cluster_dfsused / cluster_capacity * 100 = 57.14% of cluster capacity used
>>>>> Then each node's utilization is node_dfsused / node_capacity * 100. The balancer considers a node balanced if avgutil - 10 <= node_utilization < avgutil + 10 (10 is the default threshold).
>>>>> 
>>>>> In the ideal case every node uses avgutil of its capacity, but for a 12 TB node that is only about 6.9 TB and for a 72 TB node about 41 TB.
>>>>> 
>>>>> The balancer can't help you.
>>>>> 
>>>>> Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can.
>>>>> 
>>>>>  
>>>>> 
>>>>> 
>>>>>> In the ideal case, with replication factor 2 and two nodes of 12 TB and 72 TB, you will be able to store only 12 TB of replicated data.
>>>>> 
>>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB and 72 TB, but not true for more than two nodes in the cluster.
>>>>> 
>>>>>> 
>>>>>> The best way, in my opinion, is to use multiple racks. Nodes in a rack should have identical capacity, and the racks should have identical capacity too.
>>>>>> For example:
>>>>>> 
>>>>>> rack1: 1 node with 72 TB
>>>>>> rack2: 6 nodes with 12 TB
>>>>>> rack3: 3 nodes with 24 TB
>>>>>> 
>>>>>> It helps with balancing, because a block's second replica must be placed on another rack.
>>>>>> 
>>>>> 
>>>>> The same question I asked earlier in this message: do multiple racks with the default balancer threshold minimize the difference between racks?
>>>>> 
>>>>>> Why did you select HDFS? Maybe Lustre, CephFS or another system is a better choice.
>>>>> 
>>>>> It wasn't my decision, and I probably can't change it now. I am new to this cluster and trying to understand few issues. I will explore other options as you mentioned.
>>>>> 
>>>>> -- 
>>>>> http://balajin.net/blog
>>>>> http://flic.kr/balajijegan
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> http://balajin.net/blog
>>> http://flic.kr/balajijegan
>> 
>> 
> 
> 


Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Tapas Sarangi <ta...@gmail.com>.
Hi,

Thanks for the idea; I will give this a try and report back.

My worry is this: if we decommission the small nodes (one at a time), will the data move to the larger nodes, or will it choke the other small nodes? In principle it should distribute the blocks, but the point is that it is not distributing them the way we expect, so do you think this may cause further problems?
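To check this worry empirically while decommissioning, one option is to watch per-node usage from `hadoop dfsadmin -report` rather than a PDF. The Python sketch below assumes each datanode section of the report contains a `Name:` line followed by a `DFS Used%:` line, as 0.20-era releases print; the sample hostnames are hypothetical (following the thread's `g`/`s` naming), and the line format should be checked against your own output.

```python
import re
from typing import List, Tuple

# Assumed report line formats; verify against your Hadoop version's output.
NAME_RE = re.compile(r"^Name:\s*(\S+)")
USED_PCT_RE = re.compile(r"^DFS Used%:\s*([\d.]+)%")

# Hypothetical excerpt of `hadoop dfsadmin -report` output.
SAMPLE_REPORT = """\
DFS Used%: 47.02%

Name: g01.example.com:50010
DFS Used%: 99.9%

Name: s01.example.com:50010
DFS Used%: 35.2%
"""


def nodes_by_usage(report: str) -> List[Tuple[str, float]]:
    """Return (datanode, DFS Used%) pairs, fullest first.

    The cluster-wide summary at the top has no preceding Name: line,
    so its DFS Used% value is ignored.
    """
    nodes: List[Tuple[str, float]] = []
    current = None
    for raw in report.splitlines():
        line = raw.strip()
        name = NAME_RE.match(line)
        if name:
            current = name.group(1)
            continue
        used = USED_PCT_RE.match(line)
        if used and current is not None:
            nodes.append((current, float(used.group(1))))
            current = None
    return sorted(nodes, key=lambda n: n[1], reverse=True)


for node, pct in nodes_by_usage(SAMPLE_REPORT):
    flag = "  <- nearly full" if pct >= 95.0 else ""
    print(f"{node}\t{pct:.1f}%{flag}")
```

On a real cluster the string would come from running `hadoop dfsadmin -report`; rerunning it between decommissions would show whether blocks are landing on the `s` nodes or piling onto the remaining `g` nodes.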

---------

On Mar 24, 2013, at 3:37 PM, Jamal B <jm...@gmail.com> wrote:

> Then I think the only way around this would be to decommission the smaller nodes one at a time, and ensure that the blocks are moved to the larger nodes.
> Once that is complete, bring the smaller nodes back in, but perhaps only after you tweak the rack topology to match your disk layout rather than your network layout, to compensate for the unbalanced nodes.
> 
> Just my 2 cents
> 
> 
> On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi <ta...@gmail.com> wrote:
> Thanks. We have a 1-1 mapping between drives and folders on all the datanodes.
> 
> -Tapas
> 
> On Mar 24, 2013, at 3:29 PM, Jamal B <jm...@gmail.com> wrote:
> 
>> On both types of nodes, what is your dfs.data.dir set to? Does it specify multiple folders on the same set of drives, or is it 1-1 between folder and drive? If it's set to multiple folders on the same drives, it is probably multiplying the "available capacity" incorrectly, since it assumes a 1-1 relationship between a folder and the total capacity of its drive.
>> 
>> 
>> On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi <ta...@gmail.com> wrote:
>> Yes, thanks for pointing that out, but I already know that it completes the balancing when it exits; otherwise it wouldn't exit.
>> Your answer doesn't solve the problem I mentioned earlier in my message: 'hdfs' stalls and Hadoop does not write until space is cleared up in the cluster, even though "df" shows the cluster has about 500 TB of free space.
>> 
>> -------
>>  
>> 
>> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <ba...@balajin.net> wrote:
>> 
>>>  -setBalancerBandwidth <bandwidth in bytes per second>
>>> 
>>> So the value is in bytes per second. If it is running and exiting, it means it has completed the balancing.
>>> 
>>> 
>>> On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:
>>> Yes, we are running the balancer, though each balancer process runs for a day or more before exiting and starting over.
>>> The current dfs.balance.bandwidthPerSec value is 2x10^9. I assume that's bytes, so about 2 GB/s. Shouldn't that be reasonable? If it is in bits, then we have a problem.
>>> What's the unit for "dfs.balance.bandwidthPerSec"?
>>> 
>>> -----
>>> 
>>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <li...@balajin.net> wrote:
>>> 
>>>> Are you running the balancer? If it is running but slow, try increasing the balancer bandwidth.
>>>> 
>>>> 
>>>> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
>>>> Thanks for the follow-up. I don't know whether attachments will pass through this mailing list, but I am attaching a PDF that shows the usage of all live nodes.
>>>> 
>>>> All nodes whose names start with the letter "g" have smaller storage, whereas nodes starting with "s" have larger storage. As you will see, most of the "gXX" nodes are completely full, whereas the "sXX" nodes have a lot of unused space.
>>>> 
>>>> Recently we have frequently faced a crisis where 'hdfs' goes into a mode in which it cannot write anything further, even though the total free space in the cluster is about 500 TB. We believe this has something to do with the way it balances the nodes, but we don't understand the problem yet. Maybe the attached PDF will help some of you (experts) see what is going wrong here...
>>>> 
>>>> Thanks
>>>> ------
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> 
>>>>> The balancer knows about topology, but when it calculates balancing it operates only on nodes, not on racks.
>>>>> You can see how it works in Balancer.java, in BalancerDatanode, around line 509.
>>>>> 
>>>>> I was wrong about 350 TB/35 TB; it actually calculates as follows:
>>>>> 
>>>>> For example:
>>>>> cluster_capacity = 3.5 PB
>>>>> cluster_dfsused = 2 PB
>>>>> 
>>>>> avgutil = cluster_dfsused / cluster_capacity * 100 = 57.14% of cluster capacity used
>>>>> Then we know each node's utilization (node_dfsused / node_capacity * 100). The balancer considers everything fine if avgutil - 10 <= node_utilization < avgutil + 10 (10 is the default threshold).
>>>>> 
>>>>> Ideally every node uses avgutil percent of its capacity, but for a 12 TB node that is only about 6.9 TB, and for a 72 TB node about 41 TB.
>>>>> 
>>>>> The balancer can't help you.
>>>>> 
>>>>> Show me the live-nodes page (like http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE) if you can.
>>>>> 
>>>>>  
>>>>> 
>>>>> 
>>>>>> In the ideal case, with replication factor 2 and two nodes of 12 TB and 72 TB, you will be able to store only 12 TB of replicated data.
>>>>> 
>>>>> Yes, this is true for a cluster of exactly two nodes with 12 TB and 72 TB, but not for clusters with more than two nodes.
>>>>> 
>>>>>> 
>>>>>> The best way, in my opinion, is to use multiple racks. Nodes within a rack must have identical capacity, and racks must have identical total capacity.
>>>>>> For example:
>>>>>> 
>>>>>> rack1: 1 node with 72 TB
>>>>>> rack2: 6 nodes with 12 TB
>>>>>> rack3: 3 nodes with 24 TB
>>>>>> 
>>>>>> This helps with balancing, because a replica of each block must be placed on another rack.
>>>>>> 
>>>>> 
>>>>> As I asked earlier in this message: with multiple racks and the default balancer threshold, is the difference between racks minimized?
>>>>> 
>>>>>> Why did you choose HDFS? Maybe Lustre, CephFS or something else would be a better choice.
>>>>> 
>>>>> It wasn't my decision, and I probably can't change it now. I am new to this cluster and am trying to understand a few issues. I will explore the other options you mentioned.
>>>>> 
>>>>> -- 
>>>>> http://balajin.net/blog
>>>>> http://flic.kr/balajijegan
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> http://balajin.net/blog
>>> http://flic.kr/balajijegan
>> 
>> 
> 
> 


Re:Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by see1230 <se...@163.com>.
If the balancer is not running, or runs with low bandwidth and reacts slowly, I think there may be a significant asymmetry between datanodes.

At 2013-03-25 04:37:05, "Jamal B" <jm...@gmail.com> wrote:

Then I think the only way around this would be to decommission the smaller nodes one at a time, and ensure that the blocks are moved to the larger nodes. Once that is complete, bring the smaller nodes back in, but perhaps only after you tweak the rack topology to match your disk layout rather than your network layout, to compensate for the unbalanced nodes.


Just my 2 cents



On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi <ta...@gmail.com> wrote:

Thanks. We have a 1-1 mapping between drives and folders on all the datanodes.


-Tapas


On Mar 24, 2013, at 3:29 PM, Jamal B <jm...@gmail.com> wrote:


On both types of nodes, what is your dfs.data.dir set to? Does it specify multiple folders on the same set of drives, or is it 1-1 between folder and drive? If it's set to multiple folders on the same drives, it is probably multiplying the "available capacity" incorrectly, since it assumes a 1-1 relationship between a folder and the total capacity of its drive.
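One way to answer this question on a datanode is to check whether any two `dfs.data.dir` entries live on the same filesystem. The Python sketch below groups the comma-separated directories by `os.stat(...).st_dev`; the example paths are hypothetical, and `st_dev` only identifies the mounted filesystem, not the physical drives behind it (RAID or LVM would hide those).

```python
import os
from collections import defaultdict
from typing import Dict, List


def dirs_by_device(data_dir_value: str) -> Dict[int, List[str]]:
    """Group comma-separated dfs.data.dir entries by the filesystem they live on."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for entry in data_dir_value.split(","):
        path = entry.strip()
        if path:
            groups[os.stat(path).st_dev].append(path)
    return dict(groups)


def shared_filesystems(data_dir_value: str) -> List[List[str]]:
    """Return groups of data dirs that share one filesystem.

    Every directory in such a group sees the same free space, so that
    space could be counted once per directory.
    """
    return [paths for paths in dirs_by_device(data_dir_value).values() if len(paths) > 1]


if __name__ == "__main__":
    # Hypothetical value; paste the real dfs.data.dir from hdfs-site.xml instead.
    value = "/data/1/dfs/dn,/data/2/dfs/dn"
    existing = ",".join(p for p in value.split(",") if os.path.isdir(p))
    print(shared_filesystems(existing) if existing else "no listed directories exist here")
```

An empty result is consistent with the 1-1 layout described in the reply; any non-empty group points at directories whose capacity may be overcounted.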



On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi <ta...@gmail.com> wrote:

Yes, thanks for pointing that out, but I already know that it completes the balancing when it exits; otherwise it wouldn't exit.
Your answer doesn't solve the problem I mentioned earlier in my message: 'hdfs' stalls and Hadoop does not write until space is cleared up in the cluster, even though "df" shows the cluster has about 500 TB of free space.


-------
 


On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <ba...@balajin.net> wrote:


 -setBalancerBandwidth <bandwidth in bytes per second>

So the value is in bytes per second. If it is running and exiting, it means it has completed the balancing.
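Since `dfs.balance.bandwidthPerSec` is in bytes per second (and, per the description in `hdfs-default.xml`, caps each datanode separately), a little arithmetic shows what 2x10^9 means. This Python sketch compares the setting with common network line rates; the 5 TB figure is a hypothetical amount of data to move, not a number from the thread.

```python
# Arithmetic on the balancer bandwidth setting quoted in this thread.
SETTING = 2e9  # dfs.balance.bandwidthPerSec, bytes per second per datanode

# Line rates of common links in bytes per second (bits / 8).
LINKS = {"1 GbE": 1e9 / 8, "10 GbE": 10e9 / 8}


def mb_per_sec(bytes_per_sec: float) -> float:
    """Convert bytes/s to decimal megabytes/s."""
    return bytes_per_sec / 1e6


def hours_to_move(terabytes: float, bytes_per_sec: float) -> float:
    """Best-case hours to move `terabytes` (decimal) at a sustained rate."""
    return terabytes * 1e12 / bytes_per_sec / 3600


print(f"setting read as bytes/s: {mb_per_sec(SETTING):.0f} MB/s")
print(f"if it were bits/s:       {mb_per_sec(SETTING / 8):.0f} MB/s")
for name, rate in LINKS.items():
    print(f"{name} line rate: {mb_per_sec(rate):.0f} MB/s, "
          f"5 TB takes at least {hours_to_move(5, rate):.1f} h")
```

Read as bytes, the cap (2000 MB/s) is above what a single 1 GbE (125 MB/s) or 10 GbE (1250 MB/s) link can carry, so the setting itself is unlikely to be what keeps the balancer running for a day; other limits such as network or disk throughput would bind first. Treat this as a rough bound, not a measurement.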




On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:

Yes, we are running the balancer, though each balancer process runs for a day or more before exiting and starting over.
The current dfs.balance.bandwidthPerSec value is 2x10^9. I assume that's bytes, so about 2 GB/s. Shouldn't that be reasonable? If it is in bits, then we have a problem.
What's the unit for "dfs.balance.bandwidthPerSec"?


-----


On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <li...@balajin.net> wrote:


Are you running the balancer? If it is running but slow, try increasing the balancer bandwidth.




On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:

Thanks for the follow-up. I don't know whether attachments will pass through this mailing list, but I am attaching a PDF that shows the usage of all live nodes.


All nodes whose names start with the letter "g" have smaller storage, whereas nodes starting with "s" have larger storage. As you will see, most of the "gXX" nodes are completely full, whereas the "sXX" nodes have a lot of unused space.


Recently we have frequently faced a crisis where 'hdfs' goes into a mode in which it cannot write anything further, even though the total free space in the cluster is about 500 TB. We believe this has something to do with the way it balances the nodes, but we don't understand the problem yet. Maybe the attached PDF will help some of you (experts) see what is going wrong here...


Thanks
------

The balancer knows about topology, but when it calculates balancing it operates only on nodes, not on racks.
You can see how it works in Balancer.java, in BalancerDatanode, around line 509.

I was wrong about 350 TB/35 TB; it actually calculates as follows:

For example:
cluster_capacity = 3.5 PB
cluster_dfsused = 2 PB

avgutil = cluster_dfsused / cluster_capacity * 100 = 57.14% of cluster capacity used
Then we know each node's utilization (node_dfsused / node_capacity * 100). The balancer considers everything fine if avgutil - 10 <= node_utilization < avgutil + 10 (10 is the default threshold).

Ideally every node uses avgutil percent of its capacity, but for a 12 TB node that is only about 6.9 TB, and for a 72 TB node about 41 TB.

The balancer can't help you.
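The threshold rule described here can be sketched as follows. This is a simplified Python model of the check as the message describes it, not Hadoop's `Balancer.java`; the node names and sizes are hypothetical, loosely following the thread's small `g` / large `s` naming.

```python
from typing import Dict, Tuple

# Hypothetical inventory: node -> (capacity_tb, dfs_used_tb).
NODES: Dict[str, Tuple[float, float]] = {
    "g01": (12.0, 11.9),
    "g02": (12.0, 11.8),
    "s01": (72.0, 30.0),
    "s02": (72.0, 25.0),
}


def average_utilization(nodes: Dict[str, Tuple[float, float]]) -> float:
    """Cluster-wide avgutil = total used / total capacity * 100."""
    capacity = sum(cap for cap, _ in nodes.values())
    used = sum(u for _, u in nodes.values())
    return used / capacity * 100.0


def classify(nodes: Dict[str, Tuple[float, float]], threshold: float = 10.0) -> Dict[str, str]:
    """Label each node relative to avgutil +/- threshold (percentage points)."""
    avg = average_utilization(nodes)
    labels: Dict[str, str] = {}
    for name, (cap, used) in nodes.items():
        util = used / cap * 100.0
        if util >= avg + threshold:
            labels[name] = "over-utilized"
        elif util < avg - threshold:
            labels[name] = "under-utilized"
        else:
            labels[name] = "within threshold"
    return labels


avg = average_utilization(NODES)
print(f"avgutil = {avg:.2f}%")
for name, label in classify(NODES).items():
    cap, _ = NODES[name]
    print(f"{name}: {label}, balanced target ~ {cap * avg / 100.0:.1f} TB")
```

With the message's own figures (3.5 PB capacity, 2 PB used), `average_utilization` gives about 57.14%, so the balanced target for a 12 TB node is 12 * 0.5714, roughly 6.9 TB. The rule compares only percentages per node; racks play no part in it.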

Show me the live-nodes page (like http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE) if you can.

In the ideal case, with replication factor 2 and two nodes of 12 TB and 72 TB, you will be able to store only 12 TB of replicated data.



Yes, this is true for a cluster of exactly two nodes with 12 TB and 72 TB, but not for clusters with more than two nodes.
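This point can be checked with an idealized model. With replication factor r and every replica of a block on a different node, a node of capacity c can hold at most min(c, D) of D terabytes of logical data, so in this model D is storable exactly when sum(min(c_i, D)) >= r * D. The Python sketch below finds the largest such D by bisection. It is a rough bound that ignores block granularity, reserved and non-DFS space, and rack placement; the four-node layout is hypothetical.

```python
from typing import Sequence


def max_logical_tb(capacities: Sequence[float], replication: int, iters: int = 100) -> float:
    """Largest D with sum(min(c, D) for c in capacities) >= replication * D.

    Idealized: data is infinitely divisible and each block's replicas
    must sit on distinct nodes.
    """
    if replication < 1 or len(capacities) < replication:
        return 0.0

    def fits(d: float) -> bool:
        return sum(min(c, d) for c in capacities) >= replication * d

    lo, hi = 0.0, sum(capacities) / replication
    for _ in range(iters):  # fits() holds on [0, D*] and fails beyond it
        mid = (lo + hi) / 2
        if fits(mid):
            lo = mid
        else:
            hi = mid
    return lo


# Two nodes, as in the quoted message.
print(f"{max_logical_tb([12, 72], 2):.2f} TB")
# Hypothetical: three 12 TB nodes plus one 72 TB node.
print(f"{max_logical_tb([12, 12, 12, 72], 2):.2f} TB")
```

The two-node case gives 12 TB, as stated; the hypothetical four-node case gives 36 TB (72 TB of raw space used out of 108 TB), which illustrates why the bound loosens once more nodes are added.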



The best way, in my opinion, is to use multiple racks. Nodes within a rack must have identical capacity, and racks must have identical total capacity.
For example:

rack1: 1 node with 72 TB
rack2: 6 nodes with 12 TB
rack3: 3 nodes with 24 TB

This helps with balancing, because a replica of each block must be placed on another rack.
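The two properties recommended here (uniform nodes within a rack, equal totals across racks) are easy to check for a proposed layout. This Python sketch encodes the example racks from the message; the 5% tolerance is an arbitrary assumption, and the check says nothing about how HDFS will actually place blocks.

```python
from typing import Dict, List

# The example layout from the message: rack -> node capacities in TB.
LAYOUT: Dict[str, List[float]] = {
    "rack1": [72.0],
    "rack2": [12.0] * 6,
    "rack3": [24.0] * 3,
}


def rack_totals(layout: Dict[str, List[float]]) -> Dict[str, float]:
    """Total raw capacity per rack."""
    return {rack: sum(caps) for rack, caps in layout.items()}


def uniform_within_racks(layout: Dict[str, List[float]]) -> bool:
    """True if every rack holds nodes of a single capacity."""
    return all(len(set(caps)) == 1 for caps in layout.values())


def racks_equal(layout: Dict[str, List[float]], tolerance: float = 0.05) -> bool:
    """True if rack totals differ by at most `tolerance` of the largest total."""
    totals = list(rack_totals(layout).values())
    return max(totals) - min(totals) <= tolerance * max(totals)


print(rack_totals(LAYOUT))
print("uniform nodes per rack:", uniform_within_racks(LAYOUT))
print("equal rack totals:", racks_equal(LAYOUT))
```

For the example layout every rack totals 72 TB, so both checks pass.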




As I asked earlier in this message: with multiple racks and the default balancer threshold, is the difference between racks minimized?


Why did you choose HDFS? Maybe Lustre, CephFS or something else would be a better choice.



It wasn't my decision, and I probably can't change it now. I am new to this cluster and am trying to understand a few issues. I will explore the other options you mentioned.

--
http://balajin.net/blog
http://flic.kr/balajijegan





--
http://balajin.net/blog
http://flic.kr/balajijegan


Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Jamal B <jm...@gmail.com>.
Then I think the only way around this would be to decommission the smaller
nodes one at a time, and ensure that the blocks are moved to the larger
nodes. Once that is complete, bring the smaller nodes back in, but perhaps
only after you tweak the rack topology to match your disk layout rather
than your network layout, to compensate for the unbalanced nodes.

Just my 2 cents
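A rack topology that follows disk layout rather than network layout could be tried with a topology script. In 0.20-era releases the script is named by `topology.script.file.name` in core-site.xml (later releases call it `net.topology.script.file.name`); Hadoop runs it with one or more hostnames or IP addresses as arguments and reads back one rack path per argument. The Python sketch below is hypothetical throughout: the host table, the rack names `/disk-small` and `/disk-large`, and the fallback on the thread's `g`/`s` naming are illustrations. Whether such racks actually even out usage depends on the block placement policy, so treat it as an experiment.

```python
#!/usr/bin/env python3
import sys
from typing import Dict, List

# Hypothetical explicit mapping. Hadoop may pass IP addresses rather than
# hostnames, so list whichever form your namenode actually sends.
RACK_BY_HOST: Dict[str, str] = {
    "g01.example.com": "/disk-small",
    "s01.example.com": "/disk-large",
}
DEFAULT_RACK = "/default-rack"


def rack_for(host: str) -> str:
    """Map one host (or IP) to a rack path grouped by disk capacity."""
    if host in RACK_BY_HOST:
        return RACK_BY_HOST[host]
    short = host.split(".")[0]
    # Fallback on this thread's naming: gXX = small disks, sXX = large disks.
    if short.startswith("g"):
        return "/disk-small"
    if short.startswith("s"):
        return "/disk-large"
    return DEFAULT_RACK


def main(argv: List[str]) -> None:
    # One rack path per argument, in order, separated by spaces.
    print(" ".join(rack_for(h) for h in argv))


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, running the script with `g07.example.com s12.example.com` as arguments would print `/disk-small /disk-large`.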


On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi <ta...@gmail.com> wrote:

> Thanks. We have a 1-1 mapping between drives and folders on all the
> datanodes.
>
> -Tapas
>
> On Mar 24, 2013, at 3:29 PM, Jamal B <jm...@gmail.com> wrote:
>
> On both types of nodes, what is your dfs.data.dir set to? Does it specify
> multiple folders on the same set of drives, or is it 1-1 between folder
> and drive? If it's set to multiple folders on the same drives, it is
> probably multiplying the "available capacity" incorrectly, since it
> assumes a 1-1 relationship between a folder and the total capacity of its
> drive.
>
>
> On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi <ta...@gmail.com> wrote:
>
>> Yes, thanks for pointing that out, but I already know that it completes
>> the balancing when it exits; otherwise it wouldn't exit.
>> Your answer doesn't solve the problem I mentioned earlier in my message:
>> 'hdfs' stalls and Hadoop does not write until space is cleared up in the
>> cluster, even though "df" shows the cluster has about 500 TB of free
>> space.
>>
>> -------
>>
>>
>> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>> balaji@balajin.net> wrote:
>>
>>  -setBalancerBandwidth <bandwidth in bytes per second>
>>
>> So the value is in bytes per second. If it is running and exiting, it
>> means it has completed the balancing.
>>
>>
>> On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:
>>
>>> Yes, we are running the balancer, though each balancer process runs for
>>> a day or more before exiting and starting over.
>>> The current dfs.balance.bandwidthPerSec value is 2x10^9. I assume that's
>>> bytes, so about 2 GB/s. Shouldn't that be reasonable? If it is in bits,
>>> then we have a problem.
>>> What's the unit for "dfs.balance.bandwidthPerSec"?
>>>
>>> -----
>>>
>>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>>> lists@balajin.net> wrote:
>>>
>>> Are you running the balancer? If it is running but slow, try increasing
>>> the balancer bandwidth.
>>>
>>>
>>> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
>>>
>>>> Thanks for the follow-up. I don't know whether attachments will pass
>>>> through this mailing list, but I am attaching a PDF that shows the usage
>>>> of all live nodes.
>>>>
>>>> All nodes whose names start with the letter "g" have smaller storage,
>>>> whereas nodes starting with "s" have larger storage. As you will see,
>>>> most of the "gXX" nodes are completely full, whereas the "sXX" nodes
>>>> have a lot of unused space.
>>>>
>>>> Recently, we are facing crisis frequently as 'hdfs' goes into a mode
>>>> where it is not able to write any further even though the total space
>>>> available in the cluster is about 500 TB. We believe this has something to
>>>> do with the way it is balancing the nodes, but don't understand the problem
>>>> yet. May be the attached PDF will help some of you (experts) to see what is
>>>> going wrong here...
>>>>
>>>> Thanks
>>>> ------
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> The balancer knows about the topology, but when it calculates balancing
>>>> it operates only on nodes, not on racks.
>>>> You can see how it works in Balancer.java, in BalancerDatanode, around
>>>> line 509.
>>>>
>>>> I was wrong about 350 TB and 35 TB; it calculates it this way:
>>>>
>>>> For example:
>>>> cluster_capacity = 3.5 PB
>>>> cluster_dfsused  = 2 PB
>>>>
>>>> avgutil = cluster_dfsused / cluster_capacity * 100 = 57.14% of cluster
>>>> capacity used
>>>> Each node's utilization is node_dfsused / node_capacity * 100. With the
>>>> default threshold of 10, the balancer considers a node fine if
>>>> avgutil - 10 <= node_utilization < avgutil + 10.
>>>>
>>>> In the ideal case every node uses avgutil of its own capacity: for a
>>>> 12 TB node that is only about 6.9 TB, and for a 72 TB node about 41 TB.
>>>>
>>>> The balancer can't help you.
>>>>
>>>> Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can.
>>>>
>>>>
>>>>
>>>>>
>>>>>
>>>>> In the ideal case, with replication factor 2 and two nodes of 12 TB and
>>>>> 72 TB, you will be able to store only 12 TB of replicated data.
>>>>>
>>>>>
>>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB and
>>>>> 72 TB, but not true for more than two nodes in the cluster.
>>>>>
>>>>>
>>>>> The best way, in my opinion, is to use multiple racks. Nodes in a rack
>>>>> should have identical capacity, and the racks should have identical capacity.
>>>>> For example:
>>>>>
>>>>> rack1: 1 node with 72 TB
>>>>> rack2: 6 nodes with 12 TB
>>>>> rack3: 3 nodes with 24 TB
>>>>>
>>>>> It helps with balancing, because a block's duplicate must be on another rack.
>>>>>
>>>>>
>>>>> The same question I asked earlier in this message: do multiple racks,
>>>>> with the default threshold for the balancer, minimize the difference
>>>>> between racks?
>>>>>
>>>>> Why did you select HDFS? Maybe Lustre, CephFS or others would be a
>>>>> better choice.
>>>>>
>>>>>
>>>>> It wasn't my decision, and I probably can't change it now. I am new to
>>>>> this cluster and trying to understand a few issues. I will explore other
>>>>> options as you mentioned.
>>>>>
>>>>> --
>>>>> http://balajin.net/blog
>>>>> http://flic.kr/balajijegan
>>>>>
>>>>
>>>
>>
>>
>> --
>> http://balajin.net/blog
>> http://flic.kr/balajijegan
>>
>>
>>
>
>

Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Jamal B <jm...@gmail.com>.
Then I think the only way around this would be to decommission the smaller
nodes one at a time and ensure that their blocks are moved to the larger
nodes. Once that is complete, bring the smaller nodes back in, but perhaps
only after you tweak the rack topology to match your disk layout rather than
your network layout, to compensate for the unbalanced nodes.

Just my 2 cents
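A sketch of the rack-topology idea, assuming Hadoop 0.20's script-based mapping: the `topology.script.file.name` property names an executable that receives one or more host names or IP addresses as arguments and prints one rack path per argument. The Python below groups hosts by the "g"/"s" naming convention described in this thread. The rack names `/rack-small` and `/rack-large` are hypothetical, and if your namenode passes IP addresses instead of names, every host falls through to the default rack and you would need an explicit lookup table. Whether a particular grouping actually relieves the small nodes depends on how the placement policy spreads replicas across racks, so this only illustrates the mechanism.

```python
#!/usr/bin/env python
"""Hypothetical rack-mapping script for topology.script.file.name.

Hadoop invokes the script with one or more host names or IP addresses
as arguments and reads one rack path per argument from stdout.
"""
import sys

DEFAULT_RACK = "/default-rack"

# Hypothetical mapping based on this thread's naming convention:
# "gXX" hosts have smaller disks, "sXX" hosts have larger disks.
PREFIX_TO_RACK = {
    "g": "/rack-small",
    "s": "/rack-large",
}


def rack_for(host):
    """Return the rack path for a single host name or address."""
    short_name = host.split(".")[0].lower()
    for prefix, rack in PREFIX_TO_RACK.items():
        if short_name.startswith(prefix):
            return rack
    # IP addresses and unrecognized names fall through to the default rack.
    return DEFAULT_RACK


def main(argv):
    # One rack path per argument, whitespace-separated, in argument order.
    print(" ".join(rack_for(host) for host in argv))


if __name__ == "__main__":
    main(sys.argv[1:])
```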


On Sun, Mar 24, 2013 at 4:31 PM, Tapas Sarangi <ta...@gmail.com>wrote:

> Thanks. We have a 1-1 configuration of drives and folders on all the
> datanodes.
>
> -Tapas
>
> On Mar 24, 2013, at 3:29 PM, Jamal B <jm...@gmail.com> wrote:
>
> On both types of nodes, what is your dfs.data.dir set to? Does it specify
> multiple folders on the same sets of drives, or is it 1-1 between folder
> and drive?  If it's set to multiple folders on the same drives, it
> is probably multiplying the amount of "available capacity" incorrectly in
> that it assumes a 1-1 relationship between folder and total capacity of the
> drive.
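To check the question above on a datanode, one option is to group the `dfs.data.dir` entries by the filesystem device they live on; two entries sharing a device would have that device's capacity counted twice. This is a hypothetical helper, not a Hadoop tool: it relies only on `os.stat(...).st_dev`, takes the comma-separated `dfs.data.dir` value as its single argument, and reports any device holding more than one entry. It compares filesystems, so two separate partitions on one physical disk would not be caught.

```python
import os
import sys
from collections import defaultdict


def dirs_sharing_a_device(data_dir_value):
    """Group comma-separated dfs.data.dir entries by filesystem device ID.

    Returns {device_id: [paths]} only for devices holding more than one entry.
    """
    by_device = defaultdict(list)
    for path in data_dir_value.split(","):
        path = path.strip()
        if path:
            by_device[os.stat(path).st_dev].append(path)
    return dict((dev, paths) for dev, paths in by_device.items() if len(paths) > 1)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: %s '<comma-separated dfs.data.dir value>'" % sys.argv[0])
    else:
        shared = dirs_sharing_a_device(sys.argv[1])
        for dev, paths in shared.items():
            print("device %d holds: %s" % (dev, ", ".join(paths)))
        if not shared:
            print("each directory is on its own filesystem")
```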
>
>
> On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi <ta...@gmail.com>wrote:
>
>> Yes, thanks for pointing that out, but I already know that it completes
>> the balancing when it exits; otherwise it wouldn't exit.
>> Your answer doesn't solve the problem I mentioned earlier in my message:
>> HDFS is stalling, and Hadoop does not write anything until space is
>> cleared up on the cluster, even though "df" shows the cluster has about
>> 500 TB of free space.
>>
>> -------
>>
>>
>> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>> balaji@balajin.net> wrote:
>>
>>  -setBalancerBandwidth <bandwidth in bytes per second>
>>
>> So the value is in bytes per second. If it runs and then exits, it means
>> it has completed the balancing.
>>
>>
>> On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:
>>
>>> Yes, we are running the balancer, though a balancer process runs for
>>> almost a day or more before exiting and starting over.
>>> The current dfs.balance.bandwidthPerSec value is 2x10^9. I assume
>>> that's bytes, so about 2 GB/s. Shouldn't that be reasonable? If it
>>> is in bits, then we have a problem.
>>> What's the unit for "dfs.balance.bandwidthPerSec"?
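For scale: `dfs.balance.bandwidthPerSec` is in bytes per second (the `hdfs-default.xml` default is 1048576, i.e. 1 MiB/s) and caps what each datanode may use for balancing. The arithmetic below converts the 2x10^9 setting reported here into bits and gives a lower bound on the time for one datanode to move 1 TB at a given cap. It ignores every other limit, so real balancer runs will be slower. At 16 Gbit/s the configured cap is above what a single 10 GbE link carries, so the throttle itself is probably not what keeps the balancer running for a day.

```python
# hdfs-default.xml default for dfs.balance.bandwidthPerSec (1 MiB/s).
DEFAULT_BANDWIDTH = 1048576


def to_gbit_per_sec(bytes_per_sec):
    """Convert a bytes-per-second setting to gigabits per second."""
    return bytes_per_sec * 8 / 1e9


def hours_to_move(terabytes, bytes_per_sec):
    """Lower bound on hours for one datanode to move `terabytes` (decimal TB)
    at the given per-datanode cap, ignoring all other limits."""
    return terabytes * 1e12 / bytes_per_sec / 3600.0


if __name__ == "__main__":
    configured = 2e9  # value reported in this thread
    print("configured cap: %.1f Gbit/s" % to_gbit_per_sec(configured))
    print("1 TB at the configured cap: %.2f h" % hours_to_move(1, configured))
    print("1 TB at the default cap:    %.1f h" % hours_to_move(1, DEFAULT_BANDWIDTH))
```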
>>>
>>> -----
>>>
>>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>>> lists@balajin.net> wrote:
>>>
>>> Are you running the balancer? If it is running and it is slow, try
>>> increasing the balancer bandwidth.
>>>
>>>
>>> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
>>>
>>>> Thanks for the follow-up. I don't know whether attachments will pass
>>>> through this mailing list, but I am attaching a PDF that contains the usage
>>>> of all live nodes.
>>>>
>>>> All nodes whose names start with the letter "g" have smaller storage
>>>> space, whereas nodes whose names start with "s" have larger storage space. As
>>>> you will see, most of the "gXX" nodes are completely full, whereas the "sXX"
>>>> nodes have a lot of unused space.
>>>>
>>>> Recently we have frequently faced a crisis in which HDFS goes into a mode
>>>> where it is not able to write any further, even though the total space
>>>> available in the cluster is about 500 TB. We believe this has something to
>>>> do with the way it is balancing the nodes, but we don't understand the
>>>> problem yet. Maybe the attached PDF will help some of you (experts) see
>>>> what is going wrong here...
>>>>
>>>> Thanks
>>>> ------
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> The balancer knows about the topology, but when it calculates balancing
>>>> it operates only on nodes, not on racks.
>>>> You can see how it works in Balancer.java, in BalancerDatanode, around
>>>> line 509.
>>>>
>>>> I was wrong about 350 TB and 35 TB; it calculates it this way:
>>>>
>>>> For example:
>>>> cluster_capacity = 3.5 PB
>>>> cluster_dfsused  = 2 PB
>>>>
>>>> avgutil = cluster_dfsused / cluster_capacity * 100 = 57.14% of cluster
>>>> capacity used
>>>> Each node's utilization is node_dfsused / node_capacity * 100. With the
>>>> default threshold of 10, the balancer considers a node fine if
>>>> avgutil - 10 <= node_utilization < avgutil + 10.
>>>>
>>>> In the ideal case every node uses avgutil of its own capacity: for a
>>>> 12 TB node that is only about 6.9 TB, and for a 72 TB node about 41 TB.
>>>>
>>>> The balancer can't help you.
>>>>
>>>> Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can.
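The utilization check described above can be written out as a short sketch. It follows the inequality as stated in this message (threshold 10, the balancer's default) and is not copied from Balancer.java, so boundary handling in the real code may differ. The node names and sizes in the usage example are made up to resemble the "gXX"/"sXX" split in this thread.

```python
def classify_nodes(nodes, threshold=10.0):
    """Label each node relative to the cluster-wide average utilization.

    nodes maps a node name to (dfs_used, capacity), both in the same unit.
    Returns (avg_util_percent, {name: label}).
    """
    total_used = float(sum(used for used, _ in nodes.values()))
    total_capacity = float(sum(cap for _, cap in nodes.values()))
    avg_util = total_used / total_capacity * 100

    labels = {}
    for name, (used, cap) in nodes.items():
        util = float(used) / cap * 100
        if util >= avg_util + threshold:
            labels[name] = "over-utilized"
        elif util < avg_util - threshold:
            labels[name] = "under-utilized"
        else:
            labels[name] = "within threshold"
    return avg_util, labels


def ideal_used(capacity, avg_util):
    """DFS usage a node would have if it sat exactly at the average."""
    return capacity * avg_util / 100.0


if __name__ == "__main__":
    # Hypothetical sizes in TB: two small "g" nodes, two large "s" nodes.
    cluster = {
        "g01": (11.9, 12.0),
        "g02": (11.9, 12.0),
        "s01": (30.0, 72.0),
        "s02": (36.0, 72.0),
    }
    avg, labels = classify_nodes(cluster)
    print("average utilization: %.2f%%" % avg)
    for name in sorted(labels):
        print("%s: %s" % (name, labels[name]))
    # The example from this message: 2 PB used out of 3.5 PB.
    print("12 TB node at the average: %.2f TB" % ideal_used(12, 2 / 3.5 * 100))
```

With these made-up numbers both "g" nodes are labeled over-utilized, s01 under-utilized and s02 within threshold.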
>>>>
>>>>
>>>>
>>>>>
>>>>>
>>>>> In the ideal case, with replication factor 2 and two nodes of 12 TB and
>>>>> 72 TB, you will be able to store only 12 TB of replicated data.
>>>>>
>>>>>
>>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB and
>>>>> 72 TB, but not true for more than two nodes in the cluster.
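Tapas's point can be made concrete with an idealized capacity model. Assume every block needs r replicas on r distinct nodes, and ignore rack rules, reserved space and block granularity. A node can hold at most one replica of each block, so with D bytes of data a node of capacity c contributes at most min(c, D), and D fits only if r*D <= sum of min(c_i, D). The hypothetical helper below finds the largest such D by bisection; treat it as an upper bound for reasoning, not a description of how HDFS actually places blocks.

```python
def max_replicated_data(capacities, replication, iterations=100):
    """Largest data size D with replication * D <= sum(min(c, D)).

    Capacities and the result share one unit (e.g. TB). This is an
    idealized bound: no racks, no reserved space, arbitrarily small blocks.
    """
    def fits(d):
        return replication * d <= sum(min(c, d) for c in capacities)

    lo = 0.0
    hi = float(sum(capacities)) / replication  # raw space / r is a hard ceiling
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if fits(mid):
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    for caps in ([12, 72], [12, 12, 72], [12, 72, 72]):
        print(caps, "->", round(max_replicated_data(caps, 2), 3), "TB")
```

Under these assumptions the two-node case is capped at 12 TB, while adding a second 72 TB node lets all 156 TB of raw space hold 78 TB of data at replication 2.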
>>>>>
>>>>>
>>>>> The best way, in my opinion, is to use multiple racks. Nodes in a rack
>>>>> should have identical capacity, and the racks should have identical capacity.
>>>>> For example:
>>>>>
>>>>> rack1: 1 node with 72 TB
>>>>> rack2: 6 nodes with 12 TB
>>>>> rack3: 3 nodes with 24 TB
>>>>>
>>>>> It helps with balancing, because a block's duplicate must be on another rack.
>>>>>
>>>>>
>>>>> The same question I asked earlier in this message: do multiple racks,
>>>>> with the default threshold for the balancer, minimize the difference
>>>>> between racks?
>>>>>
>>>>> Why did you select HDFS? Maybe Lustre, CephFS or others would be a
>>>>> better choice.
>>>>>
>>>>>
>>>>> It wasn't my decision, and I probably can't change it now. I am new to
>>>>> this cluster and trying to understand a few issues. I will explore other
>>>>> options as you mentioned.
>>>>>
>>>>> --
>>>>> http://balajin.net/blog
>>>>> http://flic.kr/balajijegan
>>>>>
>>>>
>>>
>>
>>
>> --
>> http://balajin.net/blog
>> http://flic.kr/balajijegan
>>
>>
>>
>
>

Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Tapas Sarangi <ta...@gmail.com>.
Thanks. We have a 1-1 configuration of drives and folders on all the datanodes.

-Tapas

On Mar 24, 2013, at 3:29 PM, Jamal B <jm...@gmail.com> wrote:

> On both types of nodes, what is your dfs.data.dir set to? Does it specify multiple folders on the same sets of drives, or is it 1-1 between folder and drive? If it's set to multiple folders on the same drives, it is probably multiplying the amount of "available capacity" incorrectly, in that it assumes a 1-1 relationship between a folder and the total capacity of the drive.
> 
> 
> On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi <ta...@gmail.com> wrote:
> Yes, thanks for pointing that out, but I already know that it completes the balancing when it exits; otherwise it wouldn't exit.
> Your answer doesn't solve the problem I mentioned earlier in my message: HDFS is stalling, and Hadoop does not write anything until space is cleared up on the cluster, even though "df" shows the cluster has about 500 TB of free space.
> 
> -------
>  
> 
> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <ba...@balajin.net> wrote:
> 
>>  -setBalancerBandwidth <bandwidth in bytes per second>
>> 
>> So the value is in bytes per second. If it runs and then exits, it means it has completed the balancing.
>> 
>> 
>> On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:
>> Yes, we are running the balancer, though a balancer process runs for almost a day or more before exiting and starting over.
>> The current dfs.balance.bandwidthPerSec value is 2x10^9. I assume that's bytes, so about 2 GB/s. Shouldn't that be reasonable? If it is in bits, then we have a problem.
>> What's the unit for "dfs.balance.bandwidthPerSec"?
>> 
>> -----
>> 
>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <li...@balajin.net> wrote:
>> 
>>> Are you running the balancer? If it is running and it is slow, try increasing the balancer bandwidth.
>>> 
>>> 
>>> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
>>> Thanks for the follow-up. I don't know whether attachments will pass through this mailing list, but I am attaching a PDF that contains the usage of all live nodes.
>>> 
>>> All nodes whose names start with the letter "g" have smaller storage space, whereas nodes whose names start with "s" have larger storage space. As you will see, most of the "gXX" nodes are completely full, whereas the "sXX" nodes have a lot of unused space.
>>> 
>>> Recently we have frequently faced a crisis in which HDFS goes into a mode where it is not able to write any further, even though the total space available in the cluster is about 500 TB. We believe this has something to do with the way it is balancing the nodes, but we don't understand the problem yet. Maybe the attached PDF will help some of you (experts) see what is going wrong here...
>>> 
>>> Thanks
>>> ------
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>>> 
>>>> The balancer knows about the topology, but when it calculates balancing it operates only on nodes, not on racks.
>>>> You can see how it works in Balancer.java, in BalancerDatanode, around line 509.
>>>> 
>>>> I was wrong about 350 TB and 35 TB; it calculates it this way:
>>>> 
>>>> For example:
>>>> cluster_capacity = 3.5 PB
>>>> cluster_dfsused  = 2 PB
>>>> 
>>>> avgutil = cluster_dfsused / cluster_capacity * 100 = 57.14% of cluster capacity used
>>>> Each node's utilization is node_dfsused / node_capacity * 100. With the default threshold of 10, the balancer considers a node fine if avgutil - 10 <= node_utilization < avgutil + 10.
>>>> 
>>>> In the ideal case every node uses avgutil of its own capacity: for a 12 TB node that is only about 6.9 TB, and for a 72 TB node about 41 TB.
>>>> 
>>>> The balancer can't help you.
>>>> 
>>>> Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can.
>>>> 
>>>>  
>>>> 
>>>> 
>>>>> In the ideal case, with replication factor 2 and two nodes of 12 TB and 72 TB, you will be able to store only 12 TB of replicated data.
>>>> 
>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB and 72 TB, but not true for more than two nodes in the cluster.
>>>> 
>>>>> 
>>>>> The best way, in my opinion, is to use multiple racks. Nodes in a rack should have identical capacity, and the racks should have identical capacity.
>>>>> For example:
>>>>> 
>>>>> rack1: 1 node with 72 TB
>>>>> rack2: 6 nodes with 12 TB
>>>>> rack3: 3 nodes with 24 TB
>>>>> 
>>>>> It helps with balancing, because a block's duplicate must be on another rack.
>>>>> 
>>>> 
>>>> The same question I asked earlier in this message: do multiple racks, with the default threshold for the balancer, minimize the difference between racks?
>>>> 
>>>>> Why did you select HDFS? Maybe Lustre, CephFS or others would be a better choice.
>>>> 
>>>> It wasn't my decision, and I probably can't change it now. I am new to this cluster and trying to understand a few issues. I will explore other options as you mentioned.
>>>> 
>>>> -- 
>>>> http://balajin.net/blog
>>>> http://flic.kr/balajijegan
>> 
>> 
>> 
>> 
>> -- 
>> http://balajin.net/blog
>> http://flic.kr/balajijegan
> 
> 


Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Tapas Sarangi <ta...@gmail.com>.
Thanks. We have a 1-1 configuration of drives and folder in all the datanodes.

-Tapas

On Mar 24, 2013, at 3:29 PM, Jamal B <jm...@gmail.com> wrote:

> On both types of nodes, what is your dfs.data.dir set to? Does it specify multiple folders on the same set's of drives or is it 1-1 between folder and drive?  If it's set to multiple folders on the same drives, it is probably multiplying the amount of "available capacity" incorrectly in that it assumes a 1-1 relationship between folder and total capacity of the drive.
> 
> 
> On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi <ta...@gmail.com> wrote:
> Yes, thanks for pointing, but I already know that it is completing the balancing when exiting otherwise it shouldn't exit. 
> Your answer doesn't solve the problem I mentioned earlier in my message. 'hdfs' is stalling and hadoop is not writing unless space is cleared up from the cluster even though "df" shows the cluster has about 500 TB of free space. 
> 
> -------
>  
> 
> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <ba...@balajin.net> wrote:
> 
>>  -setBalancerBandwidth <bandwidth in bytes per second>
>> 
>> So the value is bytes per second. If it is running and exiting,it means it has completed the balancing. 
>> 
>> 
>> On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:
>> Yes, we are running balancer, though a balancer process runs for almost a day or more before exiting and starting over.
>> Current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume that's bytes so about 2 GigaByte/sec. Shouldn't that be reasonable ? If it is in Bits then we have a problem.
>> What's the unit for "dfs.balance.bandwidthPerSec" ?
>> 
>> -----
>> 
>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <li...@balajin.net> wrote:
>> 
>>> Are you running balancer? If balancer is running and if it is slow, try increasing the balancer bandwidth
>>> 
>>> 
>>> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
>>> Thanks for the follow-up. I don't know whether attachments will pass through this mailing list, but I am attaching a PDF that contains the usage of all live nodes.
>>> 
>>> All nodes whose names start with "g" have smaller storage space, whereas nodes starting with "s" have larger storage space. As you will see, most of the "gXX" nodes are completely full, whereas the "sXX" nodes have a lot of unused space.
>>> 
>>> Recently we have frequently faced a crisis in which 'hdfs' goes into a mode where it cannot write any further, even though the total space available in the cluster is about 500 TB. We believe this has something to do with the way the nodes are balanced, but we don't understand the problem yet. Maybe the attached PDF will help some of you (experts) see what is going wrong here...
>>> 
>>> Thanks
>>> ------
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>>> 
>>>> The balancer knows about the topology, but when it calculates balancing it operates only on nodes, not on racks.
>>>> You can see how it works in Balancer.java, in BalancerDatanode, around line 509.
>>>> 
>>>> I was wrong about 350 TB / 35 TB; it calculates this way:
>>>> 
>>>> For example:
>>>> cluster_capacity=3.5PB
>>>> cluster_dfsused=2PB
>>>> 
>>>> avgutil=cluster_dfsused/cluster_capacity*100=57.14% of cluster capacity used
>>>> Then we know each node's utilization (node_dfsused/node_capacity*100). The balancer considers everything fine if avgutil+10 > node_utilization >= avgutil-10 (10 being the balancer threshold).
>>>> 
>>>> Ideally every node uses avgutil of its capacity, but for a 12 TB node that is only about 6.9 TB, and for a 72 TB node about 41 TB.
>>>> 
>>>> The balancer can't help you.
>>>> 
>>>> Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can.
>>>> 
>>>>  
>>>> 
>>>> 
>>>>> In the ideal case, with replication factor 2 and two nodes of 12 TB and 72 TB, you will be able to store only 12 TB of replicated data.
>>>> 
>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB and 72 TB, but not true for more than two nodes in the cluster.
>>>> 
>>>>> 
>>>>> The best way, in my opinion, is to use multiple racks. Nodes within a rack should have identical capacity, and racks should have identical total capacity.
>>>>> For example:
>>>>> 
>>>>> rack1: 1 node with 72 TB
>>>>> rack2: 6 nodes with 12 TB
>>>>> rack3: 3 nodes with 24 TB
>>>>> 
>>>>> It helps with balancing, because a block's replica must go to another rack.
>>>>> 
>>>> 
>>>> The same question I asked earlier in this message: do multiple racks, with the default balancer threshold, minimize the difference between racks?
>>>> 
>>>>> Why did you select HDFS? Maybe Lustre, CephFS or others would be a better choice.
>>>> 
>>>> It wasn't my decision, and I probably can't change it now. I am new to this cluster and trying to understand a few issues. I will explore the other options you mentioned.
>>>> 
>>>> -- 
>>>> http://balajin.net/blog
>>>> http://flic.kr/balajijegan
>> 
>> 
>> 
>> 
>> -- 
>> http://balajin.net/blog
>> http://flic.kr/balajijegan
> 
> 

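Alexey's balancing criterion, quoted above, can be reproduced with a few lines of arithmetic. This is a minimal Python sketch, not the Balancer.java code. It assumes decimal units (1 PB = 1000 TB) and treats a node as balanced when avgutil - threshold <= node_utilization <= avgutil + threshold. The real boundary handling may differ slightly, and the function names are hypothetical.

```python
# Sketch of the per-node balancer check as described in the thread.
# Assumptions: decimal units (1 PB = 1000 TB) and a closed band
# [avgutil - threshold, avgutil + threshold]; Balancer.java may treat
# the boundaries slightly differently.

TB = 1.0
PB = 1000.0 * TB


def cluster_avg_util(cluster_dfsused, cluster_capacity):
    """Cluster-wide DFS utilization, in percent."""
    return cluster_dfsused / cluster_capacity * 100.0


def node_util(node_dfsused, node_capacity):
    """Utilization of a single datanode, in percent."""
    return node_dfsused / node_capacity * 100.0


def within_threshold(node_dfsused, node_capacity, avgutil, threshold=10.0):
    """True if the node sits inside the band the balancer tries to reach."""
    util = node_util(node_dfsused, node_capacity)
    return avgutil - threshold <= util <= avgutil + threshold


def ideal_dfsused(node_capacity, avgutil):
    """DFS usage a node would have if it matched the cluster average."""
    return node_capacity * avgutil / 100.0


if __name__ == "__main__":
    avg = cluster_avg_util(2 * PB, 3.5 * PB)
    print(f"avgutil = {avg:.2f}%")  # avgutil = 57.14%
    for cap in (12 * TB, 72 * TB):
        print(f"{cap:.0f} TB node: ideal dfsused = {ideal_dfsused(cap, avg):.1f} TB")
    # A 12 TB node at ~99.9% is well outside the band...
    print(within_threshold(11.99 * TB, 12 * TB, avg))  # False
    # ...while a 72 TB node at 50% is inside it.
    print(within_threshold(36 * TB, 72 * TB, avg))  # True
```

With the thread's numbers this gives roughly 6.9 TB and 41.1 TB as the "ideal" usage for the 12 TB and 72 TB nodes.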

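Tapas's point, that the two-node limit does not carry over to larger clusters, can be checked with a small capacity calculation. The sketch below is an idealized model, not how HDFS actually places blocks. It assumes blocks are small enough to treat data as continuous, and it ignores rack placement and reserved space. It relies on one fact: a node can hold at most one replica of any block. So storing D units with replication r is possible exactly when the sum over nodes of min(capacity, D) is at least r*D. `max_replicated_data` is a hypothetical helper.

```python
def max_replicated_data(capacities, replication=2, iterations=100):
    """Largest amount of data D that fits with `replication` copies per block.

    Idealized model: data is divisible into arbitrarily small blocks, each
    replica of a block must sit on a different node, and racks / reserved
    space are ignored. D is feasible iff
        sum(min(c, D) for c in capacities) >= replication * D,
    because one node can hold at most one copy of each block.
    """
    if replication < 1 or replication > len(capacities):
        return 0.0

    def feasible(d):
        return sum(min(c, d) for c in capacities) >= replication * d

    # The feasible set is an interval [0, D_max], so bisection finds D_max.
    lo, hi = 0.0, sum(capacities) / replication
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if feasible(mid):
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    print(round(max_replicated_data([12, 72]), 3))         # 12.0
    print(round(max_replicated_data([12, 12, 72]), 3))     # 24.0
    print(round(max_replicated_data([12] * 6 + [72]), 3))  # 72.0
```

The first case reproduces Alexey's 12 TB figure. The third case reuses the capacities from his rack1 and rack2 example and, in this model, holds 72 TB, half of the 144 TB raw total.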
Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Jamal B <jm...@gmail.com>.
On both types of nodes, what is your dfs.data.dir set to? Does it specify
multiple folders on the same set of drives, or is it 1-1 between folder
and drive? If it's set to multiple folders on the same drives, it
is probably multiplying the "available capacity" incorrectly, because
it assumes a 1-1 relationship between folder and total capacity of the
drive.

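One way to test Jamal's hypothesis on a datanode is to group the dfs.data.dir entries by the device they live on. The Python sketch below is a local heuristic, not something Hadoop provides. It assumes you pass in the comma-separated dfs.data.dir value by hand. It compares the capacity you get by counting every directory with the capacity counted once per device. The paths in the usage comment are placeholders.

```python
import os
import shutil
import sys
from collections import defaultdict


def check_data_dirs(data_dir_value):
    """Group dfs.data.dir entries by filesystem device.

    Returns (naive_total, per_device_total, shared), where naive_total sums
    the capacity of every listed directory, per_device_total counts each
    device once, and shared lists directories that sit on the same device.
    """
    dirs = [d.strip() for d in data_dir_value.split(",") if d.strip()]
    by_device = defaultdict(list)
    for d in dirs:
        by_device[os.stat(d).st_dev].append(d)

    naive_total = sum(shutil.disk_usage(d).total for d in dirs)
    per_device_total = sum(
        shutil.disk_usage(paths[0]).total for paths in by_device.values()
    )
    shared = [paths for paths in by_device.values() if len(paths) > 1]
    return naive_total, per_device_total, shared


if __name__ == "__main__":
    # Usage (placeholder paths): python check_dirs.py "/data/1/dn,/data/2/dn"
    if len(sys.argv) > 1:
        naive, real, shared = check_data_dirs(sys.argv[1])
        print(f"capacity counted per directory: {naive / 1e12:.1f} TB")
        print(f"capacity counted per device:    {real / 1e12:.1f} TB")
        for group in shared:
            print("same device:", ", ".join(group))
```

If the two totals differ on the small nodes, some directories share a drive, and any per-directory capacity figure would be inflated.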

On Sun, Mar 24, 2013 at 3:01 PM, Tapas Sarangi <ta...@gmail.com> wrote:

> Yes, thanks for pointing that out, but I already know that it completes the
> balancing when it exits; otherwise it wouldn't exit.
> Your answer doesn't solve the problem I mentioned earlier in my message.
> 'hdfs' is stalling and Hadoop is not writing unless space is cleared up
> from the cluster, even though "df" shows the cluster has about 500 TB of
> free space.
>
> -------
>
>
> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
> balaji@balajin.net> wrote:
>
>  -setBalancerBandwidth <bandwidth in bytes per second>
>
> So the value is in bytes per second. If it is running and exiting, it means
> it has completed the balancing.
>
>
> On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:
>
>> Yes, we are running the balancer, though each balancer process runs for a
>> day or more before exiting and starting over.
>> The current dfs.balance.bandwidthPerSec value is 2x10^9. I assume
>> that's bytes, so about 2 GB/s. Shouldn't that be reasonable? If it
>> is in bits, then we have a problem.
>> What's the unit for "dfs.balance.bandwidthPerSec"?
>>
>> -----
>>
>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>> lists@balajin.net> wrote:
>>
>> Are you running the balancer? If the balancer is running and it is slow, try
>> increasing the balancer bandwidth.
>>
>>
>> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
>>
>>> Thanks for the follow-up. I don't know whether attachments will pass
>>> through this mailing list, but I am attaching a PDF that contains the usage
>>> of all live nodes.
>>>
>>> All nodes whose names start with "g" have smaller storage space, whereas
>>> nodes starting with "s" have larger storage space. As you will see, most of
>>> the "gXX" nodes are completely full, whereas the "sXX" nodes have a lot of
>>> unused space.
>>>
>>> Recently we have frequently faced a crisis in which 'hdfs' goes into a mode
>>> where it cannot write any further, even though the total space available
>>> in the cluster is about 500 TB. We believe this has something to do with
>>> the way the nodes are balanced, but we don't understand the problem yet.
>>> Maybe the attached PDF will help some of you (experts) see what is going
>>> wrong here...
>>>
>>> Thanks
>>> ------
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> The balancer knows about the topology, but when it calculates balancing it
>>> operates only on nodes, not on racks.
>>> You can see how it works in Balancer.java, in BalancerDatanode, around
>>> line 509.
>>>
>>> I was wrong about 350 TB / 35 TB; it calculates this way:
>>>
>>> For example:
>>> cluster_capacity=3.5PB
>>> cluster_dfsused=2PB
>>>
>>> avgutil=cluster_dfsused/cluster_capacity*100=57.14% of cluster capacity used
>>> Then we know each node's utilization (node_dfsused/node_capacity*100).
>>> The balancer considers everything fine if
>>> avgutil+10 > node_utilization >= avgutil-10 (10 being the balancer threshold).
>>>
>>> Ideally every node uses avgutil of its capacity, but for a 12 TB node that
>>> is only about 6.9 TB, and for a 72 TB node about 41 TB.
>>>
>>> The balancer can't help you.
>>>
>>> Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can.
>>>
>>>
>>>
>>>>
>>>>
>>>> In the ideal case, with replication factor 2 and two nodes of 12 TB and
>>>> 72 TB, you will be able to store only 12 TB of replicated data.
>>>>
>>>>
>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB and
>>>> 72 TB, but not true for more than two nodes in the cluster.
>>>>
>>>>
>>>> The best way, in my opinion, is to use multiple racks. Nodes within a rack
>>>> should have identical capacity, and racks should have identical total capacity.
>>>> For example:
>>>>
>>>> rack1: 1 node with 72 TB
>>>> rack2: 6 nodes with 12 TB
>>>> rack3: 3 nodes with 24 TB
>>>>
>>>> It helps with balancing, because a block's replica must go to another rack.
>>>>
>>>>
>>>> The same question I asked earlier in this message: do multiple racks,
>>>> with the default balancer threshold, minimize the difference between
>>>> racks?
>>>>
>>>> Why did you select HDFS? Maybe Lustre, CephFS or others would be a better
>>>> choice.
>>>>
>>>>
>>>> It wasn't my decision, and I probably can't change it now. I am new to
>>>> this cluster and trying to understand a few issues. I will explore the
>>>> other options you mentioned.
>>>>
>>>> --
>>>> http://balajin.net/blog
>>>> http://flic.kr/balajijegan
>>>>
>>>
>>
>
>
> --
> http://balajin.net/blog
> http://flic.kr/balajijegan
>
>
>

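The unit question can be settled with a quick conversion. Going by the -setBalancerBandwidth usage Balaji quotes, the value is in bytes per second. dfs.balance.bandwidthPerSec is documented as a per-datanode cap on balancing bandwidth. The sketch below is plain arithmetic under decimal units, and the helper names are hypothetical.

```python
def bandwidth_units(bytes_per_sec):
    """Express a bytes-per-second figure in a few common units."""
    return {
        "GB/s": bytes_per_sec / 1e9,        # decimal gigabytes
        "GiB/s": bytes_per_sec / 2**30,     # binary gibibytes
        "Gbit/s": bytes_per_sec * 8 / 1e9,  # network line-rate units
    }


def min_seconds_to_move(terabytes, bytes_per_sec):
    """Lower bound on the time to move `terabytes` (decimal) at the cap."""
    return terabytes * 1e12 / bytes_per_sec


if __name__ == "__main__":
    cap = 2 * 10**9  # the dfs.balance.bandwidthPerSec value from the thread
    for unit, value in bandwidth_units(cap).items():
        print(f"{value:.2f} {unit}")
    print(min_seconds_to_move(1, cap))  # 500.0
```

2x10^9 B/s works out to 16 Gbit/s. That is more than a single 10 Gbit/s link can carry, so, assuming links of that speed or slower, this setting alone is unlikely to be what keeps each balancer run going for a day.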

Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Alexey Babutin <zo...@gmail.com>.
You said that threshold=10. Manually run "hadoop balancer -threshold 9.5",
then 9, and so on in steps of 0.5.

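Alexey's stepwise approach can be scripted. This Python sketch only builds the commands and, if asked, runs them one after another. The lower bound of 5.0 is an arbitrary placeholder, since he gives no stopping point. The sketch assumes your Hadoop release accepts fractional -threshold values; if yours rejects a value such as 9.5, use whole-number steps. Balancer exit codes may also differ between versions, so the sketch reports them instead of treating non-zero as failure.

```python
import subprocess


def threshold_schedule(start=9.5, stop=5.0, step=0.5):
    """Thresholds start, start - step, ... down to stop (inclusive).

    Uses an integer step count to avoid floating-point drift.
    """
    n = int(round((start - stop) / step))
    return [round(start - i * step, 3) for i in range(n + 1)]


def balancer_commands(thresholds):
    """One `hadoop balancer -threshold T` command line per threshold."""
    return [["hadoop", "balancer", "-threshold", f"{t:g}"] for t in thresholds]


def run_steps(thresholds, dry_run=True):
    """Run the balancer once per threshold, waiting for each run to exit."""
    for cmd in balancer_commands(thresholds):
        print(" ".join(cmd))
        if dry_run:
            continue
        result = subprocess.run(cmd)
        # Exit-code meanings may vary across Hadoop versions; just report it.
        print("exit code:", result.returncode)


if __name__ == "__main__":
    run_steps(threshold_schedule())  # dry run: prints the commands only
```

Call `run_steps(threshold_schedule(), dry_run=False)` only after checking the printed commands against your installation.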
On Sun, Mar 24, 2013 at 11:01 PM, Tapas Sarangi <ta...@gmail.com> wrote:

> Yes, thanks for pointing that out, but I already know that it completes the
> balancing when it exits; otherwise it wouldn't exit.
> Your answer doesn't solve the problem I mentioned earlier in my message.
> 'hdfs' is stalling and Hadoop is not writing unless space is cleared up
> from the cluster, even though "df" shows the cluster has about 500 TB of
> free space.
>
> -------
>
>
> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
> balaji@balajin.net> wrote:
>
>  -setBalancerBandwidth <bandwidth in bytes per second>
>
> So the value is in bytes per second. If it is running and exiting, it means
> it has completed the balancing.
>
>
> On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:
>
>> Yes, we are running the balancer, though each balancer process runs for a
>> day or more before exiting and starting over.
>> The current dfs.balance.bandwidthPerSec value is 2x10^9. I assume
>> that's bytes, so about 2 GB/s. Shouldn't that be reasonable? If it
>> is in bits, then we have a problem.
>> What's the unit for "dfs.balance.bandwidthPerSec"?
>>
>> -----
>>
>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>> lists@balajin.net> wrote:
>>
>> Are you running the balancer? If the balancer is running and it is slow, try
>> increasing the balancer bandwidth.
>>
>>
>> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
>>
>>> Thanks for the follow-up. I don't know whether attachments will pass
>>> through this mailing list, but I am attaching a PDF that contains the usage
>>> of all live nodes.
>>>
>>> All nodes whose names start with "g" have smaller storage space, whereas
>>> nodes starting with "s" have larger storage space. As you will see, most of
>>> the "gXX" nodes are completely full, whereas the "sXX" nodes have a lot of
>>> unused space.
>>>
>>> Recently we have frequently faced a crisis in which 'hdfs' goes into a mode
>>> where it cannot write any further, even though the total space available
>>> in the cluster is about 500 TB. We believe this has something to do with
>>> the way the nodes are balanced, but we don't understand the problem yet.
>>> Maybe the attached PDF will help some of you (experts) see what is going
>>> wrong here...
>>>
>>> Thanks
>>> ------
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> The balancer knows about the topology, but when it calculates balancing it
>>> operates only on nodes, not on racks.
>>> You can see how it works in Balancer.java, in BalancerDatanode, around
>>> line 509.
>>>
>>> I was wrong about 350 TB / 35 TB; it calculates this way:
>>>
>>> For example:
>>> cluster_capacity=3.5PB
>>> cluster_dfsused=2PB
>>>
>>> avgutil=cluster_dfsused/cluster_capacity*100=57.14% of cluster capacity used
>>> Then we know each node's utilization (node_dfsused/node_capacity*100).
>>> The balancer considers everything fine if
>>> avgutil+10 > node_utilization >= avgutil-10 (10 being the balancer threshold).
>>>
>>> Ideally every node uses avgutil of its capacity, but for a 12 TB node that
>>> is only about 6.9 TB, and for a 72 TB node about 41 TB.
>>>
>>> The balancer can't help you.
>>>
>>> Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can.
>>>
>>>
>>>
>>>>
>>>>
>>>> In the ideal case, with replication factor 2 and two nodes of 12 TB and
>>>> 72 TB, you will be able to store only 12 TB of replicated data.
>>>>
>>>>
>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB and
>>>> 72 TB, but not true for more than two nodes in the cluster.
>>>>
>>>>
>>>> The best way, in my opinion, is to use multiple racks. Nodes within a rack
>>>> should have identical capacity, and racks should have identical total capacity.
>>>> For example:
>>>>
>>>> rack1: 1 node with 72 TB
>>>> rack2: 6 nodes with 12 TB
>>>> rack3: 3 nodes with 24 TB
>>>>
>>>> It helps with balancing, because a block's replica must go to another rack.
>>>>
>>>>
>>>> The same question I asked earlier in this message: do multiple racks,
>>>> with the default balancer threshold, minimize the difference between
>>>> racks?
>>>>
>>>> Why did you select HDFS? Maybe Lustre, CephFS or others would be a better
>>>> choice.
>>>>
>>>>
>>>> It wasn't my decision, and I probably can't change it now. I am new to
>>>> this cluster and trying to understand a few issues. I will explore the
>>>> other options you mentioned.
>>>>
>>>> --
>>>> http://balajin.net/blog
>>>> http://flic.kr/balajijegan
>>>>
>>>
>>
>
>
> --
> http://balajin.net/blog
> http://flic.kr/balajijegan
>
>
>
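The balancer rule described in the quoted reply can be sketched as follows. This is a simplified, hypothetical Python model, not the code in Balancer.java (which sorts nodes into more categories): it compares each node's utilization with avgutil plus or minus the threshold. The node names and TB figures are made up for illustration.

```python
# Simplified model of the balancer check described in the thread:
# avg = cluster_dfsused / cluster_capacity * 100, and a node counts as
# balanced when avg - threshold <= node_util <= avg + threshold.
# Node names and figures are hypothetical; units are TB.

def avg_utilization(nodes):
    """Cluster-wide utilization in percent; nodes maps name -> (capacity, used)."""
    total_cap = sum(cap for cap, _ in nodes.values())
    total_used = sum(used for _, used in nodes.values())
    return total_used / total_cap * 100


def ideal_used(capacity, avg):
    """Data a node would hold if it sat exactly at the cluster average."""
    return capacity * avg / 100


def classify(nodes, threshold=10.0):
    """Label each node relative to avg +/- threshold."""
    avg = avg_utilization(nodes)
    labels = {}
    for name, (cap, used) in nodes.items():
        util = used / cap * 100
        if util > avg + threshold:
            labels[name] = "over-utilized"
        elif util < avg - threshold:
            labels[name] = "under-utilized"
        else:
            labels[name] = "within threshold"
    return labels


if __name__ == "__main__":
    avg = 2.0 / 3.5 * 100  # the 3.5 PB / 2 PB example: about 57.14 %
    print(round(ideal_used(12, avg), 2), round(ideal_used(72, avg), 2))  # 6.86 41.14
    print(classify({"g01": (12, 11.9), "s01": (72, 25.0)}))
```

With these made-up figures the nearly full 12 TB node is flagged as over-utilized, while the 72 TB node, at about 35% used, still falls inside the band around the roughly 43.9% cluster average.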

Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Alexey Babutin <zo...@gmail.com>.
You said that threshold=10. Run the command manually:
hadoop balancer -threshold 9.5, then 9, and so on in 0.5 steps.
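A minimal sketch of the step-down procedure suggested above, assuming the 0.20-era `hadoop balancer -threshold <percent>` syntax. The helper names are hypothetical, and the stopping point of 5.0 is an arbitrary assumption (the suggestion does not say where to stop). It only builds and prints the command lines; it does not run them.

```python
# Hypothetical helpers: build the descending threshold schedule suggested in
# the thread (start just below 10, step down by 0.5) and the matching
# balancer command lines. Nothing is executed here.

def threshold_schedule(start=9.5, stop=5.0, step=0.5):
    """Return thresholds start, start - step, ... down to stop (inclusive)."""
    values = []
    t = start
    while t >= stop - 1e-9:  # small tolerance for float rounding
        values.append(round(t, 2))
        t -= step
    return values


def balancer_commands(thresholds):
    """One argument list per balancer run."""
    return [["hadoop", "balancer", "-threshold", str(t)] for t in thresholds]


if __name__ == "__main__":
    for cmd in balancer_commands(threshold_schedule()):
        print(" ".join(cmd))
```

Each argument list could be handed to Python's `subprocess.run(cmd, check=True)` to run one round, starting the next round only after the previous balancer run has exited.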

On Sun, Mar 24, 2013 at 11:01 PM, Tapas Sarangi <ta...@gmail.com>wrote:

> Yes, thanks for pointing that out, but I already know that it completes
> the balancing when it exits; otherwise it shouldn't exit.
> Your answer doesn't solve the problem I mentioned earlier in my message.
> 'hdfs' is stalling and hadoop is not writing unless space is cleared up
> from the cluster even though "df" shows the cluster has about 500 TB of
> free space.
>
> -------
>
>
> On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
> balaji@balajin.net> wrote:
>
>  -setBalancerBandwidth <bandwidth in bytes per second>
>
> So the value is bytes per second. If it is running and exiting, it means it
> has completed the balancing.
>
>
> On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:
>
>> Yes, we are running balancer, though a balancer process runs for almost a
>> day or more before exiting and starting over.
>> The current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume
>> that's bytes, so about 2 GB/sec. Shouldn't that be reasonable? If it is
>> in bits, then we have a problem.
>> What's the unit for "dfs.balance.bandwidthPerSec" ?
>>
>> -----
>>
>> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
>> lists@balajin.net> wrote:
>>
>> Are you running balancer? If balancer is running and if it is slow, try
>> increasing the balancer bandwidth
>>
>>
>> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
>>
>>> Thanks for the follow up. I don't know whether attachment will pass
>>> through this mailing list, but I am attaching a pdf that contains the usage
>>> of all live nodes.
>>>
>>> All nodes starting with letter "g" are the ones with smaller storage
>>> space, whereas nodes starting with letter "s" have larger storage space. As
>>> you will see, most of the "gXX" nodes are completely full whereas "sXX"
>>> nodes have a lot of unused space.
>>>
>>> Recently, we have frequently faced a crisis in which 'hdfs' goes into a mode
>>> where it is not able to write any further even though the total space
>>> available in the cluster is about 500 TB. We believe this has something to
>>> do with the way it is balancing the nodes, but don't understand the problem
>>> yet. Maybe the attached PDF will help some of you (experts) to see what is
>>> going wrong here...
>>>
>>> Thanks
>>> ------
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> The balancer knows about topology, but when it calculates balancing it
>>> operates only on nodes, not on racks.
>>> You can see how it works in Balancer.java, in BalancerDatanode, around
>>> line 509.
>>>
>>> I was wrong about 350 TB and 35 TB; it calculates it this way:
>>>
>>> For example:
>>> cluster_capacity=3.5PB
>>> cluster_dfsused=2PB
>>>
>>> avgutil=cluster_dfsused/cluster_capacity*100=57.14% of cluster capacity used
>>> Then we know each node's utilization (node_dfsused/node_capacity*100).
>>> The balancer considers everything fine if
>>> avgutil+10 > node_utilization >= avgutil-10 (10 being the threshold).
>>>
>>> Ideally every node uses avgutil of its capacity, which for a 12 TB node
>>> is only about 6.9 TB and for a 72 TB node about 41 TB.
>>>
>>> The balancer can't help you.
>>>
>>> Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can.
>>>
>>>
>>>
>>>>
>>>>
>>>> In the ideal case, with replication factor 2 and two nodes of 12 TB and
>>>> 72 TB, you will only be able to store 12 TB of replicated data.
>>>>
>>>>
>>>> Yes, this is true for exactly two nodes in the cluster with 12 TB and
>>>> 72 TB, but not true for more than two nodes in the cluster.
>>>>
>>>>
>>>> The best way, in my opinion, is to use multiple racks. Nodes in a rack
>>>> must have identical capacity, and racks must have identical capacity.
>>>> For example:
>>>>
>>>> rack1: 1 node with 72 TB
>>>> rack2: 6 nodes with 12 TB
>>>> rack3: 3 nodes with 24 TB
>>>>
>>>> It helps with balancing, because a duplicated block must be placed on
>>>> another rack.
>>>>
>>>>
>>>> The same question I asked earlier in this message: do multiple racks,
>>>> with the default balancer threshold, minimize the difference between
>>>> racks?
>>>>
>>>> Why did you select HDFS? Maybe Lustre, CephFS or something else is a
>>>> better choice.
>>>>
>>>>
>>>> It wasn't my decision, and I probably can't change it now. I am new to
>>>> this cluster and trying to understand a few issues. I will explore other
>>>> options as you mentioned.
>>>>
>>>> --
>>>> http://balajin.net/blog
>>>> http://flic.kr/balajijegan
>>>>
>>>
>>
>
>
> --
> http://balajin.net/blog
> http://flic.kr/balajijegan
>
>
>
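The two-node claim quoted above, and the reply that it does not hold for more than two nodes, can be checked with an idealized bound. If every block needs r replicas on distinct nodes, a node can hold at most one replica of each block, so a data volume D fits only if the sum of min(capacity, D) over all nodes is at least r*D. The sketch below finds the largest such D by bisection. It is a hypothetical helper that ignores rack placement, block granularity and non-DFS usage, so treat the result as an upper bound rather than a prediction.

```python
# Idealized upper bound on how much data (before replication) fits on a set of
# nodes when each block needs `replication` copies on distinct nodes.
# Ignores racks, block sizes and non-DFS usage. Capacities in TB.

def max_replicated_data(capacities, replication):
    caps = list(capacities)
    if len(caps) < replication:
        return 0.0  # not enough distinct nodes to place all replicas

    def fits(d):
        # A node holds at most one replica per block, so at most min(c, d) of d.
        return sum(min(c, d) for c in caps) >= replication * d

    # fits() is true on [0, D*] and false beyond it, so bisection finds D*.
    lo, hi = 0.0, sum(caps) / replication
    for _ in range(100):
        mid = (lo + hi) / 2
        if fits(mid):
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    print(round(max_replicated_data([12, 72], 2), 3))      # 12.0
    print(round(max_replicated_data([12, 12, 72], 2), 3))  # 24.0
    print(round(max_replicated_data([12, 72, 72], 2), 3))  # 78.0
```

With exactly one 12 TB and one 72 TB node the bound is 12 TB, as stated in the thread; adding a second 72 TB node lifts it to 78 TB, i.e. all 156 TB of raw space used at replication factor 2.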

Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Tapas Sarangi <ta...@gmail.com>.
Yes, thanks for pointing that out, but I already know that it completes the balancing when it exits; otherwise it shouldn't exit.
Your answer doesn't solve the problem I mentioned earlier in my message. 'hdfs' is stalling and hadoop is not writing unless space is cleared up from the cluster even though "df" shows the cluster has about 500 TB of free space. 
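One way to make the "500 TB free, yet writes stall" observation concrete is to look at free space per node rather than in aggregate. The sketch below is hypothetical: it takes per-node capacity and DFS-used figures (for example copied by hand from the NameNode's live-node page), and the 95% cut-off for calling a node nearly full is an arbitrary assumption. It does not explain why writes stall; it only shows that a large total can coexist with many full nodes.

```python
# Hypothetical per-node view: aggregate free space can look large while many
# nodes are effectively full. Figures in TB; the cut-off is an assumption.

def free_space_report(nodes, full_pct=95.0):
    """nodes maps name -> (capacity, dfs_used). Returns (total_free, full_nodes)."""
    total_free = sum(cap - used for cap, used in nodes.values())
    full_nodes = sorted(
        name for name, (cap, used) in nodes.items()
        if used / cap * 100 >= full_pct
    )
    return total_free, full_nodes


if __name__ == "__main__":
    sample = {"g01": (12, 11.99), "g02": (12, 11.9), "s01": (72, 30.0)}
    free, full = free_space_report(sample)
    print(f"{free:.2f} TB free in total; nearly full: {full}")
```

In this made-up sample almost all of the 42.11 TB of free space sits on the single large node, while both small nodes are above the cut-off.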

-------
 

On Mar 24, 2013, at 1:54 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <ba...@balajin.net> wrote:

>  -setBalancerBandwidth <bandwidth in bytes per second>
> 
> So the value is bytes per second. If it is running and exiting, it means it has completed the balancing.
> 
> 
> On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:
> Yes, we are running balancer, though a balancer process runs for almost a day or more before exiting and starting over.
> The current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume that's bytes, so about 2 GB/sec. Shouldn't that be reasonable? If it is in bits, then we have a problem.
> What's the unit for "dfs.balance.bandwidthPerSec" ?
> 
> -----
> 
> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <li...@balajin.net> wrote:
> 
>> Are you running balancer? If balancer is running and if it is slow, try increasing the balancer bandwidth
>> 
>> 
>> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
>> Thanks for the follow up. I don't know whether attachment will pass through this mailing list, but I am attaching a pdf that contains the usage of all live nodes.
>> 
>> All nodes starting with letter "g" are the ones with smaller storage space, whereas nodes starting with letter "s" have larger storage space. As you will see, most of the "gXX" nodes are completely full whereas "sXX" nodes have a lot of unused space.
>> 
>> Recently, we have frequently faced a crisis in which 'hdfs' goes into a mode where it is not able to write any further even though the total space available in the cluster is about 500 TB. We believe this has something to do with the way it is balancing the nodes, but don't understand the problem yet. Maybe the attached PDF will help some of you (experts) to see what is going wrong here...
>> 
>> Thanks
>> ------
>> 
>> 
>> 
>> 
>> 
>> 
>>> 
>>> The balancer knows about topology, but when it calculates balancing it operates only on nodes, not on racks.
>>> You can see how it works in Balancer.java, in BalancerDatanode, around line 509.
>>>
>>> I was wrong about 350 TB and 35 TB; it calculates it this way:
>>>
>>> For example:
>>> cluster_capacity=3.5PB
>>> cluster_dfsused=2PB
>>>
>>> avgutil=cluster_dfsused/cluster_capacity*100=57.14% of cluster capacity used
>>> Then we know each node's utilization (node_dfsused/node_capacity*100). The balancer considers everything fine if avgutil+10 > node_utilization >= avgutil-10 (10 being the threshold).
>>>
>>> Ideally every node uses avgutil of its capacity, which for a 12 TB node is only about 6.9 TB and for a 72 TB node about 41 TB.
>>>
>>> The balancer can't help you.
>>>
>>> Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can.
>>> 
>>>  
>>> 
>>> 
>>>> In the ideal case, with replication factor 2 and two nodes of 12 TB and 72 TB, you will only be able to store 12 TB of replicated data.
>>>
>>> Yes, this is true for exactly two nodes in the cluster with 12 TB and 72 TB, but not true for more than two nodes in the cluster.
>>>
>>>>
>>>> The best way, in my opinion, is to use multiple racks. Nodes in a rack must have identical capacity, and racks must have identical capacity.
>>>> For example:
>>>>
>>>> rack1: 1 node with 72 TB
>>>> rack2: 6 nodes with 12 TB
>>>> rack3: 3 nodes with 24 TB
>>>>
>>>> It helps with balancing, because a duplicated block must be placed on another rack.
>>>>
>>>
>>> The same question I asked earlier in this message: do multiple racks, with the default balancer threshold, minimize the difference between racks?
>>>
>>>> Why did you select HDFS? Maybe Lustre, CephFS or something else is a better choice.
>>>
>>> It wasn't my decision, and I probably can't change it now. I am new to this cluster and trying to understand a few issues. I will explore other options as you mentioned.
>>> 
>>> -- 
>>> http://balajin.net/blog
>>> http://flic.kr/balajijegan
> 
> 
> 
> 
> -- 
> http://balajin.net/blog
> http://flic.kr/balajijegan
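The rack layout suggested in the quoted reply mixes node sizes so that every rack ends up with the same total capacity. A minimal sketch that checks this for a proposed layout; the rack names and sizes are taken from the example (in TB), and the helper names are hypothetical.

```python
# Check whether a proposed rack layout gives every rack the same total
# capacity, as suggested in the thread. Sizes in TB.

def rack_totals(racks):
    """racks maps rack name -> list of node capacities."""
    return {rack: sum(nodes) for rack, nodes in racks.items()}


def racks_equal(racks, tolerance=0.0):
    """True when the largest and smallest rack totals differ by <= tolerance."""
    totals = list(rack_totals(racks).values())
    return max(totals) - min(totals) <= tolerance


if __name__ == "__main__":
    layout = {"rack1": [72], "rack2": [12] * 6, "rack3": [24] * 3}
    print(rack_totals(layout))  # {'rack1': 72, 'rack2': 72, 'rack3': 72}
    print(racks_equal(layout))  # True
```

In a real cluster the rack of each datanode comes from the cluster's rack-awareness (topology) configuration, so a check like this would be run against the planned mapping before changing it.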


Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by "Balaji Narayanan (பாலாஜி நாராயணன்)" <ba...@balajin.net>.
 -setBalancerBandwidth <bandwidth in bytes per second>

So the value is bytes per second. If it is running and exiting, it means it
has completed the balancing.
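Since the value is in bytes per second, a quick unit check of the 2x10^9 setting may help. This is a back-of-the-envelope sketch: the 1 Gbit/s link (about 125 MB/s) is an assumed example, not a figure from this cluster. hdfs-default.xml describes `dfs.balance.bandwidthPerSec` as the maximum bandwidth each datanode may use for balancing, so a cap well above the link speed means the network or disks, rather than this setting, bound each node's balancing traffic.

```python
# Back-of-the-envelope conversions for a bytes-per-second bandwidth setting.
# Decimal units (1 GB = 1e9 bytes, 1 TB = 1e12 bytes).

def describe_bandwidth(bytes_per_sec):
    return {
        "GB/s": bytes_per_sec / 1e9,
        "Gbit/s": bytes_per_sec * 8 / 1e9,
    }


def hours_to_move(terabytes, bytes_per_sec):
    """Hours needed to move `terabytes` at a sustained rate."""
    return terabytes * 1e12 / bytes_per_sec / 3600


if __name__ == "__main__":
    setting = 2e9   # the dfs.balance.bandwidthPerSec value from the thread
    link = 125e6    # assumed 1 Gbit/s link, about 125 MB/s
    print(describe_bandwidth(setting))              # {'GB/s': 2.0, 'Gbit/s': 16.0}
    effective = min(setting, link)                  # the lower limit wins
    print(round(hours_to_move(1, effective), 2))    # 2.22
```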


On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:

> Yes, we are running balancer, though a balancer process runs for almost a
> day or more before exiting and starting over.
> The current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume
> that's bytes, so about 2 GB/sec. Shouldn't that be reasonable? If it is
> in bits, then we have a problem.
> What's the unit for "dfs.balance.bandwidthPerSec" ?
>
> -----
>
> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
> lists@balajin.net> wrote:
>
> Are you running balancer? If balancer is running and if it is slow, try
> increasing the balancer bandwidth
>
>
> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
>
>> Thanks for the follow up. I don't know whether attachment will pass
>> through this mailing list, but I am attaching a pdf that contains the usage
>> of all live nodes.
>>
>> All nodes starting with letter "g" are the ones with smaller storage
>> space, whereas nodes starting with letter "s" have larger storage space. As
>> you will see, most of the "gXX" nodes are completely full whereas "sXX"
>> nodes have a lot of unused space.
>>
>> Recently, we have frequently faced a crisis in which 'hdfs' goes into a mode
>> where it is not able to write any further even though the total space
>> available in the cluster is about 500 TB. We believe this has something to
>> do with the way it is balancing the nodes, but don't understand the problem
>> yet. Maybe the attached PDF will help some of you (experts) to see what is
>> going wrong here...
>>
>> Thanks
>> ------
>>
>>
>>
>>
>>
>>
>>
>> The balancer knows about topology, but when it calculates balancing it
>> operates only on nodes, not on racks.
>> You can see how it works in Balancer.java, in BalancerDatanode, around
>> line 509.
>>
>> I was wrong about 350 TB and 35 TB; it calculates it this way:
>>
>> For example:
>> cluster_capacity=3.5PB
>> cluster_dfsused=2PB
>>
>> avgutil=cluster_dfsused/cluster_capacity*100=57.14% of cluster capacity used
>> Then we know each node's utilization (node_dfsused/node_capacity*100).
>> The balancer considers everything fine if
>> avgutil+10 > node_utilization >= avgutil-10 (10 being the threshold).
>>
>> Ideally every node uses avgutil of its capacity, which for a 12 TB node
>> is only about 6.9 TB and for a 72 TB node about 41 TB.
>>
>> The balancer can't help you.
>>
>> Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can.
>>
>>
>>
>>>
>>>
>>> In the ideal case, with replication factor 2 and two nodes of 12 TB and
>>> 72 TB, you will only be able to store 12 TB of replicated data.
>>>
>>>
>>> Yes, this is true for exactly two nodes in the cluster with 12 TB and 72
>>> TB, but not true for more than two nodes in the cluster.
>>>
>>>
>>> The best way, in my opinion, is to use multiple racks. Nodes in a rack
>>> must have identical capacity, and racks must have identical capacity.
>>> For example:
>>>
>>> rack1: 1 node with 72 TB
>>> rack2: 6 nodes with 12 TB
>>> rack3: 3 nodes with 24 TB
>>>
>>> It helps with balancing, because a duplicated block must be placed on
>>> another rack.
>>>
>>>
>>> The same question I asked earlier in this message: do multiple racks,
>>> with the default balancer threshold, minimize the difference between
>>> racks?
>>>
>>> Why did you select HDFS? Maybe Lustre, CephFS or something else is a
>>> better choice.
>>>
>>>
>>> It wasn't my decision, and I probably can't change it now. I am new to
>>> this cluster and trying to understand a few issues. I will explore other
>>> options as you mentioned.
>>>
>>> --
>>> http://balajin.net/blog
>>> http://flic.kr/balajijegan
>>>
>>
>


-- 
http://balajin.net/blog
http://flic.kr/balajijegan

Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by "Balaji Narayanan (பாலாஜி நாராயணன்)" <ba...@balajin.net>.
 -setBalancerBandwidth <bandwidth in bytes per second>

So the value is bytes per second. If it is running and exiting, it means it
has completed the balancing.


On 24 March 2013 11:32, Tapas Sarangi <ta...@gmail.com> wrote:

> Yes, we are running balancer, though a balancer process runs for almost a
> day or more before exiting and starting over.
> The current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume
> that's bytes, so about 2 GB/sec. Shouldn't that be reasonable? If it is
> in bits, then we have a problem.
> What's the unit for "dfs.balance.bandwidthPerSec" ?
>
> -----
>
> On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <
> lists@balajin.net> wrote:
>
> Are you running balancer? If balancer is running and if it is slow, try
> increasing the balancer bandwidth
>
>
> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
>
>> Thanks for the follow-up. I don't know whether attachments will pass
>> through this mailing list, but I am attaching a PDF that contains the usage
>> of all live nodes.
>>
>> All nodes whose names start with the letter "g" are the ones with smaller
>> storage space, whereas nodes starting with "s" have larger storage space.
>> As you will see, most of the "gXX" nodes are completely full, whereas the
>> "sXX" nodes have a lot of unused space.
>>
>> Recently we have frequently been facing a crisis in which HDFS goes into a
>> mode where it cannot write anything further, even though the total space
>> available in the cluster is about 500 TB. We believe this has something to
>> do with the way it balances the nodes, but we don't understand the problem
>> yet. Maybe the attached PDF will help some of you (experts) see what is
>> going wrong here...
>>
>> Thanks
>> ------
>>
>>
>>
>>
>>
>>
>>
>> The balancer knows about the topology, but when it calculates balancing it
>> operates only on nodes, not on racks.
>> You can see how it works in Balancer.java, in BalancerDatanode, around
>> line 509.
>>
>> I was wrong about the 350 TB and 35 TB figures; it calculates this way:
>>
>> For example:
>> cluster_capacity = 3.5 PB
>> cluster_dfsused = 2 PB
>>
>> avgutil = cluster_dfsused / cluster_capacity * 100 = 57.14% of cluster capacity used
>> Each node's utilization is node_dfsused / node_capacity * 100.
>> The balancer considers everything fine if
>> avgutil + 10 > node_utilization >= avgutil - 10.
>>
>> In the ideal case every node uses avgutil percent of its capacity, but for a
>> 12 TB node that is only 6.5 TB, and for a 72 TB node it is about 40 TB.
>>
>> The balancer can't help you.
>>
>> Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can.
>>
>>
>>
>>>
>>>
>>> In the ideal case, with replication factor 2 and two nodes of 12 TB and
>>> 72 TB, you will be able to store only 12 TB of replicated data.
>>>
>>>
>>> Yes, this is true for exactly two nodes in the cluster with 12 TB and 72
>>> TB, but not true for more than two nodes in the cluster.
>>>
>>>
>>> In my opinion, the best way is to use multiple racks. Nodes within a rack
>>> should have identical capacity, and racks should have identical capacity.
>>> For example:
>>>
>>> rack1: 1 node with 72 TB
>>> rack2: 6 nodes with 12 TB
>>> rack3: 3 nodes with 24 TB
>>>
>>> It helps with balancing, because a duplicated block must be on another rack.
>>>
>>>
>>> The same question I asked earlier in this message: does using multiple
>>> racks with the default balancer threshold minimize the difference between
>>> racks?
>>>
>>> Why did you select HDFS? Maybe Lustre, CephFS or something else is a better
>>> choice.
>>>
>>>
>>> It wasn't my decision, and I probably can't change it now. I am new to
>>> this cluster and trying to understand a few issues. I will explore other
>>> options as you mentioned.
>>>
>>> --
>>> http://balajin.net/blog
>>> http://flic.kr/balajijegan
>>>
>>
>


-- 
http://balajin.net/blog
http://flic.kr/balajijegan
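Some numbers help with the units question. hdfs-default.xml documents `dfs.balance.bandwidthPerSec` as the maximum bandwidth each datanode may use for balancing, in bytes per second. The default is 1048576, i.e. 1 MB/s. The Python sketch below reads the 2x10^9 setting from the thread both ways and estimates an idealized time to move data. The 10 TB amount and the 1 Gb/s network link are hypothetical. In practice, disk and network throughput, not this cap, may well be the real limit.

```python
def describe_rate(value):
    """Read a bandwidth setting as bytes/s (what HDFS uses) and, for contrast, as bits/s."""
    return {
        "GB/s if bytes": value / 1e9,
        "GB/s if bits": value / 8 / 1e9,
    }


def hours_to_move(num_bytes, bytes_per_sec):
    """Idealized transfer time; ignores disk, network and balancer overheads."""
    return num_bytes / bytes_per_sec / 3600


setting = 2e9                      # dfs.balance.bandwidthPerSec value from the thread
nic_1gbe = 125e6                   # assumed 1 Gb/s link, expressed in bytes per second
effective = min(setting, nic_1gbe) # a node cannot move data faster than its link

print(describe_rate(setting))                    # {'GB/s if bytes': 2.0, 'GB/s if bits': 0.25}
print(round(hours_to_move(1e13, setting), 1))    # 1.4  (10 TB at the configured cap)
print(round(hours_to_move(1e13, effective), 1))  # 22.2 (10 TB through one 1 GbE link)
```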

Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Tapas Sarangi <ta...@gmail.com>.
Yes, we are running the balancer, though each balancer process runs for almost a day, sometimes more, before exiting and starting over.
The current dfs.balance.bandwidthPerSec value is set to 2x10^9. I assume that's bytes, so about 2 GB/s. Shouldn't that be reasonable? If it is in bits, then we have a problem.
What's the unit for "dfs.balance.bandwidthPerSec"?

-----

On Mar 24, 2013, at 1:23 PM, Balaji Narayanan (பாலாஜி நாராயணன்) <li...@balajin.net> wrote:

> Are you running the balancer? If it is running but slow, try increasing the balancer bandwidth.
> 
> 
> On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:
> Thanks for the follow-up. I don't know whether attachments will pass through this mailing list, but I am attaching a PDF that contains the usage of all live nodes.
> 
> All nodes whose names start with the letter "g" are the ones with smaller storage space, whereas nodes starting with "s" have larger storage space. As you will see, most of the "gXX" nodes are completely full, whereas the "sXX" nodes have a lot of unused space.
> 
> Recently we have frequently been facing a crisis in which HDFS goes into a mode where it cannot write anything further, even though the total space available in the cluster is about 500 TB. We believe this has something to do with the way it balances the nodes, but we don't understand the problem yet. Maybe the attached PDF will help some of you (experts) see what is going wrong here...
> 
> Thanks
> ------
> 
> 
> 
> 
> 
> 
>> 
>> The balancer knows about the topology, but when it calculates balancing it operates only on nodes, not on racks.
>> You can see how it works in Balancer.java, in BalancerDatanode, around line 509.
>> 
>> I was wrong about the 350 TB and 35 TB figures; it calculates this way:
>> 
>> For example:
>> cluster_capacity = 3.5 PB
>> cluster_dfsused = 2 PB
>> 
>> avgutil = cluster_dfsused / cluster_capacity * 100 = 57.14% of cluster capacity used
>> Each node's utilization is node_dfsused / node_capacity * 100. The balancer considers everything fine if avgutil + 10 > node_utilization >= avgutil - 10.
>> 
>> In the ideal case every node uses avgutil percent of its capacity, but for a 12 TB node that is only 6.5 TB, and for a 72 TB node it is about 40 TB.
>> 
>> The balancer can't help you.
>> 
>> Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can.
>> 
>>  
>> 
>> 
>>> In the ideal case, with replication factor 2 and two nodes of 12 TB and 72 TB, you will be able to store only 12 TB of replicated data.
>> 
>> Yes, this is true for exactly two nodes in the cluster with 12 TB and 72 TB, but not true for more than two nodes in the cluster.
>> 
>>> 
>>> In my opinion, the best way is to use multiple racks. Nodes within a rack should have identical capacity, and racks should have identical capacity.
>>> For example:
>>> 
>>> rack1: 1 node with 72 TB
>>> rack2: 6 nodes with 12 TB
>>> rack3: 3 nodes with 24 TB
>>> 
>>> It helps with balancing, because a duplicated block must be on another rack.
>>> 
>> 
>> The same question I asked earlier in this message: does using multiple racks with the default balancer threshold minimize the difference between racks?
>> 
>>> Why did you select HDFS? Maybe Lustre, CephFS or something else is a better choice.
>> 
>> It wasn't my decision, and I probably can't change it now. I am new to this cluster and trying to understand a few issues. I will explore other options as you mentioned.
>> 
>> -- 
>> http://balajin.net/blog
>> http://flic.kr/balajijegan
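The balancer rule quoted above can be modelled roughly as follows. This is a simplified Python sketch, not a copy of Balancer.java. It computes the cluster-wide average utilization. It then labels each node by whether its own utilization lies within plus or minus `threshold` percentage points of that average. The balancer's default threshold is 10, and the exact boundary handling in the real code may differ. With the thread's 3.5 PB / 2 PB example, the arithmetic gives about 6.9 TB for a 12 TB node and about 41 TB for a 72 TB node, close to the rough figures quoted. The node names and usage values in the second example are hypothetical.

```python
def utilization(used, capacity):
    """Percentage of capacity in use."""
    return 100.0 * used / capacity


def classify(nodes, threshold=10.0):
    """Label nodes relative to the cluster-wide average utilization.

    nodes: dict of node name -> (dfs_used_tb, capacity_tb).
    Returns (average utilization in percent, {name: label}).
    """
    total_used = sum(used for used, _ in nodes.values())
    total_cap = sum(cap for _, cap in nodes.values())
    avg = utilization(total_used, total_cap)
    labels = {}
    for name, (used, cap) in nodes.items():
        util = utilization(used, cap)
        if util > avg + threshold:
            labels[name] = "over-utilized"
        elif util < avg - threshold:
            labels[name] = "under-utilized"
        else:
            labels[name] = "within threshold"
    return avg, labels


def ideal_used(capacity, avg):
    """Space (same unit as capacity) a node would hold at exactly the average utilization."""
    return capacity * avg / 100.0


# Worked numbers from the thread: 2 PB used out of 3.5 PB.
avg = utilization(2.0, 3.5)
print(round(avg, 2))                  # 57.14
print(round(ideal_used(12, avg), 2))  # 6.86  (TB on a 12 TB node)
print(round(ideal_used(72, avg), 2))  # 41.14 (TB on a 72 TB node)

# Hypothetical nodes: two nearly full 12 TB "g" nodes and one 72 TB "s" node.
avg, labels = classify({"g01": (11.9, 12.0), "g02": (11.0, 12.0), "s01": (30.0, 72.0)})
print(round(avg, 1), labels)
```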


Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by "Balaji Narayanan (பாலாஜி நாராயணன்)" <li...@balajin.net>.
Are you running the balancer? If it is running but slow, try increasing
the balancer bandwidth.


On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:

> Thanks for the follow-up. I don't know whether attachments will pass
> through this mailing list, but I am attaching a PDF that contains the usage
> of all live nodes.
>
> All nodes whose names start with the letter "g" are the ones with smaller
> storage space, whereas nodes starting with "s" have larger storage space.
> As you will see, most of the "gXX" nodes are completely full, whereas the
> "sXX" nodes have a lot of unused space.
>
> Recently we have frequently been facing a crisis in which HDFS goes into a
> mode where it cannot write anything further, even though the total space
> available in the cluster is about 500 TB. We believe this has something to
> do with the way it balances the nodes, but we don't understand the problem
> yet. Maybe the attached PDF will help some of you (experts) see what is
> going wrong here...
>
> Thanks
> ------
>
>
>
>
>
>
>
> The balancer knows about the topology, but when it calculates balancing it
> operates only on nodes, not on racks.
> You can see how it works in Balancer.java, in BalancerDatanode, around
> line 509.
>
> I was wrong about the 350 TB and 35 TB figures; it calculates this way:
>
> For example:
> cluster_capacity = 3.5 PB
> cluster_dfsused = 2 PB
>
> avgutil = cluster_dfsused / cluster_capacity * 100 = 57.14% of cluster capacity used
> Each node's utilization is node_dfsused / node_capacity * 100.
> The balancer considers everything fine if
> avgutil + 10 > node_utilization >= avgutil - 10.
>
> In the ideal case every node uses avgutil percent of its capacity, but for a
> 12 TB node that is only 6.5 TB, and for a 72 TB node it is about 40 TB.
>
> The balancer can't help you.
>
> Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can.
>
>
>
>>
>>
>> In the ideal case, with replication factor 2 and two nodes of 12 TB and
>> 72 TB, you will be able to store only 12 TB of replicated data.
>>
>>
>> Yes, this is true for exactly two nodes in the cluster with 12 TB and 72
>> TB, but not true for more than two nodes in the cluster.
>>
>>
>> In my opinion, the best way is to use multiple racks. Nodes within a rack
>> should have identical capacity, and racks should have identical capacity.
>> For example:
>>
>> rack1: 1 node with 72 TB
>> rack2: 6 nodes with 12 TB
>> rack3: 3 nodes with 24 TB
>>
>> It helps with balancing, because a duplicated block must be on another rack.
>>
>>
>> The same question I asked earlier in this message: does using multiple
>> racks with the default balancer threshold minimize the difference between
>> racks?
>>
>> Why did you select HDFS? Maybe Lustre, CephFS or something else is a better choice.
>>
>>
>> It wasn't my decision, and I probably can't change it now. I am new to
>> this cluster and trying to understand a few issues. I will explore other
>> options as you mentioned.
>>
>> --
>> http://balajin.net/blog
>> http://flic.kr/balajijegan
>>
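The replication-factor point can be checked with an idealized bound. Suppose every block needs `r` replicas on `r` distinct datanodes. Then a node holds at most one replica of each block, so storing `data` TB (before replication) requires sum(min(capacity_i, data)) >= r * data. The Python sketch below finds the largest such amount by bisection. It ignores rack placement, block granularity, and reserved and non-DFS space, so treat the result as an upper bound; the capacity lists are illustrative. It reproduces the 12 TB limit for a 12 TB + 72 TB pair and shows why the limit relaxes once more nodes are added.

```python
def fits(capacities, replication, data):
    """True if `data` TB can be stored with `replication` copies on distinct nodes."""
    # Each node can contribute at most one replica of every block, hence min(c, data).
    return sum(min(c, data) for c in capacities) >= replication * data


def max_data(capacities, replication, iterations=100):
    """Largest data size (TB, before replication) satisfying fits(), found by bisection."""
    if len(capacities) < replication:
        return 0.0  # not enough distinct nodes to hold all replicas
    lo, hi = 0.0, sum(capacities) / replication
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if fits(capacities, replication, mid):
            lo = mid
        else:
            hi = mid
    return lo


print(round(max_data([12, 72], 2), 3))         # 12.0: the two-node case from the thread
print(round(max_data([12] * 6 + [72], 2), 3))  # 72.0: more small nodes lift the limit
print(round(max_data([12] * 6 + [72], 3), 3))  # 36.0: same nodes, replication factor 3
```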

Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by "Balaji Narayanan (பாலாஜி நாராயணன்)" <li...@balajin.net>.
Are you running balancer? If balancer is running and if it is slow, try
increasing the balancer bandwidth


On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:

> Thanks for the follow up. I don't know whether attachment will pass
> through this mailing list, but I am attaching a pdf that contains the usage
> of all live nodes.
>
> All nodes starting with letter "g" are the ones with smaller storage space
> where as nodes starting with letter "s" have larger storage space. As you
> will see, most of the "gXX" nodes are completely full whereas "sXX" nodes
> have a lot of unused space.
>
> Recently, we are facing crisis frequently as 'hdfs' goes into a mode where
> it is not able to write any further even though the total space available
> in the cluster is about 500 TB. We believe this has something to do with
> the way it is balancing the nodes, but don't understand the problem yet.
> May be the attached PDF will help some of you (experts) to see what is
> going wrong here...
>
> Thanks
> ------
>
>
>
>
>
>
>
> Balancer know about topology,but when calculate balancing it operates only
> with nodes not with racks.
> You can see how it work in Balancer.java in  BalancerDatanode about string
> 509.
>
> I was wrong about 350Tb,35Tb it calculates in such way :
>
> For example:
> cluster_capacity=3.5Pb
> cluster_dfsused=2Pb
>
> avgutil=cluster_dfsused/cluster_capacity*100=57.14% used cluster capacity
> Then we know avg node utilization (node_dfsused/node_capacity*100)
> .Balancer think that all good if  avgutil
> +10>node_utilizazation>=avgutil-10.
>
> Ideal case that all node used avgutl of capacity.but for 12TB node its
> only 6.5Tb and for 72Tb its about 40Tb.
>
> Balancer cant help you.
>
> Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVEif you can.
>
>
>
>>
>>
>>  In ideal case with replication factor 2 ,with two nodes 12Tb and 72Tb
>> you will be able to have only 12Tb replication data.
>>
>>
>> Yes, this is true for exactly two nodes in the cluster with 12 TB and 72
>> TB, but not true for more than two nodes in the cluster.
>>
>>
>> Best way,on my opinion,it is using multiple racks.Nodes in rack must be
>> with identical capacity.Racks must be identical capacity.
>> For example:
>>
>> rack1: 1 node with 72Tb
>> rack2: 6 nodes with 12Tb
>> rack3: 3 nodes with 24Tb
>>
>> It helps with balancing,because dublicated  block must be another rack.
>>
>>
>> The same question I asked earlier in this message, does multiple racks
>> with default threshold for the balancer minimizes the difference between
>> racks ?
>>
>> Why did you select hdfs?May be lustre,cephfs and other is better choise.
>>
>>
>> It wasn't my decision, and I probably can't change it now. I am new to
>> this cluster and trying to understand few issues. I will explore other
>> options as you mentioned.
>>
>> --
>> http://balajin.net/blog
>> http://flic.kr/balajijegan
>>

Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by "Balaji Narayanan (பாலாஜி நாராயணன்)" <li...@balajin.net>.
Are you running balancer? If balancer is running and if it is slow, try
increasing the balancer bandwidth


On 24 March 2013 09:21, Tapas Sarangi <ta...@gmail.com> wrote:

> Thanks for the follow up. I don't know whether attachment will pass
> through this mailing list, but I am attaching a pdf that contains the usage
> of all live nodes.
>
> All nodes starting with letter "g" are the ones with smaller storage space
> where as nodes starting with letter "s" have larger storage space. As you
> will see, most of the "gXX" nodes are completely full whereas "sXX" nodes
> have a lot of unused space.
>
> Recently, we are facing crisis frequently as 'hdfs' goes into a mode where
> it is not able to write any further even though the total space available
> in the cluster is about 500 TB. We believe this has something to do with
> the way it is balancing the nodes, but don't understand the problem yet.
> May be the attached PDF will help some of you (experts) to see what is
> going wrong here...
>
> Thanks
> ------
>
> The balancer knows about the topology, but when it calculates balancing it
> operates only on nodes, not on racks.
> You can see how it works in Balancer.java, in BalancerDatanode, around line
> 509.
>
> I was wrong about the 350 TB / 35 TB figures. It calculates as follows.
>
> For example:
> cluster_capacity=3.5PB
> cluster_dfsused=2PB
>
> avgutil=cluster_dfsused/cluster_capacity*100=57.14% of cluster capacity used
> Then we know each node's utilization (node_dfsused/node_capacity*100).
> The balancer considers everything fine if
> avgutil+10 > node_utilization >= avgutil-10.
>
> In the ideal case every node uses avgutil of its capacity: for a 12 TB node
> that is only about 6.9 TB, and for a 72 TB node about 41 TB.
>
> The balancer can't help you.
>
> Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can.
>
>
>
>>
>>
>> In the ideal case, with replication factor 2 and two nodes of 12 TB and
>> 72 TB, you will be able to store only 12 TB of replicated data.
>>
>>
>> Yes, this is true for exactly two nodes in the cluster with 12 TB and 72
>> TB, but not true for more than two nodes in the cluster.
>>
>>
>> In my opinion, the best way is to use multiple racks. Nodes within a rack
>> should have identical capacity, and the racks should have identical total
>> capacity.
>> For example:
>>
>> rack1: 1 node with 72 TB
>> rack2: 6 nodes with 12 TB
>> rack3: 3 nodes with 24 TB
>>
>> It helps with balancing, because a block's replica must be placed on a
>> different rack.
>>
>>
>> The same question I asked earlier in this message: do multiple racks,
>> with the default balancer threshold, minimize the difference between
>> racks?
>>
>> Why did you choose HDFS? Maybe Lustre, CephFS or something else would be
>> a better choice.
>>
>>
>> It wasn't my decision, and I probably can't change it now. I am new to
>> this cluster and trying to understand a few issues. I will explore other
>> options as you mentioned.
>>
>> --
>> http://balajin.net/blog
>> http://flic.kr/balajijegan
>>

Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Tapas Sarangi <ta...@gmail.com>.
Thanks for the follow-up. I don't know whether the attachment will get through this mailing list, but I am attaching a PDF that shows the usage of all live nodes.

All nodes whose names start with the letter "g" have smaller storage space, whereas nodes starting with "s" have larger storage space. As you will see, most of the "gXX" nodes are completely full, whereas the "sXX" nodes have a lot of unused space.

Recently we have frequently faced a crisis in which 'hdfs' goes into a mode where it cannot write anything further, even though the total space available in the cluster is about 500 TB. We believe this has something to do with the way the nodes are being balanced, but we don't understand the problem yet. Maybe the attached PDF will help some of you (experts) see what is going wrong here...
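If the PDF does not get through, the same per-node picture can be pulled from the text output of `hadoop dfsadmin -report`. The sketch below assumes that each datanode section begins with a `Name:` line and contains a `DFS Used%:` line, as in 0.20-era reports; the layout differs between versions, so check it against your own output first. The function names and the 95% cutoff are illustrative.

```python
import re

# Assumed report layout: per-datanode blocks that begin with "Name: host:port"
# and contain "DFS Used%: 97.5%". The cluster summary printed before the first
# "Name:" line also has a "DFS Used%" field; it is skipped below.
NAME_RE = re.compile(r"^Name:\s*(\S+)")
USED_RE = re.compile(r"^DFS Used%:\s*([\d.]+)%")


def used_percent_by_node(report_text):
    """Map datanode name -> DFS Used% parsed from dfsadmin -report text."""
    usage = {}
    current = None
    for raw in report_text.splitlines():
        line = raw.strip()
        m = NAME_RE.match(line)
        if m:
            current = m.group(1)
            continue
        m = USED_RE.match(line)
        if m and current is not None:  # ignore the cluster-wide summary
            usage[current] = float(m.group(1))
    return usage


def nearly_full(usage, cutoff=95.0):
    """Names of nodes at or above `cutoff` percent used, fullest first."""
    full = [name for name, pct in usage.items() if pct >= cutoff]
    return sorted(full, key=lambda name: usage[name], reverse=True)
```

For example, pipe the report into a small script that calls `nearly_full(used_percent_by_node(sys.stdin.read()))` and prints the result.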

Thanks
------






> 
> The balancer knows about the topology, but when it calculates balancing it operates only on nodes, not on racks.
> You can see how it works in Balancer.java, in BalancerDatanode, around line 509.
> 
> I was wrong about the 350 TB / 35 TB figures. It calculates as follows.
> 
> For example:
> cluster_capacity=3.5PB
> cluster_dfsused=2PB
> 
> avgutil=cluster_dfsused/cluster_capacity*100=57.14% of cluster capacity used
> Then we know each node's utilization (node_dfsused/node_capacity*100). The balancer considers everything fine if avgutil+10 > node_utilization >= avgutil-10.
> 
> In the ideal case every node uses avgutil of its capacity: for a 12 TB node that is only about 6.9 TB, and for a 72 TB node about 41 TB.
> 
> The balancer can't help you.
> 
> Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if you can.
> 
>  
> 
> 
>> In the ideal case, with replication factor 2 and two nodes of 12 TB and 72 TB, you will be able to store only 12 TB of replicated data.
> 
> Yes, this is true for exactly two nodes in the cluster with 12 TB and 72 TB, but not true for more than two nodes in the cluster.
> 
>> 
>> In my opinion, the best way is to use multiple racks. Nodes within a rack should have identical capacity, and the racks should have identical total capacity.
>> For example:
>> 
>> rack1: 1 node with 72 TB
>> rack2: 6 nodes with 12 TB
>> rack3: 3 nodes with 24 TB
>> 
>> It helps with balancing, because a block's replica must be placed on a different rack.
>> 
> 
> The same question I asked earlier in this message: do multiple racks, with the default balancer threshold, minimize the difference between racks?
> 
>> Why did you choose HDFS? Maybe Lustre, CephFS or something else would be a better choice.
> 
> It wasn't my decision, and I probably can't change it now. I am new to this cluster and trying to understand a few issues. I will explore other options as you mentioned.
> 
> 
> 


Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Алексей Бабутин <zo...@gmail.com>.
2013/3/20 Tapas Sarangi <ta...@gmail.com>

> Thanks for your reply. Some follow-up questions below:
>
> On Mar 20, 2013, at 5:35 AM, Алексей Бабутин <zo...@gmail.com>
> wrote:
>
>
>
> dfs.balance.bandwidthPerSec in hdfs-site.xml. I think the balancer can't help
> you, because it makes all the nodes equal; they can differ only by the
> balancer threshold. The threshold is 10 by default, which means that nodes can
> differ by up to 350 TB from each other in a 3.5 PB cluster. With threshold 1,
> up to 35 TB, and so on.
>
>
> If we use multiple racks, let's assume we have 10 racks now and they are
> equally divided in size (350 TB each). With a default threshold of 10, any
> two nodes on a given rack will have a maximum difference of 35 TB; is this
> correct? Also, does this mean the difference between any two racks will
> also go down to 35 TB?
>

The balancer knows about the topology, but when it calculates balancing it
operates only on nodes, not on racks.
You can see how it works in Balancer.java, in BalancerDatanode, around line
509.

I was wrong about the 350 TB / 35 TB figures. It calculates as follows.

For example:
cluster_capacity=3.5PB
cluster_dfsused=2PB

avgutil=cluster_dfsused/cluster_capacity*100=57.14% of cluster capacity used
Then we know each node's utilization (node_dfsused/node_capacity*100).
The balancer considers everything fine if
avgutil+10 > node_utilization >= avgutil-10.

In the ideal case every node uses avgutil of its capacity: for a 12 TB node
that is only about 6.9 TB, and for a 72 TB node about 41 TB.
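The calculation above can be sketched as follows. This is a simplified model of the description in this message, not the actual code in Balancer.java: it computes the cluster-wide average utilization and sorts each node into over-utilized, above-average, below-average or under-utilized buckets, with the threshold in percentage points (10 by default). The node names and sizes in the demo are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    capacity_tb: float
    dfs_used_tb: float

    @property
    def utilization(self) -> float:
        # Percent of this node's own capacity used by DFS.
        return self.dfs_used_tb / self.capacity_tb * 100


def average_utilization(nodes) -> float:
    """Cluster-wide DFS used / cluster capacity, in percent."""
    used = sum(n.dfs_used_tb for n in nodes)
    capacity = sum(n.capacity_tb for n in nodes)
    return used / capacity * 100


def classify(nodes, threshold: float = 10.0):
    """Bucket nodes around the average; a node counts as balanced when
    avg + threshold > utilization >= avg - threshold."""
    avg = average_utilization(nodes)
    buckets = {"over": [], "above_avg": [], "below_avg": [], "under": []}
    for n in nodes:
        u = n.utilization
        if u >= avg + threshold:
            buckets["over"].append(n.name)
        elif u >= avg:
            buckets["above_avg"].append(n.name)
        elif u >= avg - threshold:
            buckets["below_avg"].append(n.name)
        else:
            buckets["under"].append(n.name)
    return avg, buckets


def ideal_used_tb(capacity_tb: float, avg_percent: float) -> float:
    """DFS usage a node would have if it sat exactly at the average."""
    return capacity_tb * avg_percent / 100


if __name__ == "__main__":
    avg = 2.0 / 3.5 * 100  # 2 PB used of 3.5 PB, as in the message
    print(f"average {avg:.2f}%")
    print(f"12 TB node at average: {ideal_used_tb(12, avg):.2f} TB")
    print(f"72 TB node at average: {ideal_used_tb(72, avg):.2f} TB")
    # Hypothetical pair: a nearly full small node and a half-empty large one.
    print(classify([Node("g01", 12, 11.99), Node("s01", 72, 30)]))
```

With the hypothetical pair, the small node falls into the over-utilized bucket, because the comparison is made on percentages of each node's own capacity rather than on absolute terabytes.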

The balancer can't help you.

Show me http://namenode.rambler.ru:50070/dfsnodelist.jsp?whatNodes=LIVE if
you can.



>
>
> In the ideal case, with replication factor 2 and two nodes of 12 TB and
> 72 TB, you will be able to store only 12 TB of replicated data.
>
>
> Yes, this is true for exactly two nodes in the cluster with 12 TB and 72
> TB, but not true for more than two nodes in the cluster.
>
>
> In my opinion, the best way is to use multiple racks. Nodes within a rack
> should have identical capacity, and the racks should have identical total
> capacity.
> For example:
>
> rack1: 1 node with 72 TB
> rack2: 6 nodes with 12 TB
> rack3: 3 nodes with 24 TB
>
> It helps with balancing, because a block's replica must be placed on a
> different rack.
>
>
> The same question I asked earlier in this message: do multiple racks,
> with the default balancer threshold, minimize the difference between
> racks?
>
> Why did you choose HDFS? Maybe Lustre, CephFS or something else would be
> a better choice.
>
>
> It wasn't my decision, and I probably can't change it now. I am new to
> this cluster and trying to understand a few issues. I will explore other
> options as you mentioned.
>
>
>

Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Tapas Sarangi <ta...@gmail.com>.
Thanks for your reply. Some follow-up questions below:

On Mar 20, 2013, at 5:35 AM, Алексей Бабутин <zo...@gmail.com> wrote:
> 
>  
> dfs.balance.bandwidthPerSec in hdfs-site.xml. I think the balancer can't help you, because it makes all the nodes equal; they can differ only by the balancer threshold. The threshold is 10 by default, which means that nodes can differ by up to 350 TB from each other in a 3.5 PB cluster. With threshold 1, up to 35 TB, and so on.

If we use multiple racks, let's assume we have 10 racks now and they are equally divided in size (350 TB each). With a default threshold of 10, any two nodes on a given rack will have a maximum difference of 35 TB; is this correct? Also, does this mean the difference between any two racks will also go down to 35 TB?


> In the ideal case, with replication factor 2 and two nodes of 12 TB and 72 TB, you will be able to store only 12 TB of replicated data.

Yes, this is true for exactly two nodes in the cluster with 12 TB and 72 TB, but not true for more than two nodes in the cluster.
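A quick way to check the point about more than two nodes: if every block needs `r` replicas on `r` distinct datanodes, a node can hold at most one copy of each block, so storing `D` TB of logical data requires the sum over nodes of `min(capacity, D)` to be at least `r * D`. When the data is treated as finely divisible (many small blocks), that condition is also sufficient, which gives the idealized upper bound computed below. It ignores rack placement, non-DFS usage and reserved space, and the function name is hypothetical.

```python
def max_replicated_data(capacities, replication):
    """Largest logical data size D (same units as `capacities`) that fits when
    each block needs `replication` copies on distinct nodes.

    Uses the fractional condition
        sum(min(c, D) for c in capacities) >= replication * D
    and binary-searches for the largest D that satisfies it.
    """
    if replication < 1:
        raise ValueError("replication must be >= 1")
    if len(capacities) < replication:
        return 0.0  # not enough distinct nodes for even one full replica set

    def fits(d):
        return sum(min(c, d) for c in capacities) >= replication * d

    # D can never exceed raw capacity / replication; fits(0) is always true.
    lo, hi = 0.0, sum(capacities) / replication
    for _ in range(100):  # ample iterations for double precision
        mid = (lo + hi) / 2
        if fits(mid):
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    print(round(max_replicated_data([12, 72], 2), 3))      # limited by the 12 TB node
    print(round(max_replicated_data([12, 12, 72], 2), 3))  # a third node raises the bound
```

With two nodes of 12 TB and 72 TB the bound is 12 TB, as stated above; adding a second 12 TB node raises it to 24 TB, so the smallest node alone no longer sets the limit.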

> 
> In my opinion, the best way is to use multiple racks. Nodes within a rack should have identical capacity, and the racks should have identical total capacity.
> For example:
> 
> rack1: 1 node with 72 TB
> rack2: 6 nodes with 12 TB
> rack3: 3 nodes with 24 TB
> 
> It helps with balancing, because a block's replica must be placed on a different rack.
> 

The same question I asked earlier in this message: do multiple racks, with the default balancer threshold, minimize the difference between racks?
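Separately from the balancer question, the suggested layout can be checked numerically: each rack in the example totals 72 TB. As a minimal sketch, assume replication factor 2 with the two replicas of every block on different racks (HDFS's default placement puts the second replica on a remote rack), and treat data as finely divisible. The logical data that fits is then bounded by `min(total / 2, total - largest rack)`, so equal rack totals keep the bound at half the raw capacity. The "lopsided" comparison layout and both function names are hypothetical.

```python
def rack_totals(layout):
    """layout: {rack: [node capacities in TB]} -> {rack: total TB}."""
    return {rack: sum(caps) for rack, caps in layout.items()}


def max_data_two_replicas_across_racks(totals):
    """Upper bound on logical data when each block's 2 replicas sit on
    different racks: min(total / 2, total - largest rack)."""
    if len(totals) < 2:
        return 0.0  # cannot place two replicas on distinct racks
    total = sum(totals.values())
    return min(total / 2, total - max(totals.values()))


if __name__ == "__main__":
    suggested = {"rack1": [72], "rack2": [12] * 6, "rack3": [24] * 3}
    lopsided = {"rackA": [72, 72], "rackB": [12] * 6}  # same 216 TB raw
    for name, layout in (("suggested", suggested), ("lopsided", lopsided)):
        totals = rack_totals(layout)
        print(name, totals, max_data_two_replicas_across_racks(totals))
```

Both layouts hold 216 TB of raw capacity, but under these assumptions the suggested one can store up to 108 TB of doubly replicated data, while the lopsided one is capped at 72 TB by its smaller rack.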

> Why did you choose HDFS? Maybe Lustre, CephFS or something else would be a better choice.

It wasn't my decision, and I probably can't change it now. I am new to this cluster and trying to understand a few issues. I will explore other options as you mentioned.



Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Tapas Sarangi <ta...@gmail.com>.
Thanks for your reply. Some follow up questions below :

On Mar 20, 2013, at 5:35 AM, Алексей Бабутин <zo...@gmail.com> wrote:
> 
>  
> dfs.balance.bandwidthPerSec in hdfs-site.xml.I think balancer cant help you,because it makes all the nodes equal.They can differ only on balancer threshold.Threshold =10 by default.It means,that nodes can differ up to 350Tb between each other in 3.5Pb cluster.If Threshold =1 up to 35Tb and so on.

If we use multiple racks, let's assume we have 10 racks now and they are equally divided in size (350 TB each). With a default threshold of 10, any two nodes on a given rack will have a maximum difference of 35 TB, is this correct ? Also, does this mean the difference between any two racks will also go down to 35 TB ?


> In ideal case with replication factor 2 ,with two nodes 12Tb and 72Tb you will be able to have only 12Tb replication data.

Yes, this is true for exactly two nodes in the cluster with 12 TB and 72 TB, but not true for more than two nodes in the cluster.

> 
> Best way,on my opinion,it is using multiple racks.Nodes in rack must be with identical capacity.Racks must be identical capacity.
> For example:
> 
> rack1: 1 node with 72Tb
> rack2: 6 nodes with 12Tb
> rack3: 3 nodes with 24Tb
> 
> It helps with balancing,because dublicated  block must be another rack.
> 

The same question I asked earlier in this message, does multiple racks with default threshold for the balancer minimizes the difference between racks ?

> Why did you select hdfs?May be lustre,cephfs and other is better choise.  

It wasn't my decision, and I probably can't change it now. I am new to this cluster and trying to understand few issues. I will explore other options as you mentioned.



Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Tapas Sarangi <ta...@gmail.com>.
Thanks for your reply. Some follow up questions below :

On Mar 20, 2013, at 5:35 AM, Алексей Бабутин <zo...@gmail.com> wrote:
> 
>  
> dfs.balance.bandwidthPerSec in hdfs-site.xml.I think balancer cant help you,because it makes all the nodes equal.They can differ only on balancer threshold.Threshold =10 by default.It means,that nodes can differ up to 350Tb between each other in 3.5Pb cluster.If Threshold =1 up to 35Tb and so on.

If we use multiple racks, let's assume we have 10 racks now and they are equally divided in size (350 TB each). With a default threshold of 10, any two nodes on a given rack will have a maximum difference of 35 TB, is this correct ? Also, does this mean the difference between any two racks will also go down to 35 TB ?


> In ideal case with replication factor 2 ,with two nodes 12Tb and 72Tb you will be able to have only 12Tb replication data.

Yes, this is true for exactly two nodes in the cluster with 12 TB and 72 TB, but not true for more than two nodes in the cluster.

> 
> Best way,on my opinion,it is using multiple racks.Nodes in rack must be with identical capacity.Racks must be identical capacity.
> For example:
> 
> rack1: 1 node with 72Tb
> rack2: 6 nodes with 12Tb
> rack3: 3 nodes with 24Tb
> 
> It helps with balancing,because dublicated  block must be another rack.
> 

The same question I asked earlier in this message, does multiple racks with default threshold for the balancer minimizes the difference between racks ?

> Why did you select hdfs?May be lustre,cephfs and other is better choise.  

It wasn't my decision, and I probably can't change it now. I am new to this cluster and trying to understand few issues. I will explore other options as you mentioned.



Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Алексей Бабутин <zo...@gmail.com>.
2013/3/19 Tapas Sarangi <ta...@gmail.com>

>
> On Mar 19, 2013, at 5:00 AM, Алексей Бабутин <zo...@gmail.com>
> wrote:
>
> node A=12TB
> node B=72TB
> How many A nodes and B nodes out of 200 do you have?
>
>
> We have more A nodes than B nodes. The ratio is about
> 80:20. Note that not all the B nodes are 72 TB; that's a max value.
> Similarly, for A it is a min value.
>
>
> If you have more B than A, you can deactivate A, clear it and apply again.
>
>
> Apply what? It may not be an option for an active system, and it may
> cripple us for days.
>
> I suppose the cluster is about 3-5 Tb. Run the balancer with threshold 0.2 or 0.1.
>
>
> You meant 3.5 PB; then you are about right. What does this threshold do
> exactly? We are not setting the threshold manually, but isn't Hadoop's
> default 0.1?
>
>
> Having different servers in one rack is a bad idea. You should rebuild the cluster with
> multiple racks.
>
>
> Why a bad idea? We are using Hadoop as a file system, not as a scheduler.
> How are multiple racks going to help balance the disk usage across
> datanodes?
>


dfs.balance.bandwidthPerSec in hdfs-site.xml. I think the balancer can't help
you, because it makes all the nodes equal. They can differ only by the balancer
threshold. Threshold = 10 by default. It means that nodes can differ by up to
350 TB from each other in a 3.5 PB cluster. If threshold = 1, up to 35 TB, and so
on.
In the ideal case, with replication factor 2 and two nodes of 12 TB and 72 TB, you
will be able to store only 12 TB of replicated data.
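The dfs.balance.bandwidthPerSec setting mentioned above caps how many bytes per second each datanode may spend on balancing. Its default in 0.20-era hdfs-default.xml is 1048576, i.e. 1 MB/s. The following back-of-the-envelope sketch shows what that cap implies. It assumes the throttle is the only bottleneck and that a fixed number of source nodes move data concurrently; both are simplifications, since real balancer iterations add overhead.

```python
SECONDS_PER_DAY = 86400
DEFAULT_BANDWIDTH = 1048576  # bytes/s, default dfs.balance.bandwidthPerSec in 0.20


def balancing_seconds(bytes_to_move, bandwidth_per_sec=DEFAULT_BANDWIDTH, parallel_nodes=1):
    """Idealized lower bound on the time to move `bytes_to_move` when each of
    `parallel_nodes` source datanodes is throttled to `bandwidth_per_sec`."""
    if bandwidth_per_sec <= 0 or parallel_nodes < 1:
        raise ValueError("bandwidth and parallel_nodes must be positive")
    return bytes_to_move / (bandwidth_per_sec * parallel_nodes)


if __name__ == "__main__":
    one_tib = 2 ** 40
    # Draining 1 TiB off a single full node at the default throttle.
    days = balancing_seconds(one_tib) / SECONDS_PER_DAY
    print(f"{days:.1f} days")  # 12.1 days
    # Same amount spread over 20 source nodes at a 10 MiB/s limit (hypothetical values).
    print(f"{balancing_seconds(one_tib, 10 * 1048576, 20) / 3600:.1f} hours")
```

At the default rate, a balancer that seems to run continuously on a multi-petabyte cluster is plausible even when it is working correctly.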

The best way, in my opinion, is to use multiple racks. Nodes in a rack must
have identical capacity, and racks must have identical capacity.
For example:

rack1: 1 node with 72 TB
rack2: 6 nodes with 12 TB
rack3: 3 nodes with 24 TB

It helps with balancing, because a duplicated block must be on another rack.
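Racks are assigned through a topology script, configured with the topology.script.file.name property in 0.20-era releases. Hadoop runs the script with host names or IP addresses as arguments and reads one rack path per argument from standard output.

The following minimal sketch covers the three-rack example above, using hypothetical host names. It also checks that every rack sums to the same capacity (72 TB each in that example). A real script must match whatever identifiers (IPs or host names) the namenode actually passes.

```python
#!/usr/bin/env python3
"""Hypothetical rack-topology script for the three-rack example (host names are made up)."""
import sys

# Hypothetical host -> (rack path, capacity in TB):
# rack1: 1 x 72 TB, rack2: 6 x 12 TB, rack3: 3 x 24 TB.
HOSTS = {"dn-big-01": ("/rack1", 72)}
HOSTS.update({f"dn-small-{i:02d}": ("/rack2", 12) for i in range(1, 7)})
HOSTS.update({f"dn-mid-{i:02d}": ("/rack3", 24) for i in range(1, 4)})

DEFAULT_RACK = "/default-rack"


def resolve(hosts):
    """Map each host name or IP to its rack path, falling back to the default rack."""
    return [HOSTS.get(h, (DEFAULT_RACK, 0))[0] for h in hosts]


def rack_capacities():
    """Total capacity (TB) per rack, to check that racks are the same size."""
    totals = {}
    for rack, capacity in HOSTS.values():
        totals[rack] = totals.get(rack, 0) + capacity
    return totals


if __name__ == "__main__":
    # One rack path per argument, space-separated on stdout.
    print(" ".join(resolve(sys.argv[1:])))
```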

Why did you select HDFS? Maybe Lustre, CephFS or others are a better choice.


>
> -Tapas
>
>
>
> 2013/3/19 Tapas Sarangi <ta...@gmail.com>
>
>> Hello,
>>
>> I am using an old legacy version (0.20) of Hadoop for our
>> cluster. We have scheduled an upgrade to a newer version within a
>> couple of months, but I would like to understand a couple of things before
>> moving ahead with the upgrade plan.
>>
>> We have about 200 datanodes, and some of them have larger storage than
>> others. The storage for the datanodes varies from 12 TB to 72 TB.
>>
>> We found that the disk-used percentage is not symmetric across all the
>> datanodes. For larger-storage nodes the percentage of disk space used is
>> much lower than that of nodes with smaller storage. On the larger
>> storage nodes the percentage of used disk space varies, but on average it is
>> about 30-50%. For the smaller storage nodes this number is as high as
>> 99.9%. Is this expected? If so, then we are not using a lot of the disk
>> space effectively. Is this solved in a future release?
>>
>> If not, I would like to know if there are any checks/debugs that one can
>> do to find an improvement with the current version, or whether upgrading Hadoop
>> should solve this problem.
>>
>> I am happy to provide additional information if needed.
>>
>> Thanks for any help.
>>
>> -Tapas
>>
>>
>
>

Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Tapas Sarangi <ta...@gmail.com>.
On Mar 19, 2013, at 5:00 AM, Алексей Бабутин <zo...@gmail.com> wrote:

> node A=12TB
> node B=72TB
> How many A nodes and B nodes out of 200 do you have?

We have more A nodes than B nodes. The ratio is about 80:20. Note that not all the B nodes are 72 TB; that's a max value. Similarly, for A it is a min value.


> If you have more B than A, you can deactivate A, clear it and apply again.

Apply what? It may not be an option for an active system, and it may cripple us for days.

> I suppose the cluster is about 3-5 Tb. Run the balancer with threshold 0.2 or 0.1.

You meant 3.5 PB; then you are about right. What does this threshold do exactly? We are not setting the threshold manually, but isn't Hadoop's default 0.1?

> 
> Having different servers in one rack is a bad idea. You should rebuild the cluster with multiple racks.

Why a bad idea? We are using Hadoop as a file system, not as a scheduler. How are multiple racks going to help balance the disk usage across datanodes?

-Tapas


> 
> 2013/3/19 Tapas Sarangi <ta...@gmail.com>
> Hello,
> 
> I am using an old legacy version (0.20) of Hadoop for our cluster. We have scheduled an upgrade to a newer version within a couple of months, but I would like to understand a couple of things before moving ahead with the upgrade plan.
> 
> We have about 200 datanodes, and some of them have larger storage than others. The storage for the datanodes varies from 12 TB to 72 TB.
> 
> We found that the disk-used percentage is not symmetric across all the datanodes. For larger-storage nodes the percentage of disk space used is much lower than that of nodes with smaller storage. On the larger storage nodes the percentage of used disk space varies, but on average it is about 30-50%. For the smaller storage nodes this number is as high as 99.9%. Is this expected? If so, then we are not using a lot of the disk space effectively. Is this solved in a future release?
> 
> If not, I would like to know if there are any checks/debugs that one can do to find an improvement with the current version, or whether upgrading Hadoop should solve this problem.
> 
> I am happy to provide additional information if needed.
> 
> Thanks for any help.
> 
> -Tapas
> 
> 


Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Алексей Бабутин <zo...@gmail.com>.
node A=12TB
node B=72TB
How many A nodes and B nodes out of 200 do you have?
If you have more B than A, you can deactivate A, clear it and apply again.
I suppose the cluster is about 3-5 Tb. Run the balancer with threshold 0.2 or 0.1.

Having different servers in one rack is a bad idea. You should rebuild the cluster with
multiple racks.

2013/3/19 Tapas Sarangi <ta...@gmail.com>

> Hello,
>
> I am using an old legacy version (0.20) of Hadoop for our cluster.
> We have scheduled an upgrade to a newer version within a couple of
> months, but I would like to understand a couple of things before moving
> ahead with the upgrade plan.
>
> We have about 200 datanodes, and some of them have larger storage than
> others. The storage for the datanodes varies from 12 TB to 72 TB.
>
> We found that the disk-used percentage is not symmetric across all the
> datanodes. For larger-storage nodes the percentage of disk space used is
> much lower than that of nodes with smaller storage. On the larger
> storage nodes the percentage of used disk space varies, but on average it is
> about 30-50%. For the smaller storage nodes this number is as high as
> 99.9%. Is this expected? If so, then we are not using a lot of the disk
> space effectively. Is this solved in a future release?
>
> If not, I would like to know if there are any checks/debugs that one can
> do to find an improvement with the current version, or whether upgrading Hadoop
> should solve this problem.
>
> I am happy to provide additional information if needed.
>
> Thanks for any help.
>
> -Tapas
>
>

Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Bertrand Dechoux <de...@gmail.com>.
Hi,

It is not explicitly stated, but did you use the balancer?
http://hadoop.apache.org/docs/r1.0.4/commands_manual.html#balancer
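Before or after running the balancer, per-datanode usage can be checked with `hadoop dfsadmin -report`. It prints a cluster summary followed by one section per datanode. The following small sketch extracts each node's "DFS Used%" so the spread between the fullest and emptiest nodes is easy to see. The report excerpt is a hypothetical, trimmed example, and the exact field layout may differ between versions.

```python
import re

NAME_RE = re.compile(r"^Name:\s*(\S+)")
USED_PCT_RE = re.compile(r"^DFS Used%:\s*([\d.]+)%")


def used_percent_by_node(report_text):
    """Map each datanode to its 'DFS Used%' value from `hadoop dfsadmin -report` text.

    The cluster summary also contains a 'DFS Used%' line; it is ignored because
    no 'Name:' line has been seen at that point.
    """
    usage = {}
    current = None
    for raw in report_text.splitlines():
        line = raw.strip()
        name = NAME_RE.match(line)
        if name:
            current = name.group(1)
            continue
        used = USED_PCT_RE.match(line)
        if used and current is not None:
            usage[current] = float(used.group(1))
    return usage


# Hypothetical, trimmed report excerpt.
SAMPLE = """\
DFS Used%: 50.42%

Name: 10.0.0.1:50010
DFS Used%: 99.17%

Name: 10.0.0.2:50010
DFS Used%: 34.72%
"""

if __name__ == "__main__":
    usage = used_percent_by_node(SAMPLE)
    for node, pct in sorted(usage.items(), key=lambda item: item[1], reverse=True):
        print(f"{node}\t{pct:.2f}%")
    print(f"spread: {max(usage.values()) - min(usage.values()):.2f} points")
```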

Regards

Bertrand

On Mon, Mar 18, 2013 at 10:01 PM, Tapas Sarangi <ta...@gmail.com> wrote:

> Hello,
>
> I am using an old legacy version (0.20) of Hadoop for our cluster.
> We have scheduled an upgrade to a newer version within a couple of
> months, but I would like to understand a couple of things before moving
> ahead with the upgrade plan.
>
> We have about 200 datanodes, and some of them have larger storage than
> others. The storage for the datanodes varies from 12 TB to 72 TB.
>
> We found that the disk-used percentage is not symmetric across all the
> datanodes. For larger-storage nodes the percentage of disk space used is
> much lower than that of nodes with smaller storage. On the larger
> storage nodes the percentage of used disk space varies, but on average it is
> about 30-50%. For the smaller storage nodes this number is as high as
> 99.9%. Is this expected? If so, then we are not using a lot of the disk
> space effectively. Is this solved in a future release?
>
> If not, I would like to know if there are any checks/debugs that one can
> do to find an improvement with the current version, or whether upgrading Hadoop
> should solve this problem.
>
> I am happy to provide additional information if needed.
>
> Thanks for any help.
>
> -Tapas
>
>

Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Алексей Бабутин <zo...@gmail.com>.
node A=12TB
node B=72TB
How many A nodes  and B from 200 do you have?
If you have more B than A you can deactivate A,clear it and apply again.
I suppose that cluster about 3-5 Tb.Run balancer with threshold 0.2 or 0.1.

Different servers in one rack is bad idea.You should rebuild cluster with
multiple racks.

2013/3/19 Tapas Sarangi <ta...@gmail.com>

> Hello,
>
> I am using one of the old legacy version (0.20) of hadoop for our cluster.
> We have scheduled for an upgrade to the newer version within a couple of
> months, but I would like to understand a couple of things before moving
> towards the upgrade plan.
>
> We have about 200 datanodes and some of them have larger storage than
> others. The storage for the datanodes varies between 12 TB to 72 TB.
>
> We found that the disk-used percentage is not symmetric through all the
> datanodes. For larger storage nodes the percentage of disk-space used is
> much lower than that of other nodes with smaller storage space. In larger
> storage nodes the percentage of used disk space varies, but on average
> about 30-50%. For the smaller storage nodes this number is as high as
> 99.9%. Is this expected ? If so, then we are not using a lot of the disk
> space effectively. Is this solved in a future release ?
>
> If no, I would like to know  if there are any checks/debugs that one can
> do to find an improvement with the current version or upgrading hadoop
> should solve this problem.
>
> I am happy to provide additional information if needed.
>
> Thanks for any help.
>
> -Tapas
>
>

Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Bertrand Dechoux <de...@gmail.com>.
Hi,

It is not explicitly said but did you use the balancer?
http://hadoop.apache.org/docs/r1.0.4/commands_manual.html#balancer

Regards

Bertrand

On Mon, Mar 18, 2013 at 10:01 PM, Tapas Sarangi <ta...@gmail.com>wrote:

> Hello,
>
> I am using an old legacy version (0.20) of Hadoop for our cluster. We
> have scheduled an upgrade to a newer version within a couple of months,
> but I would like to understand a couple of things before moving ahead
> with the upgrade plan.
>
> We have about 200 datanodes, and some of them have more storage than
> others. The storage per datanode ranges from 12 TB to 72 TB.
>
> We found that the disk-used percentage is not symmetric across all the
> datanodes. On the larger storage nodes the percentage of disk space used
> is much lower than on the nodes with smaller storage. On the larger nodes
> the percentage of used disk space varies, but averages about 30-50%. On
> the smaller storage nodes this number is as high as 99.9%. Is this
> expected? If so, then we are not using a lot of the disk space
> effectively. Is this solved in a future release?
>
> If not, I would like to know if there are any checks/debugs one can do
> to find an improvement with the current version, or whether upgrading
> Hadoop should solve this problem.
>
> I am happy to provide additional information if needed.
>
> Thanks for any help.
>
> -Tapas
>
>

Re: disk used percentage is not symmetric on datanodes (balancer)

Posted by Алексей Бабутин <zo...@gmail.com>.
node A = 12 TB
node B = 72 TB
How many A nodes and how many B nodes do you have out of the 200?
If you have more B nodes than A nodes, you can deactivate the A nodes, clear
them, and add them back again.
I suppose the cluster is about 3-5 TB. Run the balancer with a threshold of
0.2 or 0.1.
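
A minimal sketch of what the threshold controls, based on the balancer's documented definition: the cluster counts as balanced when every datanode's utilization (used / capacity) is within `threshold` percentage points of the cluster-wide utilization (total used / total capacity). `classify_nodes` and the node figures are hypothetical; the figures were picked to match the 99.9% and 40% reported in the thread. The real balancer also schedules and throttles block moves, which this does not model.

```python
def classify_nodes(nodes, threshold_pct):
    """Label each (name, capacity_tb, used_tb) node relative to the cluster.

    threshold_pct is in percentage points of capacity, like the balancer's
    -threshold argument. Returns (cluster utilization %, {name: label}).
    """
    total_used = sum(used for _, _, used in nodes)
    total_cap = sum(cap for _, cap, _ in nodes)
    cluster_pct = 100.0 * total_used / total_cap

    labels = {}
    for name, cap, used in nodes:
        node_pct = 100.0 * used / cap
        if node_pct > cluster_pct + threshold_pct:
            labels[name] = "over-utilized"
        elif node_pct < cluster_pct - threshold_pct:
            labels[name] = "under-utilized"
        else:
            labels[name] = "balanced"
    return cluster_pct, labels


if __name__ == "__main__":
    # Hypothetical figures: a 12 TB node at 99.9% and a 72 TB node at 40%.
    nodes = [("dn-12tb", 12.0, 11.988), ("dn-72tb", 72.0, 28.8)]
    for t in (10, 0.2):
        avg, labels = classify_nodes(nodes, t)
        print(f"threshold={t}: cluster={avg:.1f}% {labels}")
```

With the documented default threshold of 10, the 72 TB node in this example already counts as balanced and only the full 12 TB node is flagged; with a tight threshold such as 0.2, the 72 TB node is flagged as under-utilized as well, so the balancer keeps moving data toward it.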

Mixing different servers in one rack is a bad idea. You should rebuild the
cluster with multiple racks.
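
On the rack point: in Hadoop of this vintage, rack membership is usually declared through a topology script named by `topology.script.file.name` in core-site.xml (`net.topology.script.file.name` in 2.x). Hadoop runs the script with one or more hostnames or IP addresses as arguments and reads back one rack path per argument. Below is a minimal Python sketch of such a script, assuming a hypothetical inventory: the IPs and rack names are placeholders, and putting the 12 TB and 72 TB nodes into separate racks is just one way to read the suggestion to keep different servers apart.

```python
#!/usr/bin/env python3
"""Hypothetical rack-mapping script for Hadoop rack awareness.

Hadoop invokes it with one or more hostnames/IPs as arguments and reads
one rack path per argument from standard output.
"""
import sys

# Hypothetical inventory: replace with your own datanode IPs and rack paths.
RACK_OF = {
    "10.0.1.11": "/rack-12tb",
    "10.0.1.12": "/rack-12tb",
    "10.0.2.21": "/rack-72tb",
}
DEFAULT_RACK = "/default-rack"  # Hadoop's own name for unmapped nodes


def resolve(hosts):
    """Return the rack path for each host, in the same order as given."""
    return [RACK_OF.get(h, DEFAULT_RACK) for h in hosts]


if __name__ == "__main__":
    # Space-separated output, one rack path per argument.
    print(" ".join(resolve(sys.argv[1:])))
```

Unknown hosts fall back to `/default-rack` rather than failing, so a node missing from the inventory still gets a rack instead of breaking the lookup.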

2013/3/19 Tapas Sarangi <ta...@gmail.com>

> Hello,
>
> I am using an old legacy version (0.20) of Hadoop for our cluster. We
> have scheduled an upgrade to a newer version within a couple of months,
> but I would like to understand a couple of things before moving ahead
> with the upgrade plan.
>
> We have about 200 datanodes, and some of them have more storage than
> others. The storage per datanode ranges from 12 TB to 72 TB.
>
> We found that the disk-used percentage is not symmetric across all the
> datanodes. On the larger storage nodes the percentage of disk space used
> is much lower than on the nodes with smaller storage. On the larger nodes
> the percentage of used disk space varies, but averages about 30-50%. On
> the smaller storage nodes this number is as high as 99.9%. Is this
> expected? If so, then we are not using a lot of the disk space
> effectively. Is this solved in a future release?
>
> If not, I would like to know if there are any checks/debugs one can do
> to find an improvement with the current version, or whether upgrading
> Hadoop should solve this problem.
>
> I am happy to provide additional information if needed.
>
> Thanks for any help.
>
> -Tapas
>
>