Posted to common-user@hadoop.apache.org by Allen Wittenauer <aw...@yahoo-inc.com> on 2008/09/01 16:22:41 UTC

Re: Load balancing in HDFS



On 8/27/08 7:51 AM, "Mork0075" <mo...@googlemail.com> wrote:

> This sounds really interesting. And while increasing the replicas for
> certain files, the available throughput for these files increases too?

    Yes, as there are more places to pull the file from.  This needs to be
weighed against the work the name node must do to re-replicate the file in
case of failure, and against the total amount of disk space used... So the
extra bandwidth isn't "free".
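    To make the trade-off concrete: with a replication factor of r, aggregate
read bandwidth for a hot file scales roughly with r (each replica is another
place to read from), while disk usage scales exactly with r. A minimal
illustrative sketch -- the per-replica bandwidth figure is an assumption for
the example, not an HDFS measurement:

```python
# Illustrative model of the replication trade-off described above.
# Assumption: each replica serves reads at roughly the same rate, so
# aggregate read bandwidth scales ~linearly with the replica count,
# while disk usage scales exactly linearly.

def replication_tradeoff(file_size_gb, per_replica_mbps, replicas):
    """Return (aggregate_read_mbps, total_disk_gb) for a replica count."""
    return per_replica_mbps * replicas, file_size_gb * replicas

# Going from the default 3 replicas to 10 for a 1 GB hot file:
for r in (3, 10):
    bw, disk = replication_tradeoff(1, 400, r)
    print(f"replicas={r}: ~{bw} Mbps aggregate, {disk} GB on disk")
```

The read capacity goes up, but so does the storage bill and the
re-replication work the name node must schedule if a node dies.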

> 
> Allen Wittenauer schrieb:
>> 
>> 
>> On 8/27/08 12:54 AM, "Mork0075" <mo...@googlemail.com> wrote:
>>> I'm planning to use HDFS as a DFS in a web application environment.
>>> There are two requirements: fault tolerance, which is ensured by the
>>> replicas, and load balancing.
>> 
>>     There is a SPOF in the form of the name node.  So depending upon your
>> needs, that may or may not be acceptable risk.
>> 
>> On 8/27/08 1:23 AM, "Mork0075" <mo...@googlemail.com> wrote:
>>> Some documents stored in the HDFS could be very popular and
>>> therefore accessed more often than others. Then HDFS needs to balance the
>>> load - distribute the requests to different nodes. Is it possible?
>> 
>>     Not automatically.  However, it is possible to manually/programmatically
>> increase the replication on files.
>> 
>>     This is one of the possible uses for the new audit logging in 0.18... By
>> watching the log, it should be possible to determine which files need a
>> higher replication factor.
>> 
>> 
>
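As a sketch of the audit-log approach mentioned above: assuming the audit log
writes one tab-separated line of key=value fields per request (cmd=open,
src=/path, ...) -- the exact line format should be checked against your Hadoop
release -- a small script can count opens per file and flag candidates for a
higher replication factor. The field names and sample lines below are
illustrative assumptions, not guaranteed 0.18 output:

```python
from collections import Counter

# Assumed audit-log line shape (verify against your Hadoop release):
# tab-separated key=value fields, e.g.
#   ugi=alice  ip=/10.0.0.5  cmd=open  src=/data/popular.txt  dst=null  perm=null

def hot_files(audit_lines, min_opens):
    """Return files opened at least `min_opens` times, most popular first."""
    opens = Counter()
    for line in audit_lines:
        fields = dict(f.split("=", 1) for f in line.strip().split("\t") if "=" in f)
        if fields.get("cmd") == "open":
            opens[fields["src"]] += 1
    return [(path, n) for path, n in opens.most_common() if n >= min_opens]

sample = [
    "ugi=alice\tip=/10.0.0.5\tcmd=open\tsrc=/data/popular.txt\tdst=null\tperm=null",
    "ugi=bob\tip=/10.0.0.6\tcmd=open\tsrc=/data/popular.txt\tdst=null\tperm=null",
    "ugi=bob\tip=/10.0.0.6\tcmd=open\tsrc=/data/rare.txt\tdst=null\tperm=null",
]
print(hot_files(sample, min_opens=2))  # [('/data/popular.txt', 2)]
```

The replication factor for each flagged path could then be raised manually
(e.g. with the fs shell's -setrep command) or programmatically via
FileSystem.setReplication().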