Posted to general@hadoop.apache.org by Jonathan Seidman <jo...@gmail.com> on 2009/10/21 18:15:03 UTC

Namenode with External Storage?

Apologies if this has been answered previously, but I'm unable to find
anything that seems to cover this.

It's clear that datanodes require local storage for Hadoop to function
efficiently, but is there any significant disadvantage to using external
storage for namenodes? We're exploring the possibility of using a different
class of hardware for our namenodes with attached storage and little or no
internal storage. Some of the benefits this would provide us are: 1)
allowing our sysadmins to deploy hardware that they're familiar with and
already have considerable experience keeping up in a production environment.
2) no namenode downtime to replace a failed disk.

We don't anticipate that this approach would cause any significant
degradation to performance, but let me know if there's something we're not
considering.

Thanks.

Jonathan

Re: Namenode with External Storage?

Posted by Dhruba Borthakur <dh...@gmail.com>.
The namenode uses storage to write file system transaction logs as well as
server debug log messages. Both of these could be stored on an NFS-mounted
file system; that should work well.

thanks,
dhruba



Re: Namenode with External Storage?

Posted by Grant Mackey <gm...@cs.ucf.edu>.
Oops, that is correct. My mistake.

Quoting Sanjay Radia <sr...@yahoo-inc.com>:

>
> On Oct 22, 2009, at 9:37 AM, <gm...@cs.ucf.edu> wrote:
>
>> As with Dhruba's comment, so long as it is just the namenode that is
>> running on a networked file system everything should be chill. The namenode
>> keeps all of its working metadata in main mem, and it only occasionally
>> pushes a log file out to hard storage (and if I remember correctly you can
>> adjust this time window in one of the site files).
>>
>
> Actually it pushes out the update logs on each and every update
> synchronously.
> The checkpoint however is pushed out periodically.
>
> Also, at yahoo, we push out NN state to multiple disks and one of the
> "disks" is a nfs filer. This is configurable.
>
> sanjay
>>
>> However, you are going to run into huge performance issues running
>> datanodes over a networked storage system. Having to push that many file
>> requests over a network for a respectable mapreduce job is going to kill
>> your equipment.
>>
>> - Grant
>>



Grant Mackey
UCF Research Assistant
Engineering III
Rm 238 Cubicle 1


Re: Namenode with External Storage?

Posted by Sanjay Radia <sr...@yahoo-inc.com>.
On Oct 22, 2009, at 9:37 AM, <gm...@cs.ucf.edu> wrote:

> As with Dhruba's comment, so long as it is just the namenode that is
> running on a networked file system everything should be chill. The namenode
> keeps all of its working metadata in main mem, and it only occasionally
> pushes a log file out to hard storage (and if I remember correctly you can
> adjust this time window in one of the site files).
>

Actually, it pushes out the update logs synchronously on each and every
update. The checkpoint, however, is pushed out periodically.

Also, at Yahoo, we push out NN state to multiple disks, and one of the
"disks" is an NFS filer. This is configurable.

sanjay
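
To make the distinction above concrete, here is a minimal toy sketch of the
pattern described: state held in memory, every update appended and synced to
an edit log in each configured directory before it is applied, and a full
checkpoint written only when asked. This is not Hadoop's implementation. The
class name TinyNameStore, the file names image.json and edits.log, and the
JSON record format are all assumptions made up for illustration, and the
"nfs" directory in the usage example is just an ordinary directory standing
in for a mounted filer.

```python
import json
import os
import tempfile


class TinyNameStore:
    """Toy metadata store (hypothetical, not Hadoop code)."""

    def __init__(self, storage_dirs):
        self.storage_dirs = list(storage_dirs)
        self.namespace = {}  # working metadata lives in memory
        for d in self.storage_dirs:
            os.makedirs(d, exist_ok=True)
        self._recover()

    def _recover(self):
        # Load the last checkpoint, then replay the edit log, from the
        # first directory that holds any state.
        for d in self.storage_dirs:
            image = os.path.join(d, "image.json")
            edits = os.path.join(d, "edits.log")
            if not (os.path.exists(image) or os.path.exists(edits)):
                continue
            if os.path.exists(image):
                with open(image) as f:
                    self.namespace = json.load(f)
            if os.path.exists(edits):
                with open(edits) as f:
                    for line in f:
                        self._apply(json.loads(line))
            return

    def _apply(self, op):
        if op["op"] == "create":
            self.namespace[op["path"]] = op["meta"]
        elif op["op"] == "delete":
            self.namespace.pop(op["path"], None)

    def _log(self, op):
        # Synchronous step: the record is flushed and fsynced in every
        # directory before the update is applied in memory.
        record = json.dumps(op) + "\n"
        for d in self.storage_dirs:
            with open(os.path.join(d, "edits.log"), "a") as f:
                f.write(record)
                f.flush()
                os.fsync(f.fileno())

    def create(self, path, meta):
        op = {"op": "create", "path": path, "meta": meta}
        self._log(op)
        self._apply(op)

    def delete(self, path):
        op = {"op": "delete", "path": path}
        self._log(op)
        self._apply(op)

    def checkpoint(self):
        # Periodic step: write the whole in-memory state to each
        # directory, then start a fresh, empty edit log there.
        for d in self.storage_dirs:
            tmp = os.path.join(d, "image.json.tmp")
            with open(tmp, "w") as f:
                json.dump(self.namespace, f)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp, os.path.join(d, "image.json"))
            with open(os.path.join(d, "edits.log"), "w") as f:
                f.flush()
                os.fsync(f.fileno())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # "nfs" is only a stand-in name for a directory on a filer.
        dirs = [os.path.join(root, "local"), os.path.join(root, "nfs")]
        store = TinyNameStore(dirs)
        store.create("/a", {"size": 1})
        store.checkpoint()
        store.create("/b", {"size": 2})
        # Pretend the local disk is gone: recover from the second copy.
        recovered = TinyNameStore([dirs[1]])
        print(recovered.namespace)
```

Because each update is synced to every directory before it takes effect, the
latency of the slowest directory is paid on every write in this sketch. The
upside is that any one surviving directory is enough to rebuild the state.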
>
> However, you are going to run into huge performance issues running
> datanodes over a networked storage system. Having to push that many file
> requests over a network for a respectable mapreduce job is going to kill
> your equipment.
>
>  - Grant
>


Re: Namenode with External Storage?

Posted by gm...@cs.ucf.edu.
As with Dhruba's comment, so long as it is just the namenode that is 
running on a networked file system everything should be chill. The namenode 
keeps all of its working metadata in main mem, and it only occasionally 
pushes a log file out to hard storage (and if I remember correctly you can 
adjust this time window in one of the site files).

However, you are going to run into huge performance issues running 
datanodes over a networked storage system. Having to push that many file 
requests over a network for a respectable mapreduce job is going to kill 
your equipment.

 - Grant


-- 
--
Grant Mackey
PhD student Computer Engineering
University of Central Florida
Rm 231 cube 5 (321) 960-8851