Posted to common-user@hadoop.apache.org by praveenesh kumar <pr...@gmail.com> on 2011/09/22 06:42:47 UTC

Can we replace namenode machine with some other machine ?

Hi all,

Can we replace our namenode machine later with some other machine?
I have a new server machine in my cluster, and I want to make
it my new namenode and jobtracker node.
Also, does the NameNode/JobTracker machine's configuration need to be
better than the datanodes'/tasktrackers'?

How can I achieve this with the least overhead?

Thanks,
Praveenesh

Re: Can we replace namenode machine with some other machine ?

Posted by Harsh J <ha...@cloudera.com>.
On Thu, Sep 22, 2011 at 11:44 AM, praveenesh kumar <pr...@gmail.com> wrote:
> But apart from storing metadata info, Is there anything more NN/JT machines
> are doing ?? .
> So I can say I can survive with poor NN if I am not dealing with lots of
> files in HDFS ?
<snip>

The JT and NN are your central throughput machines. Clients communicate
primarily with these two, and they are a client's first point of contact.
The JT and NN also need CPU to manage the slaves that have joined
them and to maintain the state of each one.

I would not place them on poor machines and risk a general slowdown
across the whole cluster. I would also make sure that the machine I use
for my master services has noticeably more reliable hardware than the
slaves (whose losses are easier to bear).

> Can we replace our namenode machine later with some other machine. ?
> Actually I got a new  server machine in my cluster and now I want to make
> this machine as my new namenode and jobtracker node ?

So long as your hostname does not change, you should not have an
issue. The change will be as transparent to your cluster as a restart
would be.

If you are introducing a hostname change, certain ecosystem components
such as Hive, apart from all your client configs, may need minor
repairs to their states.

(This is another reason why you should use hostnames for HDFS, and not
IP addresses)
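
To illustrate the hostname advice above, here is a minimal Python sketch
that flags a NameNode URI whose host part is an IP literal. The function
name and the sample URIs are hypothetical; the URI is assumed to be the
kind of value you would put in fs.default.name.

```python
import ipaddress
from urllib.parse import urlparse


def uses_ip_literal(fs_uri: str) -> bool:
    """Return True if the host part of an HDFS URI is an IP address literal."""
    host = urlparse(fs_uri).hostname
    if host is None:
        raise ValueError(f"no host found in URI: {fs_uri!r}")
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False  # not parseable as an IP, so treat it as a hostname
    return True


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for uri in ("hdfs://namenode.example.com:8020", "hdfs://10.0.0.5:8020"):
        print(uri, "-> IP literal" if uses_ip_literal(uri) else "-> hostname")
```

A config that passes this check can follow the NameNode to new hardware
through a DNS change alone, which is the point being made above.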

> How can I achieve this target with least overhead ?

If there's no hostname change happening here, then it should be as
simple as:

1. Turn off HDFS.
2. Switch host pointers (if the IP is different for the new addition).
3. Move the metadata to the new machine.
4. Ensure permissions are set and that everything is a mirror copy of
   what was before.
5. Run the NN, and ensure it binds fine and comes up with all the files
   intact. (You can browse files on a DN-less HDFS cluster just fine;
   that is a non-issue.)
6. Start the DNs back up.
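
The "mirror copy" check in the procedure above could be sketched in
Python roughly as follows. This is only an illustration under stated
assumptions: it compares file contents by SHA-256, does not check
ownership or permissions, and the function names are hypothetical. Both
directories are assumed to be reachable from the machine running it (for
example over an NFS mount), with the NN stopped.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large image files are not read at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root: str) -> dict:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    base = Path(root)
    return {
        path.relative_to(base).as_posix(): file_sha256(path)
        for path in base.rglob("*")
        if path.is_file()
    }


def mirror_differences(old_dir: str, new_dir: str) -> list:
    """Return sorted relative paths that are missing or differ between the trees."""
    old, new = tree_digests(old_dir), tree_digests(new_dir)
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))
```

An empty list from mirror_differences(old_dir, new_dir) means every file
exists on both sides with identical contents.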

As always, keeping extra backups of your dfs.name.dir contents is
recommended.
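
One possible way to take such a backup is a timestamped archive, sketched
here in Python. The function name, archive naming scheme, and paths are
assumptions made for illustration; the directory should be archived while
the NN is stopped (or from a consistent copy) so the files do not change
mid-backup.

```python
import tarfile
import time
from pathlib import Path


def backup_name_dir(name_dir: str, backup_root: str) -> Path:
    """Write a timestamped .tar.gz of name_dir under backup_root and return its path."""
    src = Path(name_dir)
    dest_dir = Path(backup_root)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest_dir / f"namedir-{stamp}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        # Store entries under the directory's own name rather than its full path.
        tar.add(str(src), arcname=src.name)
    return archive
```

Writing the archive to a different disk or host than the one holding
dfs.name.dir is what makes it an "extra" backup.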

-- 
Harsh J

Re: Can we replace namenode machine with some other machine ?

Posted by Uma Maheswara Rao G 72686 <ma...@huawei.com>.
Many daemons run inside the NN: the replication monitor (which re-replicates blocks from one DN to another when there are not enough replicas), SafeMode monitoring, the LeaseManager, heartbeat monitoring, IPC handlers, and so on. The NN also maintains the block-to-machine-list mappings in memory.

The JT likewise runs many daemons like these.

If you are not dealing with a large number of files, a normal configuration is enough.
But you should configure enough memory to run the NN and JT. How much always depends on your usage.
For a better understanding, I suggest going through Hadoop: The Definitive Guide, where all of these details are documented very well.


Regards,
Uma 

Re: Can we replace namenode machine with some other machine ?

Posted by praveenesh kumar <pr...@gmail.com>.
But apart from storing metadata, is there anything more that the NN/JT
machines are doing?
So can I say that I can survive with a poor NN if I am not dealing with
lots of files in HDFS?

Re: Can we replace namenode machine with some other machine ?

Posted by Uma Maheswara Rao G 72686 <ma...@huawei.com>.
Just changing the configs will not affect your data. You need to restart your DNs so that they connect to the new NN.

For the second question:
It again depends on your usage. The more files you have in DFS, the more memory the NN will consume, because it needs to hold the metadata for all the files in the namespace.

If you have a very large number of files, it is recommended that you do not put the NN and JT on the same machine.

Coming to the DN case: the configured space is used for storing block files. Once that space is filled, the NN will not select this DN for further writes. So in big clusters, one DN having less space is more tolerable than the NN having less space.

If you configure well-provisioned DNs with plenty of space, but the NN has too little space to store your files' metadata, then there is no use in having more space on the DNs, right? :-)
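
As a rough illustration of how NN memory grows with the number of files, here is a back-of-the-envelope Python sketch. The figure of about 150 bytes per namespace object (file, directory, or block) is a commonly quoted rule of thumb, not something stated in this thread, and real usage varies by Hadoop version and workload. The function name and defaults are hypothetical.

```python
def estimate_namenode_heap_bytes(num_files: int,
                                 avg_blocks_per_file: float = 1.0,
                                 num_dirs: int = 0,
                                 bytes_per_object: int = 150) -> int:
    """Rough NN heap estimate: every file, directory, and block is one object.

    bytes_per_object is an assumed rule-of-thumb value, not a measured one.
    """
    num_blocks = int(round(num_files * avg_blocks_per_file))
    objects = num_files + num_dirs + num_blocks
    return objects * bytes_per_object


if __name__ == "__main__":
    # One million files averaging 1.5 blocks each.
    print(estimate_namenode_heap_bytes(1_000_000, avg_blocks_per_file=1.5))
```

Under these assumptions, one million files averaging 1.5 blocks each come to 375,000,000 bytes of metadata, before any JVM overhead. Treat the result as an order-of-magnitude guide when sizing the NN machine.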
 
 
Regards,
Uma

Re: Can we replace namenode machine with some other machine ?

Posted by praveenesh kumar <pr...@gmail.com>.
If I just change the configuration settings on the slave machines, will it
affect any of the data currently residing in the cluster?

And my second question was:
Do we need the master node (the NN/JT host) to have a better
configuration than our slave machines (the DN/TT hosts)?

Actually, my master node is a weaker machine than my slave machines, because
I assumed that the master does not do much additional work and that it is
okay to have a weak machine as master.
Now I have a big new server machine being added to my cluster, so I am
wondering whether I should make it my new master (NN/JT) or just add
it as a slave.

Thanks,
Praveenesh


Re: Can we replace namenode machine with some other machine ?

Posted by Uma Maheswara Rao G 72686 <ma...@huawei.com>.
Copy the same installation to the new machine and change the IP address.
After that, configure the new NN address on your clients and DNs.

>Also Does Namenode/JobTracker machine's configuration needs to be better
>than datanodes/tasktracker's ??
I did not understand this question.

Regards,
Uma

RE: Can we replace namenode machine with some other machine ?

Posted by Michael Segel <mi...@hotmail.com>.
Well, you could do RAID 1, which is just mirroring.
I don't think you need RAID 0 or RAID 5 (striping) to get better performance.
Also, if you're using a 1U box, you just need two internal SATA drives, and then NFS-mount a drive from your SN for your backup copy...

Re: Can we replace namenode machine with some other machine ?

Posted by Steve Loughran <st...@apache.org>.
On 22/09/11 17:13, Michael Segel wrote:
>
> I agree w Steve except on one thing...
>
> RAID 5 Bad. RAID 10 (1+0) good.
>
> Sorry this goes back to my RDBMs days where RAID 5 will kill your performance and worse...
>

Sorry, I should have said RAID >= 5. The main thing is that you don't want
the NN data lost. Ever.


RE: Can we replace namenode machine with some other machine ?

Posted by Michael Segel <mi...@hotmail.com>.
I agree with Steve except on one thing...

RAID 5 bad. RAID 10 (1+0) good.

Sorry, this goes back to my RDBMS days, where RAID 5 will kill your performance and worse...



Re: Can we replace namenode machine with some other machine ?

Posted by Steve Loughran <st...@apache.org>.
On 22/09/11 05:42, praveenesh kumar wrote:
> Hi all,
>
> Can we replace our namenode machine later with some other machine. ?
> Actually I got a new  server machine in my cluster and now I want to make
> this machine as my new namenode and jobtracker node ?
> Also Does Namenode/JobTracker machine's configuration needs to be better
> than datanodes/tasktracker's ??
>

1. I'd give it lots of RAM: it holds data about many files, and you want
to avoid swapping.

2. I'd make sure the disks are RAID5, with some NFS-mounted FS that the
secondary namenode can talk to. This avoids the risk of losing the index,
which, if it happens, renders your filesystem worthless. If I were really
paranoid, I'd have twin RAID controllers with separate connections to
disk arrays in separate racks, as [Jiang2008] shows that interconnect
problems on disk arrays can be more frequent than HDD failures.

3. If your central switches are at 10 GbE, consider getting a 10 GbE NIC
and hooking it up directly. This stops the network from being the
bottleneck, though it does mean the server can have a lot more packets
hitting it, which puts more load on it.

4. Leave space for a second CPU and time for GC tuning.


JTs are less important; they need RAM but use HDFS for storage. If your
cluster is small, the NN and JT can run on the same machine. If you do
this, set up DNS so that two hostnames point to the same network address.
Then, if you ever split them off, everyone whose bookmark says
http://jobtracker won't notice.
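
The two-hostnames idea can be checked with a small Python sketch. It
parses hosts-file style text as a stand-in for real DNS records, which is
an assumption made to keep the example self-contained. The names
"namenode" and "jobtracker", the address, and both function names are
hypothetical.

```python
def parse_hosts(text: str) -> dict:
    """Parse hosts-file style text into a {hostname: address} mapping."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            mapping.setdefault(name, address)  # the first entry for a name wins
    return mapping


def same_machine(text: str, first: str, second: str) -> bool:
    """Return True if both names are present and map to the same address."""
    hosts = parse_hosts(text)
    return first in hosts and hosts[first] == hosts.get(second)


if __name__ == "__main__":
    sample = "192.0.2.10 namenode jobtracker  # both masters on one box\n"
    print(same_machine(sample, "namenode", "jobtracker"))
```

If the JT later moves to its own machine, only the "jobtracker" record
changes; clients and bookmarks keep using the same name.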

Either way: the NN and the JT are the machines whose availability you 
care about. The rest is just a source of statistics you can look at later.

-Steve



[Jiang2008] "Are disks the dominant contributor for storage failures?: A 
comprehensive study of storage subsystem failure characteristics". ACM 
Transactions on Storage.