Posted to hdfs-user@hadoop.apache.org by jc...@pureperfect.com on 2012/08/16 00:53:57 UTC

RE: OK to run data node on same machine as secondary namenode?

Not an expert but... I think a lot of it depends on your usage pattern.


How many machines are we talking about? What is the replication factor?
If it's only two machines, there would need to be a datanode on both in
order to provide replication.
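
To make the replication point concrete, here is a minimal sketch. It assumes HDFS places at most one replica of a given block on any one datanode; the function name achievable_replicas is made up for illustration and is not part of any Hadoop API.

```python
def achievable_replicas(replication_factor: int, datanodes: int) -> int:
    """Number of replicas of one block that can actually be placed.

    Assumption: at most one replica of a block per datanode, so the
    datanode count caps the effective replication.
    """
    if replication_factor < 1 or datanodes < 0:
        raise ValueError("replication_factor must be >= 1 and datanodes >= 0")
    return min(replication_factor, datanodes)


# With a configured replication factor of 3, small clusters fall short.
for nodes in (1, 2, 3):
    print(nodes, "datanode(s):", achievable_replicas(3, nodes), "replica(s)")
```

So on a two-machine cluster, a datanode on only one of them would leave every block with a single copy, whatever the configured factor is.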


I guess you could also keep the secondary name node on the same machine
as the name node and just have the other(s) be a data node. After all,
that is the default configuration. It certainly makes administration
easier, since you would only need two machine images instead of three.
If the name node goes down you're hosed anyway. The down side would be
memory consumption on the name node, but recovery time would be faster.
It may not be the best configuration for write-heavy workloads and at
some point you are going to hit a hardware ceiling.



-------- Original Message --------
Subject: OK to run data node on same machine as secondary name node?
From: David Rosenstrauch <da...@darose.net>
Date: Wed, August 15, 2012 6:11 pm
To: user@hadoop.apache.org

I have a Hadoop cluster that's a little tight on resources. I was 
thinking one way I could solve this could be by running an additional 
data node on the same machine as the secondary name node.

I wouldn't dare do that on the primary name node, since that machine 
needs to be extremely performant. But since all the secondary name node 
does is merge the name node's checkpoint and logs, which is 
not an activity that requires top-notch real-time performance, I thought 
it might not be a problem if I were to set up a data node running there 
as well.
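
As a rough illustration of the checkpoint merge described above, the following toy model replays a log of operations onto a namespace snapshot. It is only a sketch: the dict-based image, the tuple-based log entries, and the name merge_checkpoint are all hypothetical and do not reflect Hadoop's real on-disk formats or APIs.

```python
def merge_checkpoint(fsimage: dict, edits: list) -> dict:
    """Replay logged operations onto a copy of the namespace snapshot.

    fsimage maps a path to its replication factor (toy representation).
    edits is an ordered list of (op, path, value) tuples.
    """
    merged = dict(fsimage)  # leave the old checkpoint untouched
    for op, path, value in edits:
        if op == "create":
            merged[path] = value
        elif op == "delete":
            merged.pop(path, None)
        else:
            raise ValueError(f"unknown op: {op}")
    return merged


fsimage = {"/data/a": 3}
edits = [("create", "/data/b", 3), ("delete", "/data/a", None)]
new_image = merge_checkpoint(fsimage, edits)
print(new_image)
```

The merged result becomes the new checkpoint, and the replayed log entries are no longer needed.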

Any reasons why that might be a bad idea?

Thanks,

DR

Re: OK to run data node on same machine as secondary name node?

Posted by Michael Segel <mi...@hotmail.com>.

Please keep in mind that you can run an entire cluster on a single server. (Pseudodistributed mode.)

Having said that, just because you can do something doesn't mean it's a good idea to do it. :-)

With respect to the secondary NN and DN on the same machine? Sure. If the machine has enough power, why not? 
However I would probably recommend not doing it. 

If you're running Apache-based Hadoop, you would want to configure your NN, SN, and JT nodes differently from your DNs. 
But again, there's no reason why it can't be done....

On Aug 15, 2012, at 5:53 PM, jcfolsom@pureperfect.com wrote:

> 
> Not an expert but... I think a lot of it depends on your usage pattern.
> 
> 
> How many machines are we talking about? What is the replication factor?
> If it's only two machines, there would need to be a datanode on both in
> order to provide replication.
> 
> 
> I guess you could also keep the secondary name node on the same machine
> as the name node and just have the other(s) be a data node. After all,
> that is the default configuration. It certainly makes administration
> easier, since you would only need two machine images instead of three.
> If the name node goes down you're hosed anyway. The down side would be
> memory consumption on the name node, but recovery time would be faster.
> It may not be the best configuration for write-heavy workloads and at
> some point you are going to hit a hardware ceiling.
> 
> 
> 
> -------- Original Message --------
> Subject: OK to run data node on same machine as secondary name node?
> From: David Rosenstrauch <da...@darose.net>
> Date: Wed, August 15, 2012 6:11 pm
> To: user@hadoop.apache.org
> 
> I have a Hadoop cluster that's a little tight on resources. I was 
> thinking one way I could solve this could be by running an additional 
> data node on the same machine as the secondary name node.
> 
> I wouldn't dare do that on the primary name node, since that machine 
> needs to be extremely performant. But since all the secondary name node 
> does is merge the name node's checkpoint and logs, which is 
> not an activity that requires top-notch real-time performance, I thought 
> it might not be a problem if I were to set up a data node running there 
> as well.
> 
> Any reasons why that might be a bad idea?
> 
> Thanks,
> 
> DR
>
