Posted to hdfs-user@hadoop.apache.org by David Rosenstrauch <da...@darose.net> on 2012/08/16 00:11:04 UTC

OK to run data node on same machine as secondary name node?

I have a Hadoop cluster that's a little tight on resources.  I was 
thinking one way I could solve this could be by running an additional 
data node on the same machine as the secondary name node.

I wouldn't dare do that on the primary name node, since that machine 
needs to be extremely performant.  But since all the secondary name node 
does is merge the name node's checkpoint and logs, which is 
not an activity that requires top-notch real-time performance, I thought 
it might not be a problem if I were to set up a data node running there 
as well.

Any reasons why that might be a bad idea?

Thanks,

DR

Re: OK to run data node on same machine as secondary name node?

Posted by James Brown <jb...@syndicate.net>.

It is ok as long as the Secondary NameNode runs on a machine physically 
separate from the NameNode.

Make sure the fs.checkpoint.dir and fs.checkpoint.edits.dir directory 
lists each contain directories on multiple physical devices.
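
A rough way to sanity-check that advice is to compare the filesystem
device IDs of the configured checkpoint directories. The sketch below
is only an illustration: the helper names are made up, and st_dev
identifies a mounted filesystem rather than a physical disk, so two
partitions on one drive would still look "separate" here.

```python
import os


def devices_for(paths, stat=os.stat):
    """Map each directory to the device ID of the filesystem it lives on."""
    return {path: stat(path).st_dev for path in paths}


def on_separate_devices(paths, stat=os.stat):
    """Return True if no two directories share a filesystem device ID.

    This is a heuristic only: distinct st_dev values do not prove that
    the directories sit on distinct physical disks.
    """
    ids = list(devices_for(paths, stat).values())
    return len(ids) == len(set(ids))
```

You would call on_separate_devices() with the same directory list you
put in the checkpoint settings, for example two hypothetical mount
points such as /data/1/dfs/snn and /data/2/dfs/snn.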


On 8/15/2012 3:11 PM, David Rosenstrauch wrote:
> I have a Hadoop cluster that's a little tight on resources.  I was
> thinking one way I could solve this could be by running an additional
> data node on the same machine as the secondary name node.
>
> I wouldn't dare do that on the primary name node, since that machine
> needs to be extremely performant.  But since all the secondary name node
> does is merge the name node's checkpoint and logs, which is
> not an activity that requires top-notch real-time performance, I thought
> it might not be a problem if I were to set up a data node running there
> as well.
>
> Any reasons why that might be a bad idea?
>
> Thanks,
>
> DR
>

Re: OK to run data node on same machine as secondary name node?

Posted by David Rosenstrauch <da...@darose.net>.

Very helpful info.  I hadn't considered the bandwidth aspect of it. 
Thanks much, Harsh!

DR

On 08/16/2012 12:58 AM, Harsh J wrote:
> I'd not do this if the fsimage size is greater than, say, 5-6 GB. The
> SNN pulls this from the NameNode and then pushes it back, and the transfer
> can get heavy. If you have
> https://issues.apache.org/jira/browse/HDFS-1457 (image transfer
> throttler) in the version of Hadoop you use, you can set it to a
> proper value and keep the SNN on a slave node without worrying about
> it hogging all the available bandwidth.
>
> On Thu, Aug 16, 2012 at 3:41 AM, David Rosenstrauch <da...@darose.net> wrote:
>> I have a Hadoop cluster that's a little tight on resources.  I was thinking
>> one way I could solve this could be by running an additional data node on
>> the same machine as the secondary name node.
>>
>> I wouldn't dare do that on the primary name node, since that machine needs
>> to be extremely performant.  But since all the secondary name node does is
>> merge the name node's checkpoint and logs, which is not an
>> activity that requires top-notch real-time performance, I thought it might
>> not be a problem if I were to set up a data node running there as well.
>>
>> Any reasons why that might be a bad idea?
>>
>> Thanks,
>>
>> DR
>
>
>

Re: OK to run data node on same machine as secondary name node?

Posted by Harsh J <ha...@cloudera.com>.

I'd not do this if the fsimage size is greater than, say, 5-6 GB. The
SNN pulls this from the NameNode and then pushes it back, and the transfer
can get heavy. If you have
https://issues.apache.org/jira/browse/HDFS-1457 (image transfer
throttler) in the version of Hadoop you use, you can set it to a
proper value and keep the SNN on a slave node without worrying about
it hogging all the available bandwidth.
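
To get a feel for what a "proper value" might be, here is a
back-of-the-envelope sketch. It assumes each checkpoint moves roughly
one fsimage in each direction, ignores the edit log and protocol
overhead, and takes the throttle as a rate in bytes per second (I
believe the setting from HDFS-1457 is dfs.image.transfer.bandwidthPerSec,
but please check the hdfs-default.xml for your version). The function
names are made up for illustration.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def transfer_seconds(image_bytes, bandwidth_bytes_per_sec):
    """Time to move one image in one direction at a fixed throttle rate."""
    if bandwidth_bytes_per_sec <= 0:
        raise ValueError("bandwidth must be a positive number of bytes/sec")
    return image_bytes / bandwidth_bytes_per_sec


def checkpoint_transfer_seconds(image_bytes, bandwidth_bytes_per_sec):
    """Rough transfer time per checkpoint: one pull plus one push.

    Assumes the merged image pushed back is about the same size as the
    image that was pulled.
    """
    return 2 * transfer_seconds(image_bytes, bandwidth_bytes_per_sec)


if __name__ == "__main__":
    # Example: a 6 GiB fsimage throttled to 50 MiB/s.
    seconds = checkpoint_transfer_seconds(6 * GIB, 50 * MIB)
    print(f"about {seconds / 60:.1f} minutes of transfer per checkpoint")
```

The point is the trade-off: a lower throttle leaves more bandwidth for
the data node on that machine, but each checkpoint takes longer, so the
value has to stay well within your checkpoint interval.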

On Thu, Aug 16, 2012 at 3:41 AM, David Rosenstrauch <da...@darose.net> wrote:
> I have a Hadoop cluster that's a little tight on resources.  I was thinking
> one way I could solve this could be by running an additional data node on
> the same machine as the secondary name node.
>
> I wouldn't dare do that on the primary name node, since that machine needs
> to be extremely performant.  But since all the secondary name node does is
> merge the name node's checkpoint and logs, which is not an
> activity that requires top-notch real-time performance, I thought it might
> not be a problem if I were to set up a data node running there as well.
>
> Any reasons why that might be a bad idea?
>
> Thanks,
>
> DR

-- 
Harsh J
