Posted to common-user@hadoop.apache.org by Mayuran Yogarajah <ma...@casalemedia.com> on 2009/09/28 19:25:50 UTC

Is it OK to run with no secondary namenode?

We've got the namenode image being written to a second machine via
NFS so we have that backed up.  That said, do we still need a secondary
namenode, or is it OK to have the cluster going without one?
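For reference, here's roughly what we mean (a sketch only; the NFS mount point below is made up, and dfs.name.dir just takes a comma-separated list of directories that all receive the image):

  <property>
    <name>dfs.name.dir</name>
    <value>/data/hadoop/name,/mnt/nfs-backup/hadoop/name</value>
  </property>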

thanks

Re: Is it OK to run with no secondary namenode?

Posted by Mayuran Yogarajah <ma...@casalemedia.com>.
Aaron Kimball wrote:
> Quite possible. :\
> - A
>   

This is a bit odd... we made this change yesterday and we're seeing this
in the 2NN log:

2009-10-07 19:16:21,225 WARN org.apache.hadoop.dfs.Storage: Checkpoint 
directory /data/hadoop/tmp/dfs/namesecondary is added.
2009-10-07 19:16:21,285 INFO org.apache.hadoop.dfs.NameNode.Secondary: 
Downloaded file fsimage size 4634347 bytes.
2009-10-07 19:16:21,286 INFO org.apache.hadoop.dfs.NameNode.Secondary: 
Downloaded file edits size 281 bytes.
2009-10-07 19:16:21,298 INFO org.apache.hadoop.fs.FSNamesystem: 
fsOwner=hadoop,hadoop
2009-10-07 19:16:21,298 INFO org.apache.hadoop.fs.FSNamesystem: 
supergroup=supergroup
2009-10-07 19:16:21,298 INFO org.apache.hadoop.fs.FSNamesystem: 
isPermissionEnabled=true
2009-10-07 19:16:21,299 INFO org.apache.hadoop.dfs.Storage: Number of 
files = 35445
2009-10-07 19:16:21,586 INFO org.apache.hadoop.dfs.Storage: Number of 
files under construction = 0
2009-10-07 19:16:21,589 INFO org.apache.hadoop.dfs.Storage: Edits file 
edits of size 281 edits # 5 loaded in 0 seconds.
2009-10-07 19:16:21,765 INFO org.apache.hadoop.dfs.Storage: Image file 
of size 4634508 saved in 0 seconds.
2009-10-07 19:16:21,938 INFO org.apache.hadoop.fs.FSNamesystem: Number 
of transactions: 0 Total time for transactions(ms): 0 Number of syncs: 0 
SyncTimes(ms): 0
2009-10-07 19:16:21,950 INFO org.apache.hadoop.dfs.NameNode.Secondary: 
Posted URL 
hadoop-master.com:50070putimage=1&port=50090&machine=127.0.0.1&token=-16:1244615693:0:1254868670000:1254868544203
2009-10-07 19:16:21,952 ERROR org.apache.hadoop.dfs.NameNode.Secondary: 
Exception in doCheckpoint:
2009-10-07 19:16:21,952 ERROR org.apache.hadoop.dfs.NameNode.Secondary: 
java.io.FileNotFoundException: 
http://hadoop-master.com:50070/getimage?putimage=1&port=50090&machine=127.0.0.1&token=-16:1244615693:0:1254868670000:1254868544203
        at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1303)

Anyone know what's going on? This has been going on all day.  It seems like
it downloaded/merged the logs OK, but then something went wrong afterwards =(

Does anyone know where I should be looking to fix this?

thanks,
M

Re: Is it OK to run with no secondary namenode?

Posted by Aaron Kimball <aa...@cloudera.com>.
Quite possible. :\
- A

On Thu, Oct 1, 2009 at 5:17 PM, Mayuran Yogarajah <
mayuran.yogarajah@casalemedia.com> wrote:

> Aaron Kimball wrote:
>
>> If you want to run the 2NN on a different node than the NN, then you need
>> to
>> set dfs.http.address on the 2NN to point to the namenode's http server
>> address. See
>>
>> http://www.cloudera.com/blog/2009/02/10/multi-host-secondarynamenode-configuration/
>>
>> - Aaron
>>
>>
>>
>
> Uhh this wasn't obvious, I totally missed it.  So I'm guessing the 2NN
> hasn't been able
> to upload the merged image back to the NN?
>
> thanks,
> M
>

Re: Is it OK to run with no secondary namenode?

Posted by Mayuran Yogarajah <ma...@casalemedia.com>.
Aaron Kimball wrote:
> If you want to run the 2NN on a different node than the NN, then you need to
> set dfs.http.address on the 2NN to point to the namenode's http server
> address. See
> http://www.cloudera.com/blog/2009/02/10/multi-host-secondarynamenode-configuration/
>
> - Aaron
>
>   

Uhh this wasn't obvious, I totally missed it.  So I'm guessing the 2NN 
hasn't been able
to upload the merged image back to the NN?

thanks,
M

Re: Is it OK to run with no secondary namenode?

Posted by Aaron Kimball <aa...@cloudera.com>.
If you want to run the 2NN on a different node than the NN, then you need to
set dfs.http.address on the 2NN to point to the namenode's http server
address. See
http://www.cloudera.com/blog/2009/02/10/multi-host-secondarynamenode-configuration/
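As a rough sketch (in hadoop-site.xml on the 2NN host only; substitute your
NameNode's actual hostname and HTTP port, e.g. hadoop-master.com:50070 from
the log earlier in this thread):

  <property>
    <name>dfs.http.address</name>
    <value>hadoop-master.com:50070</value>
  </property>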

- Aaron

On Mon, Sep 28, 2009 at 2:17 PM, Todd Lipcon <to...@cloudera.com> wrote:

> On Mon, Sep 28, 2009 at 11:10 AM, Mayuran Yogarajah <
> mayuran.yogarajah@casalemedia.com> wrote:
>
> > Hey Todd,
> >
> >  I don't personally like to use the slaves/masters files for managing
> which
> >> daemons run on which nodes. But, if you'd like to, it looks like you
> >> should
> >> put it in the "masters" file, not the slaves file. Look at how
> >> start-dfs.sh
> >> works to understand how those files are used.
> >>
> >> -Todd
> >>
> >>
> >
> > DOH, I meant to say masters, not slaves =(
> > If I may ask, how are you managing the various daemons?
> >
> >
> Using Cloudera's distribution of Hadoop, you can simply use linux init
> scripts to manage which daemons run on which nodes. For a large cluster,
> you'll want to use something like kickstart, cfengine, puppet, etc, to
> manage your configuration, and that includes which init scripts are
> enabled.
>
> -Todd
>

Re: Is it OK to run with no secondary namenode?

Posted by Todd Lipcon <to...@cloudera.com>.
On Mon, Sep 28, 2009 at 11:10 AM, Mayuran Yogarajah <
mayuran.yogarajah@casalemedia.com> wrote:

> Hey Todd,
>
>  I don't personally like to use the slaves/masters files for managing which
>> daemons run on which nodes. But, if you'd like to, it looks like you
>> should
>> put it in the "masters" file, not the slaves file. Look at how
>> start-dfs.sh
>> works to understand how those files are used.
>>
>> -Todd
>>
>>
>
> DOH, I meant to say masters, not slaves =(
> If I may ask, how are you managing the various daemons?
>
>
Using Cloudera's distribution of Hadoop, you can simply use linux init
scripts to manage which daemons run on which nodes. For a large cluster,
you'll want to use something like kickstart, cfengine, puppet, etc, to
manage your configuration, and that includes which init scripts are enabled.
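For example (a hedged sketch assuming CDH-style service names on a Red Hat-ish
box; check the actual init script names your packages install):

  # enable and start only the daemons this node should run
  chkconfig hadoop-0.20-secondarynamenode on
  service hadoop-0.20-secondarynamenode start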

-Todd

Re: Is it OK to run with no secondary namenode?

Posted by Mayuran Yogarajah <ma...@casalemedia.com>.
Hey Todd,

> I don't personally like to use the slaves/masters files for managing which
> daemons run on which nodes. But, if you'd like to, it looks like you should
> put it in the "masters" file, not the slaves file. Look at how start-dfs.sh
> works to understand how those files are used.
>
> -Todd
>   

DOH, I meant to say masters, not slaves =(
If I may ask, how are you managing the various daemons?

thanks

Re: Is it OK to run with no secondary namenode?

Posted by Todd Lipcon <to...@cloudera.com>.
On Mon, Sep 28, 2009 at 10:44 AM, Mayuran Yogarajah <
mayuran.yogarajah@casalemedia.com> wrote:

> Hey Todd,
>
>  Note that you do not need to run the 2NN on a separate machine *if* you
>> have
>> enough RAM for two entire copies of your filesystem namespace. For small
>> clusters you should be fine to run the two daemons on one machine.
>>
>>
>>
> Just wanted to confirm.. to set up the secondary NN I just need to add the
> hostname
> into the slaves file, correct?
>
>
I don't personally like to use the slaves/masters files for managing which
daemons run on which nodes. But, if you'd like to, it looks like you should
put it in the "masters" file, not the slaves file. Look at how start-dfs.sh
works to understand how those files are used.
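Roughly: conf/masters is just hostnames, one per line, and start-dfs.sh starts
a secondary namenode on each host listed there (via hadoop-daemons.sh). A
sketch, with a made-up hostname:

  # conf/masters -- hosts that should run the secondary namenode
  snn01.example.com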

-Todd

Re: Is it OK to run with no secondary namenode?

Posted by Mayuran Yogarajah <ma...@casalemedia.com>.
Hey Todd,

> Note that you do not need to run the 2NN on a separate machine *if* you have
> enough RAM for two entire copies of your filesystem namespace. For small
> clusters you should be fine to run the two daemons on one machine.
>
>   
Just wanted to confirm.. to set up the secondary NN I just need to add 
the hostname
into the slaves file, correct?

thanks again.

Re: Is it OK to run with no secondary namenode?

Posted by Todd Lipcon <to...@cloudera.com>.
Hi Mayuran,

Yes, you need to run a secondary namenode.

The secondary namenode is *not* a backup mechanism. It is an important part
of the HDFS metadata system, and is responsible for periodically
checkpointing the filesystem namespace into a single file.

Without the secondary namenode running, the edit log of the NN will grow
without bound (unless you are periodically restarting your namenode, which
also causes a checkpoint).

Note that you do not need to run the 2NN on a separate machine *if* you have
enough RAM for two entire copies of your filesystem namespace. For small
clusters you should be fine to run the two daemons on one machine.
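If it helps, the checkpoint behavior is driven by a couple of settings in
hadoop-site.xml (a sketch showing the defaults; fs.checkpoint.dir defaults to
${hadoop.tmp.dir}/dfs/namesecondary):

  <property>
    <name>fs.checkpoint.period</name>
    <value>3600</value>  <!-- seconds between checkpoints -->
  </property>
  <property>
    <name>fs.checkpoint.size</name>
    <value>67108864</value>  <!-- also checkpoint once edits reach this many bytes -->
  </property>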

Hope that helps,
-Todd

On Mon, Sep 28, 2009 at 10:25 AM, Mayuran Yogarajah <
mayuran.yogarajah@casalemedia.com> wrote:

> We've got the namenode image being written to a second machine via
> NFS so we have that backed up.  That said, do we still need a secondary
> namenode, or is it OK to have the cluster going without one?
>
> thanks
>