Posted to hdfs-user@hadoop.apache.org by charlie w <sp...@gmail.com> on 2011/01/22 01:36:10 UTC

starting secondary namenode, edits and edits.new already exist

We are running hadoop-0.20.1.  I did not set this cluster up, and the
person who did is unavailable, so I apologize for any of the following
that is unclear.

We would like to (re)start a secondary namenode, and I am looking for
guidance on how to do so.

We have a secondary namenode, but it has apparently never been able to
contact the namenode.

Or so it seems.  The secondary namenode was never properly configured,
and that includes logging, so we are unable to see any log output from
it.  The same unfortunate log configuration issue exists on the
namenode, and there is nothing to see there either.

On the secondary namenode, there are some files in the checkpoint
directory, but they don't seem to have any relationship to the files in
the namenode's name dir.  That all leads us to believe that a
checkpoint has never been taken or even attempted.

But the namenode's name dir *does* contain both edits and edits.new
files.  There are, in fact, five files in there: fsimage, fstime,
VERSION, edits, and edits.new.  The edits file is only 4 bytes.
edits.new is very large, as the cluster has been running for quite a
while and has been at least somewhat active.
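
For anyone who wants to check their own name dir for the same state,
here is a minimal Python sketch.  It only reads file sizes and changes
nothing.  The function names are made up for illustration, and the
directory path is whatever your name dir actually is; nothing here
comes from Hadoop itself.

```python
import os
import sys

# The five file names described in the post.
EXPECTED = ("fsimage", "fstime", "VERSION", "edits", "edits.new")


def describe_name_dir(path):
    """Return {file name: size in bytes} for the expected files found in path."""
    sizes = {}
    for name in EXPECTED:
        full = os.path.join(path, name)
        if os.path.isfile(full):
            sizes[name] = os.path.getsize(full)
    return sizes


def has_both_edit_logs(sizes):
    """True when edits and edits.new are both present in the directory."""
    return "edits" in sizes and "edits.new" in sizes


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_name_dir.py /path/to/name/dir  (path is hypothetical)
    found = describe_name_dir(sys.argv[1])
    for name, size in sorted(found.items()):
        print(f"{name}: {size} bytes")
    if has_both_edit_logs(found):
        print("Both edits and edits.new are present.")
```

Run it against a copy of the directory if you would rather not touch
the live one, even read-only.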

So now the questions.  Was there somehow a secondary name node that
was trying to make a checkpoint and failed, and that's why both edits
and edits.new exist?

If we restart the name node, it will properly merge both edits and
edits.new, correct?  From reading on the Jira and browsing the source
code a little, this is how I think it will happen.

Of course, the real question is how to get a secondary name node going
with as little risk as possible.  Should we just start up the
secondary name node?  Or should we restart the name node first?  Or is
there some other way for us to get right with our cluster?

Thanks,
Charlie