Posted to mapreduce-user@hadoop.apache.org by Christopher Penney <cp...@gmail.com> on 2013/09/10 15:55:00 UTC

Securing the Secondary Name Node

Hi,

After hosting an insecure Hadoop environment for early testing, I'm
transitioning to something more secure that would (hopefully) more or less
mirror a production environment.  I've integrated our Hadoop cluster into
our Kerberos realm, and everything is working OK except for our secondary
name node.  When I invoke the secondarynamenode with "-checkpoint force"
(when no other secondary name node process is running) I get:

13/09/10 09:44:25 INFO security.UserGroupInformation: Login successful for
user hdfs/hpctest3.realm.com@REALM.COM using keytab file
/etc/hadoop/hdfs.keytab
13/09/10 09:44:25 INFO mortbay.log: Logging to
org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
org.mortbay.log.Slf4jLog
13/09/10 09:44:25 INFO http.HttpServer: Added global filtersafety
(class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
13/09/10 09:44:25 INFO http.HttpServer: Adding Kerberos (SPNEGO) filter to
getimage
13/09/10 09:44:25 INFO http.HttpServer: Port returned by
webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening
the listener on 50090
13/09/10 09:44:25 INFO http.HttpServer: listener.getLocalPort() returned
50090 webServer.getConnectors()[0].getLocalPort() returned 50090
13/09/10 09:44:25 INFO http.HttpServer: Jetty bound to port 50090
13/09/10 09:44:25 INFO mortbay.log: jetty-6.1.26
13/09/10 09:44:26 INFO server.KerberosAuthenticationHandler: Login using
keytab /etc/hadoop/hdfs.keytab, for principal HTTP/
hpctest3.realm.com@REALM.COM
13/09/10 09:44:26 INFO server.KerberosAuthenticationHandler: Initialized,
principal [HTTP/hpctest3.realm.com@REALM.COM] from keytab
[/etc/hadoop/hdfs.keytab]
13/09/10 09:44:26 WARN server.AuthenticationFilter: 'signature.secret'
configuration not set, using a random value as secret
13/09/10 09:44:26 INFO mortbay.log: Started
SelectChannelConnector@0.0.0.0:50090
13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Web server init done
13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Secondary Web-server up
at: 0.0.0.0:50090
13/09/10 09:44:26 WARN namenode.SecondaryNameNode: Checkpoint Period
:3600 secs (60 min)
13/09/10 09:44:26 WARN namenode.SecondaryNameNode: Log Size Trigger
 :67108864 bytes (65536 KB)
13/09/10 09:44:26 INFO namenode.TransferFsImage: Opening connection to
http://hpctest3.realm.com:50070/getimage?getimage=1
13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Downloaded file fsimage
size 110 bytes.
13/09/10 09:44:26 INFO namenode.TransferFsImage: Opening connection to
http://hpctest3.realm.com:50070/getimage?getedit=1
13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Downloaded file edits
size 40 bytes.
13/09/10 09:44:26 INFO util.GSet: VM type       = 64-bit
13/09/10 09:44:26 INFO util.GSet: 2% max memory = 35.55625 MB
13/09/10 09:44:26 INFO util.GSet: capacity      = 2^22 = 4194304 entries
13/09/10 09:44:26 INFO util.GSet: recommended=4194304, actual=4194304
13/09/10 09:44:26 INFO namenode.FSNamesystem: fsOwner=hdfs/
hpctest3.realm.com@REALM.COM
13/09/10 09:44:26 INFO namenode.FSNamesystem: supergroup=supergroup
13/09/10 09:44:26 INFO namenode.FSNamesystem: isPermissionEnabled=true
13/09/10 09:44:26 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
13/09/10 09:44:26 INFO namenode.FSNamesystem: isAccessTokenEnabled=true
accessKeyUpdateInterval=600 min(s), accessTokenLifetime=600 min(s)
13/09/10 09:44:26 INFO namenode.NameNode: Caching file names occuring more
than 10 times
13/09/10 09:44:26 INFO common.Storage: Number of files = 1
13/09/10 09:44:26 INFO common.Storage: Number of files under construction =
0
13/09/10 09:44:26 INFO common.Storage: Edits file
/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits of size 40
edits # 2 loaded in 0 seconds.
13/09/10 09:44:26 INFO namenode.FSNamesystem: Number of transactions: 0
Total time for transactions(ms): 0Number of transactions batched in Syncs:
0 Number of syncs: 0 SyncTimes(ms): 0
13/09/10 09:44:26 INFO namenode.FSEditLog: closing edit log: position=40,
editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
13/09/10 09:44:26 INFO namenode.FSEditLog: close success: truncate to 40,
editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
13/09/10 09:44:26 INFO common.Storage: Image file of size 144 saved in 0
seconds.
13/09/10 09:44:26 INFO namenode.FSEditLog: closing edit log: position=4,
editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
13/09/10 09:44:26 INFO namenode.FSEditLog: close success: truncate to 4,
editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Posted URL
hpctest3.realm.com:50070
putimage=1&port=50090&machine=0.0.0.0&token=-32:67504834:0:1378758765000:1378758462746&newChecksum=f6fbd8485835f40b5950ce36c4877504
13/09/10 09:44:26 INFO namenode.TransferFsImage: Opening connection to
http://hpctest3.realm.com:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=-32:67504834:0:1378758765000:1378758462746&newChecksum=f6fbd8485835f40b5950ce36c4877504
13/09/10 09:44:26 ERROR namenode.SecondaryNameNode: checkpoint: Exception
trying to open authenticated connection to
http://hpctest3.realm.com:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=-32:67504834:0:1378758765000:1378758462746&newChecksum=f6fbd8485835f40b5950ce36c4877504
13/09/10 09:44:26 INFO namenode.SecondaryNameNode: SHUTDOWN_MSG:

The name node log file has this in it:

2013-09-10 09:44:26,219 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from <ip
removed>
2013-09-10 09:44:26,220 WARN
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Cannot roll edit log,
edits.new files already exists in all healthy directories:
  /tmp/hadoop/hdfs/name/current/edits.new
2013-09-10 09:44:26,327 INFO
org.apache.hadoop.hdfs.server.namenode.GetImageServlet: GetImageServlet
allowing: hdfs/hpctest3.realm.com@REALM.COM
2013-09-10 09:44:26,362 INFO
org.apache.hadoop.hdfs.server.namenode.GetImageServlet: GetImageServlet
allowing: hdfs/hpctest3.realm.com@REALM.COM

I've tried googling this and looking in a few books, but I really don't
find a lot of good advice.  I'm using Hadoop 1.1.1.  I'm happy to post
more info, but I didn't want to dump in a bunch of unnecessary stuff.
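
For reference, the exact command I'm running to force the checkpoint is
roughly this (the principal and keytab come from the Hadoop config, so no
manual kinit should be needed first):

hadoop secondarynamenode -checkpoint force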

Thanks,

   Chris

Re: Securing the Secondary Name Node

Posted by Christopher Penney <cp...@gmail.com>.
It does seem to be some kind of authentication issue as I can see "Login
failure for null from keytab" in the NN log, but I don't understand why I
get it.  The NN and 2NN are the same box.  Below are the logs from each.

On the 2NN I see:

2013-09-20 10:01:59,338 INFO
org.apache.hadoop.security.UserGroupInformation: Initiating logout for hdfs/
hpctest3.realm.com@REALM.COM
2013-09-20 10:01:59,338 INFO
org.apache.hadoop.security.UserGroupInformation: Initiating re-login for
hdfs/hpctest3.realm.com@REALM.COM
2013-09-20 10:01:59,410 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
transactions: 0 Total time for transactions(ms): 0Number of transactions
batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0
2013-09-20 10:01:59,410 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: closing edit log:
position=4,
editlog=/tmp/hadoop/tmp/hadoop-hdfs/dfs/namesecondary/current/edits
2013-09-20 10:01:59,410 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: close success: truncate
to 4, editlog=/tmp/hadoop/tmp/hadoop-hdfs/dfs/namesecondary/current/edits
2013-09-20 10:01:59,413 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
to http://hpctest3.realm.com:50070/getimage?getimage=1
2013-09-20 10:01:59,448 INFO
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Downloaded file
fsimage size 429 bytes.
2013-09-20 10:01:59,448 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
to http://hpctest3.realm.com:50070/getimage?getedit=1
2013-09-20 10:01:59,464 INFO
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Downloaded file
edits size 40 bytes.
2013-09-20 10:01:59,464 INFO org.apache.hadoop.hdfs.util.GSet: VM type
  = 64-bit
2013-09-20 10:01:59,464 INFO org.apache.hadoop.hdfs.util.GSet: 2% max
memory = 35.55625 MB
2013-09-20 10:01:59,464 INFO org.apache.hadoop.hdfs.util.GSet: capacity
 = 2^22 = 4194304 entries
2013-09-20 10:01:59,464 INFO org.apache.hadoop.hdfs.util.GSet:
recommended=4194304, actual=4194304
2013-09-20 10:01:59,473 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hdfs/
hpctest3.realm.com@REALM.COM
2013-09-20 10:01:59,473 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2013-09-20 10:01:59,473 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
isPermissionEnabled=true
2013-09-20 10:01:59,473 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
dfs.block.invalidate.limit=100
2013-09-20 10:01:59,473 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
isAccessTokenEnabled=true accessKeyUpdateInterval=600 min(s),
accessTokenLifetime=600 min(s)
2013-09-20 10:01:59,474 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
occuring more than 10 times
2013-09-20 10:01:59,475 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files = 4
2013-09-20 10:01:59,475 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files under construction = 0
2013-09-20 10:01:59,475 INFO org.apache.hadoop.hdfs.server.common.Storage:
Edits file /tmp/hadoop/tmp/hadoop-hdfs/dfs/namesecondary/current/edits of
size 40 edits # 2 loaded in 0 seconds.
2013-09-20 10:01:59,475 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
transactions: 0 Total time for transactions(ms): 0Number of transactions
batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0
2013-09-20 10:01:59,505 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: closing edit log:
position=40,
editlog=/tmp/hadoop/tmp/hadoop-hdfs/dfs/namesecondary/current/edits
2013-09-20 10:01:59,505 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: close success: truncate
to 40, editlog=/tmp/hadoop/tmp/hadoop-hdfs/dfs/namesecondary/current/edits
2013-09-20 10:01:59,506 INFO org.apache.hadoop.hdfs.server.common.Storage:
Image file of size 463 saved in 0 seconds.
2013-09-20 10:01:59,577 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: closing edit log:
position=4,
editlog=/tmp/hadoop/tmp/hadoop-hdfs/dfs/namesecondary/current/edits
2013-09-20 10:01:59,579 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: close success: truncate
to 4, editlog=/tmp/hadoop/tmp/hadoop-hdfs/dfs/namesecondary/current/edits
2013-09-20 10:01:59,688 INFO
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Posted URL
hpctest3.realm.com:50070putimage=1&port=50090&machine=hpctest3.realm.com
&token=-32:832852997:0:1379525282000:1379524942772&newChecksum=d11af5836a07cda4d46f91aadaba11b0
2013-09-20 10:01:59,689 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
to
http://hpctest3.realm.com:50070/getimage?putimage=1&port=50090&machine=hpctest3.realm.com&token=-32:832852997:0:1379525282000:1379524942772&newChecksum=d11af5836a07cda4d46f91aadaba11b0
2013-09-20 10:01:59,706 ERROR
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in
doCheckpoint:
2013-09-20 10:01:59,706 ERROR
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode:
java.io.FileNotFoundException:
http://hpctest3.realm.com:50070/getimage?putimage=1&port=50090&machine=hpctest3.realm.com&token=-32:832852997:0:1379525282000:1379524942772&newChecksum=d11af5836a07cda4d46f91aadaba11b0
        at
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1434)
        at
org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:172)
        at
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.putFSImage(SecondaryNameNode.java:435)
        at
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:499)
        at
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:369)
        at
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$3.run(SecondaryNameNode.java:327)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:337)
        at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1118)
        at
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:324)
        at java.lang.Thread.run(Thread.java:662)

On the NN I see:

2013-09-20 10:01:59,412 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from
19.1.2.223
2013-09-20 10:01:59,412 WARN
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Cannot roll edit log,
edits.new files already exists in all healthy directories:
  /tmp/hadoop/hdfs/name/current/edits.new
2013-09-20 10:01:59,447 INFO
org.apache.hadoop.hdfs.server.namenode.GetImageServlet: GetImageServlet
allowing: hdfs/hpctest3.realm.com@REALM.COM
2013-09-20 10:01:59,463 INFO
org.apache.hadoop.hdfs.server.namenode.GetImageServlet: GetImageServlet
allowing: hdfs/hpctest3.realm.com@REALM.COM
2013-09-20 10:01:59,704 INFO
org.apache.hadoop.hdfs.server.namenode.GetImageServlet: GetImageServlet
allowing: hdfs/hpctest3.realm.com@REALM.COM
2013-09-20 10:01:59,705 ERROR
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:hdfs/hpctest3.realm.com@REALM.COM cause:java.io.IOException: Login
failure for null from keytab /etc/hadoop/hdfs.keytab
2013-09-20 10:01:59,706 WARN org.mortbay.log: /getimage:
java.io.IOException: GetImage failed. java.io.IOException: Login failure
for null from keytab /etc/hadoop/hdfs.keytab
 at
org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytabAndReturnUGI(UserGroupInformation.java:727)
 at
org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.reloginIfNecessary(GetImageServlet.java:121)
 at
org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:96)
 at
org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:79)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
 at
org.apache.hadoop.hdfs.server.namenode.GetImageServlet.doGet(GetImageServlet.java:79)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
 at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
 at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
 at
org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:372)
 at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
 at
org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:848)
 at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
 at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
 at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
 at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
 at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
 at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
 at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
 at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
 at org.mortbay.jetty.Server.handle(Server.java:326)
 at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
 at
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
 at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
 at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
 at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
 at
org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
 at
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: javax.security.auth.login.LoginException: Unable to obtain
Princpal Name for authentication
 at
com.sun.security.auth.module.Krb5LoginModule.promptForName(Krb5LoginModule.java:733)
 at
com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:629)
 at
com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:542)
 at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
 at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at javax.security.auth.login.LoginContext.invoke(LoginContext.java:769)
 at javax.security.auth.login.LoginContext.access$000(LoginContext.java:186)
 at javax.security.auth.login.LoginContext$5.run(LoginContext.java:706)
 at java.security.AccessController.doPrivileged(Native Method)
 at
javax.security.auth.login.LoginContext.invokeCreatorPriv(LoginContext.java:703)
 at javax.security.auth.login.LoginContext.login(LoginContext.java:575)
 at
org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytabAndReturnUGI(UserGroupInformation.java:716)
 ... 30 more
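
One sanity check that seems worth doing here (assuming the MIT Kerberos
client tools are installed) is listing the keytab to confirm it really
holds entries for both the hdfs/ and HTTP/ principals:

klist -kt /etc/hadoop/hdfs.keytab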

It's configured in hdfs-site.xml with:

<!-- NameNode security config -->
<property>
  <name>dfs.http.address</name>
  <value>hpctest3.realm.com:50070</value>
</property>
<property>
  <name>dfs.namenode.keytab.file</name>
  <value>/etc/hadoop/hdfs.keytab</value>
</property>
<property>
  <name>dfs.namenode.kerberos.principal</name>
  <value>hdfs/hpctest3.realm.com@REALM.COM</value>
</property>
<property>
  <name>dfs.namenode.kerberos.internal.spnego.principal</name>
  <value>HTTP/hpctest3.realm.com@REALM.COM</value>
</property>

<!-- Secondary NameNode security config -->
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>hpctest3.realm.com:50090</value>
</property>
<property>
  <name>dfs.secondary.http.address</name>
  <value>hpctest3.realm.com:50090</value>
</property>
<property>
  <name>dfs.secondary.namenode.keytab.file</name>
  <value>/etc/hadoop/hdfs.keytab</value>
</property>
<property>
  <name>dfs.secondary.namenode.kerberos.principal</name>
  <value>hdfs/hpctest3.realm.com@REALM.COM</value>
</property>
<property>
  <name>dfs.secondary.namenode.kerberos.internal.spnego.principal</name>
  <value>HTTP/hpctest3.realm.com@REALM.COM</value>
</property>

In core-site.xml I do have:

<property>
  <name>hadoop.security.use-weak-http-crypto</name>
  <value>false</value>
</property>
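
If it helps narrow down the SPNEGO possibility Benoy mentioned below, a
by-hand test against the getimage servlet should show whether a Negotiate
round trip works at all (a sketch, assuming curl was built with
GSS-API/SPNEGO support):

kinit -kt /etc/hadoop/hdfs.keytab hdfs/hpctest3.realm.com@REALM.COM
curl --negotiate -u : "http://hpctest3.realm.com:50070/getimage?getimage=1" -o /tmp/fsimage.test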


Thanks,

   Chris



On Wed, Sep 18, 2013 at 1:01 AM, Benoy Antony <ba...@gmail.com> wrote:

> Chris,
>
> I think that the error occurs when NN tries to download the fsimage from
> SNN.
> You can check the NN logs to make sure whether this is true.
>
> There could be different reasons for this.
>
> 1. NN fails to do SPNEGO with SNN.
> 2. NN's TGT expired. Unlikely in your test cluster.
>
> Please post any additional log info and I can help.
>
> benoy
>
>
>
>
>> On Thu, Sep 12, 2013 at 6:02 AM, Christopher Penney <cp...@gmail.com> wrote:
>
>> Does anyone have any suggestions or resources I might look at to resolve
>> this?  The documentation on setting up Kerberos seems pretty light.
>>
>>    Chris
>>
>>
>>
>> On Tue, Sep 10, 2013 at 9:55 AM, Christopher Penney <cp...@gmail.com> wrote:
>>
>>>
>>> Hi,
>>>
>>> After hosting an insecure Hadoop environment for early testing, I'm
>>> transitioning to something more secure that would (hopefully) more or less
>>> mirror a production environment.  I've integrated our Hadoop cluster into
>>> our Kerberos realm, and everything is working OK except for our secondary
>>> name node.  When I invoke the secondarynamenode with "-checkpoint force"
>>> (when no other secondary name node process is running) I get:
>>>
>>> 13/09/10 09:44:25 INFO security.UserGroupInformation: Login successful
>>> for user hdfs/hpctest3.realm.com@REALM.COM using keytab file
>>> /etc/hadoop/hdfs.keytab
>>> 13/09/10 09:44:25 INFO mortbay.log: Logging to
>>> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
>>> org.mortbay.log.Slf4jLog
>>> 13/09/10 09:44:25 INFO http.HttpServer: Added global filtersafety
>>> (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
>>> 13/09/10 09:44:25 INFO http.HttpServer: Adding Kerberos (SPNEGO) filter
>>> to getimage
>>> 13/09/10 09:44:25 INFO http.HttpServer: Port returned by
>>> webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening
>>> the listener on 50090
>>> 13/09/10 09:44:25 INFO http.HttpServer: listener.getLocalPort() returned
>>> 50090 webServer.getConnectors()[0].getLocalPort() returned 50090
>>> 13/09/10 09:44:25 INFO http.HttpServer: Jetty bound to port 50090
>>> 13/09/10 09:44:25 INFO mortbay.log: jetty-6.1.26
>>> 13/09/10 09:44:26 INFO server.KerberosAuthenticationHandler: Login using
>>> keytab /etc/hadoop/hdfs.keytab, for principal HTTP/
>>> hpctest3.realm.com@REALM.COM
>>> 13/09/10 09:44:26 INFO server.KerberosAuthenticationHandler:
>>> Initialized, principal [HTTP/hpctest3.realm.com@REALM.COM] from keytab
>>> [/etc/hadoop/hdfs.keytab]
>>>  13/09/10 09:44:26 WARN server.AuthenticationFilter: 'signature.secret'
>>> configuration not set, using a random value as secret
>>> 13/09/10 09:44:26 INFO mortbay.log: Started
>>> SelectChannelConnector@0.0.0.0:50090
>>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Web server init done
>>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Secondary Web-server
>>> up at: 0.0.0.0:50090
>>>  13/09/10 09:44:26 WARN namenode.SecondaryNameNode: Checkpoint Period
>>> :3600 secs (60 min)
>>> 13/09/10 09:44:26 WARN namenode.SecondaryNameNode: Log Size Trigger
>>>  :67108864 bytes (65536 KB)
>>> 13/09/10 09:44:26 INFO namenode.TransferFsImage: Opening connection to
>>> http://hpctest3.realm.com:50070/getimage?getimage=1
>>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Downloaded file
>>> fsimage size 110 bytes.
>>> 13/09/10 09:44:26 INFO namenode.TransferFsImage: Opening connection to
>>> http://hpctest3.realm.com:50070/getimage?getedit=1
>>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Downloaded file edits
>>> size 40 bytes.
>>> 13/09/10 09:44:26 INFO util.GSet: VM type       = 64-bit
>>> 13/09/10 09:44:26 INFO util.GSet: 2% max memory = 35.55625 MB
>>> 13/09/10 09:44:26 INFO util.GSet: capacity      = 2^22 = 4194304 entries
>>> 13/09/10 09:44:26 INFO util.GSet: recommended=4194304, actual=4194304
>>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: fsOwner=hdfs/
>>> hpctest3.realm.com@REALM.COM
>>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: supergroup=supergroup
>>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: isPermissionEnabled=true
>>> 13/09/10 09:44:26 INFO namenode.FSNamesystem:
>>> dfs.block.invalidate.limit=100
>>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: isAccessTokenEnabled=true
>>> accessKeyUpdateInterval=600 min(s), accessTokenLifetime=600 min(s)
>>> 13/09/10 09:44:26 INFO namenode.NameNode: Caching file names occuring
>>> more than 10 times
>>> 13/09/10 09:44:26 INFO common.Storage: Number of files = 1
>>> 13/09/10 09:44:26 INFO common.Storage: Number of files under
>>> construction = 0
>>> 13/09/10 09:44:26 INFO common.Storage: Edits file
>>> /tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits of size 40
>>> edits # 2 loaded in 0 seconds.
>>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: Number of transactions: 0
>>> Total time for transactions(ms): 0Number of transactions batched in Syncs:
>>> 0 Number of syncs: 0 SyncTimes(ms): 0
>>> 13/09/10 09:44:26 INFO namenode.FSEditLog: closing edit log:
>>> position=40,
>>> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
>>> 13/09/10 09:44:26 INFO namenode.FSEditLog: close success: truncate to
>>> 40, editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
>>> 13/09/10 09:44:26 INFO common.Storage: Image file of size 144 saved in 0
>>> seconds.
>>> 13/09/10 09:44:26 INFO namenode.FSEditLog: closing edit log: position=4,
>>> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
>>> 13/09/10 09:44:26 INFO namenode.FSEditLog: close success: truncate to 4,
>>> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
>>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Posted URL
>>> hpctest3.realm.com:50070
>>> putimage=1&port=50090&machine=0.0.0.0&token=-32:67504834:0:1378758765000:1378758462746&newChecksum=f6fbd8485835f40b5950ce36c4877504
>>> 13/09/10 09:44:26 INFO namenode.TransferFsImage: Opening connection to
>>> http://hpctest3.realm.com:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=-32:67504834:0:1378758765000:1378758462746&newChecksum=f6fbd8485835f40b5950ce36c4877504
>>> 13/09/10 09:44:26 ERROR namenode.SecondaryNameNode: checkpoint:
>>> Exception trying to open authenticated connection to
>>> http://hpctest3.realm.com:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=-32:67504834:0:1378758765000:1378758462746&newChecksum=f6fbd8485835f40b5950ce36c4877504
>>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: SHUTDOWN_MSG:
>>>
>>> The name node log file has this in it:
>>>
>>> 2013-09-10 09:44:26,219 INFO
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from <ip
>>> removed>
>>> 2013-09-10 09:44:26,220 WARN
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Cannot roll edit log,
>>> edits.new files already exists in all healthy directories:
>>>   /tmp/hadoop/hdfs/name/current/edits.new
>>> 2013-09-10 09:44:26,327 INFO
>>> org.apache.hadoop.hdfs.server.namenode.GetImageServlet: GetImageServlet
>>> allowing: hdfs/hpctest3.realm.com@REALM.COM
>>> 2013-09-10 09:44:26,362 INFO
>>> org.apache.hadoop.hdfs.server.namenode.GetImageServlet: GetImageServlet
>>> allowing: hdfs/hpctest3.realm.com@REALM.COM
>>>
>>> I've tried googling this and looking in a few books, but I really don't
>>> find a lot of good advice.  I'm using Hadoop 1.1.1.  I'm happy to post
>>> more info, but I didn't want to dump in a bunch of unnecessary stuff.
>>>
>>> Thanks,
>>>
>>>    Chris
>>>
>>
>>
>

Re: Securing the Secondary Name Node

Posted by Christopher Penney <cp...@gmail.com>.
It does seem to be some kind of authentication issue as I can see "Login
failure for null from keytab" in the NN log, but I don't understand why I
get it.  The NN and 2NN are the same box.  Below are the logs from each.

On the 2NN I see:

2013-09-20 10:01:59,338 INFO
org.apache.hadoop.security.UserGroupInformation: Initiating logout for hdfs/
hpctest3.realm.com@REALM.COM
2013-09-20 10:01:59,338 INFO
org.apache.hadoop.security.UserGroupInformation: Initiating re-login for
hdfs/hpctest3.realm.com@REALM.COM
2013-09-20 10:01:59,410 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
transactions: 0 Total time for transactions(ms): 0Number of transactions
batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0
2013-09-20 10:01:59,410 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: closing edit log:
position=4,
editlog=/tmp/hadoop/tmp/hadoop-hdfs/dfs/namesecondary/current/edits
2013-09-20 10:01:59,410 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: close success: truncate
to 4, editlog=/tmp/hadoop/tmp/hadoop-hdfs/dfs/namesecondary/current/edits
2013-09-20 10:01:59,413 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
to http://hpctest3.realm.com:50070/getimage?getimage=1
2013-09-20 10:01:59,448 INFO
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Downloaded file
fsimage size 429 bytes.
2013-09-20 10:01:59,448 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
to http://hpctest3.realm.com:50070/getimage?getedit=1
2013-09-20 10:01:59,464 INFO
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Downloaded file
edits size 40 bytes.
2013-09-20 10:01:59,464 INFO org.apache.hadoop.hdfs.util.GSet: VM type
  = 64-bit
2013-09-20 10:01:59,464 INFO org.apache.hadoop.hdfs.util.GSet: 2% max
memory = 35.55625 MB
2013-09-20 10:01:59,464 INFO org.apache.hadoop.hdfs.util.GSet: capacity
 = 2^22 = 4194304 entries
2013-09-20 10:01:59,464 INFO org.apache.hadoop.hdfs.util.GSet:
recommended=4194304, actual=4194304
2013-09-20 10:01:59,473 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hdfs/
hpctest3.realm.com@REALM.COM
2013-09-20 10:01:59,473 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2013-09-20 10:01:59,473 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
isPermissionEnabled=true
2013-09-20 10:01:59,473 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
dfs.block.invalidate.limit=100
2013-09-20 10:01:59,473 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
isAccessTokenEnabled=true accessKeyUpdateInterval=600 min(s),
accessTokenLifetime=600 min(s)
2013-09-20 10:01:59,474 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
occuring more than 10 times
2013-09-20 10:01:59,475 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files = 4
2013-09-20 10:01:59,475 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files under construction = 0
2013-09-20 10:01:59,475 INFO org.apache.hadoop.hdfs.server.common.Storage:
Edits file /tmp/hadoop/tmp/hadoop-hdfs/dfs/namesecondary/current/edits of
size 40 edits # 2 loaded in 0 seconds.
2013-09-20 10:01:59,475 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
transactions: 0 Total time for transactions(ms): 0Number of transactions
batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0
2013-09-20 10:01:59,505 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: closing edit log:
position=40,
editlog=/tmp/hadoop/tmp/hadoop-hdfs/dfs/namesecondary/current/edits
2013-09-20 10:01:59,505 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: close success: truncate
to 40, editlog=/tmp/hadoop/tmp/hadoop-hdfs/dfs/namesecondary/current/edits
2013-09-20 10:01:59,506 INFO org.apache.hadoop.hdfs.server.common.Storage:
Image file of size 463 saved in 0 seconds.
2013-09-20 10:01:59,577 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: closing edit log:
position=4,
editlog=/tmp/hadoop/tmp/hadoop-hdfs/dfs/namesecondary/current/edits
2013-09-20 10:01:59,579 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: close success: truncate
to 4, editlog=/tmp/hadoop/tmp/hadoop-hdfs/dfs/namesecondary/current/edits
2013-09-20 10:01:59,688 INFO
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Posted URL
hpctest3.realm.com:50070putimage=1&port=50090&machine=hpctest3.realm.com
&token=-32:832852997:0:1379525282000:1379524942772&newChecksum=d11af5836a07cda4d46f91aadaba11b0
2013-09-20 10:01:59,689 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
to
http://hpctest3.realm.com:50070/getimage?putimage=1&port=50090&machine=hpctest3.realm.com&token=-32:832852997:0:1379525282000:1379524942772&newChecksum=d11af5836a07cda4d46f91aadaba11b0
2013-09-20 10:01:59,706 ERROR
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in
doCheckpoint:
2013-09-20 10:01:59,706 ERROR
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode:
java.io.FileNotFoundException:
http://hpctest3.realm.com:50070/getimage?putimage=1&port=50090&machine=hpctest3.realm.com&token=-32:832852997:0:1379525282000:1379524942772&newChecksum=d11af5836a07cda4d46f91aadaba11b0
        at
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1434)
        at
org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:172)
        at
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.putFSImage(SecondaryNameNode.java:435)
        at
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:499)
        at
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:369)
        at
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$3.run(SecondaryNameNode.java:327)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:337)
        at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1118)
        at
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:324)
        at java.lang.Thread.run(Thread.java:662)

On the NN I see:

2013-09-20 10:01:59,412 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from
19.1.2.223
2013-09-20 10:01:59,412 WARN
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Cannot roll edit log,
edits.new files already exists in all healthy directories:
  /tmp/hadoop/hdfs/name/current/edits.new
2013-09-20 10:01:59,447 INFO
org.apache.hadoop.hdfs.server.namenode.GetImageServlet: GetImageServlet
allowing: hdfs/hpctest3.realm.com@REALM.COM
2013-09-20 10:01:59,463 INFO
org.apache.hadoop.hdfs.server.namenode.GetImageServlet: GetImageServlet
allowing: hdfs/hpctest3.realm.com@REALM.COM
2013-09-20 10:01:59,704 INFO
org.apache.hadoop.hdfs.server.namenode.GetImageServlet: GetImageServlet
allowing: hdfs/hpctest3.realm.com@REALM.COM
2013-09-20 10:01:59,705 ERROR
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:hdfs/hpctest3.realm.com@REALM.COM cause:java.io.IOException: Login
failure for null from keytab /etc/hadoop/hdfs.keytab
2013-09-20 10:01:59,706 WARN org.mortbay.log: /getimage:
java.io.IOException: GetImage failed. java.io.IOException: Login failure
for null from keytab /etc/hadoop/hdfs.keytab
 at
org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytabAndReturnUGI(UserGroupInformation.java:727)
 at
org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.reloginIfNecessary(GetImageServlet.java:121)
 at
org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:96)
 at
org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:79)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
 at
org.apache.hadoop.hdfs.server.namenode.GetImageServlet.doGet(GetImageServlet.java:79)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
 at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
 at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
 at
org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:372)
 at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
 at
org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:848)
 at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
 at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
 at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
 at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
 at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
 at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
 at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
 at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
 at org.mortbay.jetty.Server.handle(Server.java:326)
 at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
 at
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
 at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
 at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
 at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
 at
org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
 at
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: javax.security.auth.login.LoginException: Unable to obtain
Princpal Name for authentication
 at
com.sun.security.auth.module.Krb5LoginModule.promptForName(Krb5LoginModule.java:733)
 at
com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:629)
 at
com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:542)
 at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
 at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at javax.security.auth.login.LoginContext.invoke(LoginContext.java:769)
 at javax.security.auth.login.LoginContext.access$000(LoginContext.java:186)
 at javax.security.auth.login.LoginContext$5.run(LoginContext.java:706)
 at java.security.AccessController.doPrivileged(Native Method)
 at
javax.security.auth.login.LoginContext.invokeCreatorPriv(LoginContext.java:703)
 at javax.security.auth.login.LoginContext.login(LoginContext.java:575)
 at
org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytabAndReturnUGI(UserGroupInformation.java:716)
 ... 30 more

It's configured in hdfs-site.xml with:

<!-- NameNode security config -->
<property>
  <name>dfs.http.address</name>
  <value>hpctest3.realm.com:50070</value>
</property><property>
  <name>dfs.namenode.keytab.file</name>
  <value>/etc/hadoop/hdfs.keytab</value>
</property>
<property>
  <name>dfs.namenode.kerberos.principal</name>
  <value>hdfs/hpctest3.realm.com@REALM.COM</value>
</property>
<property>
  <name>dfs.namenode.kerberos.internal.spnego.principal</name>
  <value>HTTP/hpctest3.realm.com@REALM.COM</value>
</property>

<!-- Secondary NameNode security config -->
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>hpctest3.realm.com:50090</value>
</property>
<property>
  <name>dfs.secondary.http.address</name>
  <value>hpctest3.realm.com:50090</value>
</property>
<property>
  <name>dfs.secondary.namenode.keytab.file</name>
  <value>/etc/hadoop/hdfs.keytab</value>
</property>
<property>
  <name>dfs.secondary.namenode.kerberos.principal</name>
  <value>hdfs/hpctest3.realm.com@REALM.COM</value>
</property>
<property>
  <name>dfs.secondary.namenode.kerberos.internal.spnego.principal</name>
  <value>HTTP/hpctest3.realm.com@REALM.COM</value>
</property>

In core-site.xml I do have:

<property>
  <name>hadoop.security.use-weak-http-crypto</name>
  <value>false</value>
</property>


Thanks,

   Chris



On Wed, Sep 18, 2013 at 1:01 AM, Benoy Antony <ba...@gmail.com> wrote:

> Chris ,
>
> I think that the error occurs when NN tries to download the fsimage from
> SNN.
> You can check the NN logs to make sure whether this is true.
>
> There could be different reasons for this.
>
> 1. NN fails to do SPNEGO with SNN.
> 2. NN's TGT expired. Unlikely in your test cluster.
>
> Please post with any additional log info and I can help.
>
> benoy
>
>
>
>
> On Thu, Sep 12, 2013 at 6:02 AM, Christopher Penney <cp...@gmail.com>wrote:
>
>> Does anyone have any suggestions or resources I might look at to resolve
>> this?  The documentation on setting up Kerberos seems pretty light.
>>
>>    Chris
>>
>>
>>
>> On Tue, Sep 10, 2013 at 9:55 AM, Christopher Penney <cp...@gmail.com>wrote:
>>
>>>
>>> Hi,
>>>
>>> After hosting an insecure Hadoop environment for early testing I'm
>>> transitioning to something more secure that would (hopefully) more or less
>>> mirror what a production environment might look like.  I've integrated our
>>> Hadoop cluster into our Kerberos realm and everything is working ok except
>>> for our secondary name node.  When I invoke the secondarynamenode with
>>> "-checkpoint force" (when no other secondary name node process is running)
>>> I get:
>>>
>>> 13/09/10 09:44:25 INFO security.UserGroupInformation: Login successful
>>> for user hdfs/hpctest3.realm.com@REALM.COM using keytab file
>>> /etc/hadoop/hdfs.keytab
>>> 13/09/10 09:44:25 INFO mortbay.log: Logging to
>>> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
>>> org.mortbay.log.Slf4jLog
>>> 13/09/10 09:44:25 INFO http.HttpServer: Added global filtersafety
>>> (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
>>> 13/09/10 09:44:25 INFO http.HttpServer: Adding Kerberos (SPNEGO) filter
>>> to getimage
>>> 13/09/10 09:44:25 INFO http.HttpServer: Port returned by
>>> webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening
>>> the listener on 50090
>>> 13/09/10 09:44:25 INFO http.HttpServer: listener.getLocalPort() returned
>>> 50090 webServer.getConnectors()[0].getLocalPort() returned 50090
>>> 13/09/10 09:44:25 INFO http.HttpServer: Jetty bound to port 50090
>>> 13/09/10 09:44:25 INFO mortbay.log: jetty-6.1.26
>>> 13/09/10 09:44:26 INFO server.KerberosAuthenticationHandler: Login using
>>> keytab /etc/hadoop/hdfs.keytab, for principal HTTP/
>>> hpctest3.realm.com@REALM.COM
>>> 13/09/10 09:44:26 INFO server.KerberosAuthenticationHandler:
>>> Initialized, principal [HTTP/hpctest3.realm.com@REALM.COM] from keytab
>>> [/etc/hadoop/hdfs.keytab]
>>>  13/09/10 09:44:26 WARN server.AuthenticationFilter: 'signature.secret'
>>> configuration not set, using a random value as secret
>>> 13/09/10 09:44:26 INFO mortbay.log: Started
>>> SelectChannelConnector@0.0.0.0:50090
>>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Web server init done
>>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Secondary Web-server
>>> up at: 0.0.0.0:50090
>>>  13/09/10 09:44:26 WARN namenode.SecondaryNameNode: Checkpoint Period
>>> :3600 secs (60 min)
>>> 13/09/10 09:44:26 WARN namenode.SecondaryNameNode: Log Size Trigger
>>>  :67108864 bytes (65536 KB)
>>> 13/09/10 09:44:26 INFO namenode.TransferFsImage: Opening connection to
>>> http://hpctest3.realm.com:50070/getimage?getimage=1
>>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Downloaded file
>>> fsimage size 110 bytes.
>>> 13/09/10 09:44:26 INFO namenode.TransferFsImage: Opening connection to
>>> http://hpctest3.realm.com:50070/getimage?getedit=1
>>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Downloaded file edits
>>> size 40 bytes.
>>> 13/09/10 09:44:26 INFO util.GSet: VM type       = 64-bit
>>> 13/09/10 09:44:26 INFO util.GSet: 2% max memory = 35.55625 MB
>>> 13/09/10 09:44:26 INFO util.GSet: capacity      = 2^22 = 4194304 entries
>>> 13/09/10 09:44:26 INFO util.GSet: recommended=4194304, actual=4194304
>>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: fsOwner=hdfs/
>>> hpctest3.realm.com@REALM.COM
>>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: supergroup=supergroup
>>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: isPermissionEnabled=true
>>> 13/09/10 09:44:26 INFO namenode.FSNamesystem:
>>> dfs.block.invalidate.limit=100
>>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: isAccessTokenEnabled=true
>>> accessKeyUpdateInterval=600 min(s), accessTokenLifetime=600 min(s)
>>> 13/09/10 09:44:26 INFO namenode.NameNode: Caching file names occuring
>>> more than 10 times
>>> 13/09/10 09:44:26 INFO common.Storage: Number of files = 1
>>> 13/09/10 09:44:26 INFO common.Storage: Number of files under
>>> construction = 0
>>> 13/09/10 09:44:26 INFO common.Storage: Edits file
>>> /tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits of size 40
>>> edits # 2 loaded in 0 seconds.
>>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: Number of transactions: 0
>>> Total time for transactions(ms): 0Number of transactions batched in Syncs:
>>> 0 Number of syncs: 0 SyncTimes(ms): 0
>>> 13/09/10 09:44:26 INFO namenode.FSEditLog: closing edit log:
>>> position=40,
>>> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
>>> 13/09/10 09:44:26 INFO namenode.FSEditLog: close success: truncate to
>>> 40, editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
>>> 13/09/10 09:44:26 INFO common.Storage: Image file of size 144 saved in 0
>>> seconds.
>>> 13/09/10 09:44:26 INFO namenode.FSEditLog: closing edit log: position=4,
>>> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
>>> 13/09/10 09:44:26 INFO namenode.FSEditLog: close success: truncate to 4,
>>> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
>>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Posted URL
>>> hpctest3.realm.com:50070
>>> putimage=1&port=50090&machine=0.0.0.0&token=-32:67504834:0:1378758765000:1378758462746&newChecksum=f6fbd8485835f40b5950ce36c4877504
>>> 13/09/10 09:44:26 INFO namenode.TransferFsImage: Opening connection to
>>> http://hpctest3.realm.com:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=-32:67504834:0:1378758765000:1378758462746&newChecksum=f6fbd8485835f40b5950ce36c4877504
>>> 13/09/10 09:44:26 ERROR namenode.SecondaryNameNode: checkpoint:
>>> Exception trying to open authenticated connection to
>>> http://hpctest3.realm.com:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=-32:67504834:0:1378758765000:1378758462746&newChecksum=f6fbd8485835f40b5950ce36c4877504
>>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: SHUTDOWN_MSG:
>>>
>>> The name node log file has this in it:
>>>
>>> 2013-09-10 09:44:26,219 INFO
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from <ip
>>> removed>
>>> 2013-09-10 09:44:26,220 WARN
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Cannot roll edit log,
>>> edits.new files already exists in all healthy directories:
>>>   /tmp/hadoop/hdfs/name/current/edits.new
>>> 2013-09-10 09:44:26,327 INFO
>>> org.apache.hadoop.hdfs.server.namenode.GetImageServlet: GetImageServlet
>>> allowing: hdfs/hpctest3.realm.com@REALM.COM
>>> 2013-09-10 09:44:26,362 INFO
>>> org.apache.hadoop.hdfs.server.namenode.GetImageServlet: GetImageServlet
>>> allowing: hdfs/hpctest3.realm.com@REALM.COM
>>>
>>> I've tried googling this and looking in a few books, but I really don't
>>> find a lot of good advice.  I'm using hadoop 1.1.1.  I'm haoopy to post
>>> more info, but I didn't want to dump in a bunch of unnecessary stuff.
>>>
>>> Thanks,
>>>
>>>    Chris
>>>
>>
>>
>

Re: Securing the Secondary Name Node

Posted by Christopher Penney <cp...@gmail.com>.
It does seem to be some kind of authentication issue as I can see "Login
failure for null from keytab" in the NN log, but I don't understand why I
get it.  The NN and 2NN are the same box.  Below are the logs from each.

On the 2NN I see:

2013-09-20 10:01:59,338 INFO
org.apache.hadoop.security.UserGroupInformation: Initiating logout for hdfs/
hpctest3.realm.com@REALM.COM
2013-09-20 10:01:59,338 INFO
org.apache.hadoop.security.UserGroupInformation: Initiating re-login for
hdfs/hpctest3.realm.com@REALM.COM
2013-09-20 10:01:59,410 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
transactions: 0 Total time for transactions(ms): 0Number of transactions
batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0
2013-09-20 10:01:59,410 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: closing edit log:
position=4,
editlog=/tmp/hadoop/tmp/hadoop-hdfs/dfs/namesecondary/current/edits
2013-09-20 10:01:59,410 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: close success: truncate
to 4, editlog=/tmp/hadoop/tmp/hadoop-hdfs/dfs/namesecondary/current/edits
2013-09-20 10:01:59,413 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
to http://hpctest3.realm.com:50070/getimage?getimage=1
2013-09-20 10:01:59,448 INFO
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Downloaded file
fsimage size 429 bytes.
2013-09-20 10:01:59,448 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
to http://hpctest3.realm.com:50070/getimage?getedit=1
2013-09-20 10:01:59,464 INFO
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Downloaded file
edits size 40 bytes.
2013-09-20 10:01:59,464 INFO org.apache.hadoop.hdfs.util.GSet: VM type
  = 64-bit
2013-09-20 10:01:59,464 INFO org.apache.hadoop.hdfs.util.GSet: 2% max
memory = 35.55625 MB
2013-09-20 10:01:59,464 INFO org.apache.hadoop.hdfs.util.GSet: capacity
 = 2^22 = 4194304 entries
2013-09-20 10:01:59,464 INFO org.apache.hadoop.hdfs.util.GSet:
recommended=4194304, actual=4194304
2013-09-20 10:01:59,473 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hdfs/
hpctest3.realm.com@REALM.COM
2013-09-20 10:01:59,473 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2013-09-20 10:01:59,473 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
isPermissionEnabled=true
2013-09-20 10:01:59,473 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
dfs.block.invalidate.limit=100
2013-09-20 10:01:59,473 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
isAccessTokenEnabled=true accessKeyUpdateInterval=600 min(s),
accessTokenLifetime=600 min(s)
2013-09-20 10:01:59,474 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
occuring more than 10 times
2013-09-20 10:01:59,475 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files = 4
2013-09-20 10:01:59,475 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files under construction = 0
2013-09-20 10:01:59,475 INFO org.apache.hadoop.hdfs.server.common.Storage:
Edits file /tmp/hadoop/tmp/hadoop-hdfs/dfs/namesecondary/current/edits of
size 40 edits # 2 loaded in 0 seconds.
2013-09-20 10:01:59,475 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
transactions: 0 Total time for transactions(ms): 0Number of transactions
batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0
2013-09-20 10:01:59,505 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: closing edit log:
position=40,
editlog=/tmp/hadoop/tmp/hadoop-hdfs/dfs/namesecondary/current/edits
2013-09-20 10:01:59,505 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: close success: truncate
to 40, editlog=/tmp/hadoop/tmp/hadoop-hdfs/dfs/namesecondary/current/edits
2013-09-20 10:01:59,506 INFO org.apache.hadoop.hdfs.server.common.Storage:
Image file of size 463 saved in 0 seconds.
2013-09-20 10:01:59,577 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: closing edit log:
position=4,
editlog=/tmp/hadoop/tmp/hadoop-hdfs/dfs/namesecondary/current/edits
2013-09-20 10:01:59,579 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLog: close success: truncate
to 4, editlog=/tmp/hadoop/tmp/hadoop-hdfs/dfs/namesecondary/current/edits
2013-09-20 10:01:59,688 INFO
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Posted URL
hpctest3.realm.com:50070putimage=1&port=50090&machine=hpctest3.realm.com
&token=-32:832852997:0:1379525282000:1379524942772&newChecksum=d11af5836a07cda4d46f91aadaba11b0
2013-09-20 10:01:59,689 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
to
http://hpctest3.realm.com:50070/getimage?putimage=1&port=50090&machine=hpctest3.realm.com&token=-32:832852997:0:1379525282000:1379524942772&newChecksum=d11af5836a07cda4d46f91aadaba11b0
2013-09-20 10:01:59,706 ERROR
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in
doCheckpoint:
2013-09-20 10:01:59,706 ERROR
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode:
java.io.FileNotFoundException:
http://hpctest3.realm.com:50070/getimage?putimage=1&port=50090&machine=hpctest3.realm.com&token=-32:832852997:0:1379525282000:1379524942772&newChecksum=d11af5836a07cda4d46f91aadaba11b0
        at
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1434)
        at
org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:172)
        at
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.putFSImage(SecondaryNameNode.java:435)
        at
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:499)
        at
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:369)
        at
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$3.run(SecondaryNameNode.java:327)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:337)
        at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1118)
        at
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:324)
        at java.lang.Thread.run(Thread.java:662)

On the NN I see:

2013-09-20 10:01:59,412 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from
19.1.2.223
2013-09-20 10:01:59,412 WARN
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Cannot roll edit log,
edits.new files already exists in all healthy directories:
  /tmp/hadoop/hdfs/name/current/edits.new
2013-09-20 10:01:59,447 INFO
org.apache.hadoop.hdfs.server.namenode.GetImageServlet: GetImageServlet
allowing: hdfs/hpctest3.realm.com@REALM.COM
2013-09-20 10:01:59,463 INFO
org.apache.hadoop.hdfs.server.namenode.GetImageServlet: GetImageServlet
allowing: hdfs/hpctest3.realm.com@REALM.COM
2013-09-20 10:01:59,704 INFO
org.apache.hadoop.hdfs.server.namenode.GetImageServlet: GetImageServlet
allowing: hdfs/hpctest3.realm.com@REALM.COM
2013-09-20 10:01:59,705 ERROR
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:hdfs/hpctest3.realm.com@REALM.COM cause:java.io.IOException: Login
failure for null from keytab /etc/hadoop/hdfs.keytab
2013-09-20 10:01:59,706 WARN org.mortbay.log: /getimage:
java.io.IOException: GetImage failed. java.io.IOException: Login failure
for null from keytab /etc/hadoop/hdfs.keytab
 at
org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytabAndReturnUGI(UserGroupInformation.java:727)
 at
org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.reloginIfNecessary(GetImageServlet.java:121)
 at
org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:96)
 at
org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:79)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
 at
org.apache.hadoop.hdfs.server.namenode.GetImageServlet.doGet(GetImageServlet.java:79)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
 at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
 at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
 at
org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:372)
 at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
 at
org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:848)
 at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
 at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
 at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
 at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
 at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
 at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
 at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
 at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
 at org.mortbay.jetty.Server.handle(Server.java:326)
 at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
 at
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
 at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
 at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
 at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
 at
org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
 at
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: javax.security.auth.login.LoginException: Unable to obtain
Princpal Name for authentication
 at
com.sun.security.auth.module.Krb5LoginModule.promptForName(Krb5LoginModule.java:733)
 at
com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:629)
 at
com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:542)
 at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
 at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at javax.security.auth.login.LoginContext.invoke(LoginContext.java:769)
 at javax.security.auth.login.LoginContext.access$000(LoginContext.java:186)
 at javax.security.auth.login.LoginContext$5.run(LoginContext.java:706)
 at java.security.AccessController.doPrivileged(Native Method)
 at
javax.security.auth.login.LoginContext.invokeCreatorPriv(LoginContext.java:703)
 at javax.security.auth.login.LoginContext.login(LoginContext.java:575)
 at
org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytabAndReturnUGI(UserGroupInformation.java:716)
 ... 30 more
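
One way to narrow this down: the failing call takes an explicit principal and
keytab, so a small standalone harness can confirm the keytab itself is usable
outside the daemon. This is an untested sketch (it assumes hadoop-core is on
the classpath and hard-codes the principal and keytab from the config below);
if it logs in cleanly, the keytab is fine and the "null" suggests the daemon
never resolved a principal name to log in with:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

// Drives the same keytab login the servlet attempts, but with the
// principal spelled out instead of arriving as null.
public class KeytabLoginCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
                "hdfs/hpctest3.realm.com@REALM.COM",  // principal from hdfs-site.xml
                "/etc/hadoop/hdfs.keytab");           // keytab from hdfs-site.xml
        System.out.println("Keytab login OK as " + ugi.getUserName());
    }
}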

It's configured in hdfs-site.xml with:

<!-- NameNode security config -->
<property>
  <name>dfs.http.address</name>
  <value>hpctest3.realm.com:50070</value>
</property>
<property>
  <name>dfs.namenode.keytab.file</name>
  <value>/etc/hadoop/hdfs.keytab</value>
</property>
<property>
  <name>dfs.namenode.kerberos.principal</name>
  <value>hdfs/hpctest3.realm.com@REALM.COM</value>
</property>
<property>
  <name>dfs.namenode.kerberos.internal.spnego.principal</name>
  <value>HTTP/hpctest3.realm.com@REALM.COM</value>
</property>

<!-- Secondary NameNode security config -->
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>hpctest3.realm.com:50090</value>
</property>
<property>
  <name>dfs.secondary.http.address</name>
  <value>hpctest3.realm.com:50090</value>
</property>
<property>
  <name>dfs.secondary.namenode.keytab.file</name>
  <value>/etc/hadoop/hdfs.keytab</value>
</property>
<property>
  <name>dfs.secondary.namenode.kerberos.principal</name>
  <value>hdfs/hpctest3.realm.com@REALM.COM</value>
</property>
<property>
  <name>dfs.secondary.namenode.kerberos.internal.spnego.principal</name>
  <value>HTTP/hpctest3.realm.com@REALM.COM</value>
</property>

In core-site.xml I do have:

<property>
  <name>hadoop.security.use-weak-http-crypto</name>
  <value>false</value>
</property>
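
Since the failure is a login "for null", it is also worth printing what a
daemon actually resolves for these keys from its classpath; a null here would
line up with the error in the NN log. Another untested sketch, using only the
property names shown above:

import org.apache.hadoop.conf.Configuration;

// Prints the secondary-NN security keys as Configuration resolves them;
// any null here would match the "Login failure for null" in the NN log.
public class ConfCheck {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.addResource("hdfs-site.xml");  // loaded from the classpath
        String[] keys = {
            "dfs.secondary.namenode.kerberos.principal",
            "dfs.secondary.namenode.keytab.file",
            "dfs.secondary.namenode.kerberos.internal.spnego.principal"
        };
        for (String key : keys) {
            System.out.println(key + " = " + conf.get(key));
        }
    }
}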


Thanks,

   Chris



On Wed, Sep 18, 2013 at 1:01 AM, Benoy Antony <ba...@gmail.com> wrote:

> Chris,
>
> I think the error occurs when the NN tries to download the fsimage from the
> SNN. You can check the NN logs to confirm whether this is true.
>
> There could be different reasons for this.
>
> 1. NN fails to do SPNEGO with SNN.
> 2. NN's TGT expired. Unlikely in your test cluster.
>
> Please post any additional log info and I can help.
>
> benoy
>
>
>
>
> On Thu, Sep 12, 2013 at 6:02 AM, Christopher Penney <cp...@gmail.com> wrote:
>
>> Does anyone have any suggestions or resources I might look at to resolve
>> this?  The documentation on setting up Kerberos seems pretty light.
>>
>>    Chris
>>
>>
>>
>> On Tue, Sep 10, 2013 at 9:55 AM, Christopher Penney <cp...@gmail.com> wrote:
>>
>>>
>>> Hi,
>>>
>>> After hosting an insecure Hadoop environment for early testing I'm
>>> transitioning to something more secure that would (hopefully) more or less
>>> mirror what a production environment might look like.  I've integrated our
>>> Hadoop cluster into our Kerberos realm and everything is working ok except
>>> for our secondary name node.  When I invoke the secondarynamenode with
>>> "-checkpoint force" (when no other secondary name node process is running)
>>> I get:
>>>
>>> 13/09/10 09:44:25 INFO security.UserGroupInformation: Login successful
>>> for user hdfs/hpctest3.realm.com@REALM.COM using keytab file
>>> /etc/hadoop/hdfs.keytab
>>> 13/09/10 09:44:25 INFO mortbay.log: Logging to
>>> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
>>> org.mortbay.log.Slf4jLog
>>> 13/09/10 09:44:25 INFO http.HttpServer: Added global filtersafety
>>> (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
>>> 13/09/10 09:44:25 INFO http.HttpServer: Adding Kerberos (SPNEGO) filter
>>> to getimage
>>> 13/09/10 09:44:25 INFO http.HttpServer: Port returned by
>>> webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening
>>> the listener on 50090
>>> 13/09/10 09:44:25 INFO http.HttpServer: listener.getLocalPort() returned
>>> 50090 webServer.getConnectors()[0].getLocalPort() returned 50090
>>> 13/09/10 09:44:25 INFO http.HttpServer: Jetty bound to port 50090
>>> 13/09/10 09:44:25 INFO mortbay.log: jetty-6.1.26
>>> 13/09/10 09:44:26 INFO server.KerberosAuthenticationHandler: Login using
>>> keytab /etc/hadoop/hdfs.keytab, for principal HTTP/
>>> hpctest3.realm.com@REALM.COM
>>> 13/09/10 09:44:26 INFO server.KerberosAuthenticationHandler:
>>> Initialized, principal [HTTP/hpctest3.realm.com@REALM.COM] from keytab
>>> [/etc/hadoop/hdfs.keytab]
>>>  13/09/10 09:44:26 WARN server.AuthenticationFilter: 'signature.secret'
>>> configuration not set, using a random value as secret
>>> 13/09/10 09:44:26 INFO mortbay.log: Started
>>> SelectChannelConnector@0.0.0.0:50090
>>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Web server init done
>>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Secondary Web-server
>>> up at: 0.0.0.0:50090
>>>  13/09/10 09:44:26 WARN namenode.SecondaryNameNode: Checkpoint Period
>>> :3600 secs (60 min)
>>> 13/09/10 09:44:26 WARN namenode.SecondaryNameNode: Log Size Trigger
>>>  :67108864 bytes (65536 KB)
>>> 13/09/10 09:44:26 INFO namenode.TransferFsImage: Opening connection to
>>> http://hpctest3.realm.com:50070/getimage?getimage=1
>>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Downloaded file
>>> fsimage size 110 bytes.
>>> 13/09/10 09:44:26 INFO namenode.TransferFsImage: Opening connection to
>>> http://hpctest3.realm.com:50070/getimage?getedit=1
>>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Downloaded file edits
>>> size 40 bytes.
>>> 13/09/10 09:44:26 INFO util.GSet: VM type       = 64-bit
>>> 13/09/10 09:44:26 INFO util.GSet: 2% max memory = 35.55625 MB
>>> 13/09/10 09:44:26 INFO util.GSet: capacity      = 2^22 = 4194304 entries
>>> 13/09/10 09:44:26 INFO util.GSet: recommended=4194304, actual=4194304
>>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: fsOwner=hdfs/
>>> hpctest3.realm.com@REALM.COM
>>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: supergroup=supergroup
>>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: isPermissionEnabled=true
>>> 13/09/10 09:44:26 INFO namenode.FSNamesystem:
>>> dfs.block.invalidate.limit=100
>>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: isAccessTokenEnabled=true
>>> accessKeyUpdateInterval=600 min(s), accessTokenLifetime=600 min(s)
>>> 13/09/10 09:44:26 INFO namenode.NameNode: Caching file names occuring
>>> more than 10 times
>>> 13/09/10 09:44:26 INFO common.Storage: Number of files = 1
>>> 13/09/10 09:44:26 INFO common.Storage: Number of files under
>>> construction = 0
>>> 13/09/10 09:44:26 INFO common.Storage: Edits file
>>> /tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits of size 40
>>> edits # 2 loaded in 0 seconds.
>>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: Number of transactions: 0
>>> Total time for transactions(ms): 0Number of transactions batched in Syncs:
>>> 0 Number of syncs: 0 SyncTimes(ms): 0
>>> 13/09/10 09:44:26 INFO namenode.FSEditLog: closing edit log:
>>> position=40,
>>> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
>>> 13/09/10 09:44:26 INFO namenode.FSEditLog: close success: truncate to
>>> 40, editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
>>> 13/09/10 09:44:26 INFO common.Storage: Image file of size 144 saved in 0
>>> seconds.
>>> 13/09/10 09:44:26 INFO namenode.FSEditLog: closing edit log: position=4,
>>> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
>>> 13/09/10 09:44:26 INFO namenode.FSEditLog: close success: truncate to 4,
>>> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
>>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Posted URL
>>> hpctest3.realm.com:50070
>>> putimage=1&port=50090&machine=0.0.0.0&token=-32:67504834:0:1378758765000:1378758462746&newChecksum=f6fbd8485835f40b5950ce36c4877504
>>> 13/09/10 09:44:26 INFO namenode.TransferFsImage: Opening connection to
>>> http://hpctest3.realm.com:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=-32:67504834:0:1378758765000:1378758462746&newChecksum=f6fbd8485835f40b5950ce36c4877504
>>> 13/09/10 09:44:26 ERROR namenode.SecondaryNameNode: checkpoint:
>>> Exception trying to open authenticated connection to
>>> http://hpctest3.realm.com:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=-32:67504834:0:1378758765000:1378758462746&newChecksum=f6fbd8485835f40b5950ce36c4877504
>>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: SHUTDOWN_MSG:
>>>
>>> The name node log file has this in it:
>>>
>>> 2013-09-10 09:44:26,219 INFO
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from <ip
>>> removed>
>>> 2013-09-10 09:44:26,220 WARN
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Cannot roll edit log,
>>> edits.new files already exists in all healthy directories:
>>>   /tmp/hadoop/hdfs/name/current/edits.new
>>> 2013-09-10 09:44:26,327 INFO
>>> org.apache.hadoop.hdfs.server.namenode.GetImageServlet: GetImageServlet
>>> allowing: hdfs/hpctest3.realm.com@REALM.COM
>>> 2013-09-10 09:44:26,362 INFO
>>> org.apache.hadoop.hdfs.server.namenode.GetImageServlet: GetImageServlet
>>> allowing: hdfs/hpctest3.realm.com@REALM.COM
>>>
>>> I've tried googling this and looking in a few books, but I really don't
>>> find a lot of good advice.  I'm using Hadoop 1.1.1.  I'm happy to post
>>> more info, but I didn't want to dump in a bunch of unnecessary stuff.
>>>
>>> Thanks,
>>>
>>>    Chris
>>>
>>
>>
>

Re: Securing the Secondary Name Node

Posted by Benoy Antony <ba...@gmail.com>.
Chris,

I think the error occurs when the NN tries to download the fsimage from the
SNN. You can check the NN logs to confirm whether this is true.

There could be different reasons for this.

1. NN fails to do SPNEGO with SNN.
2. NN's TGT expired. Unlikely in your test cluster, but easy to rule out (see
the sketch below).
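
To rule out 2 quickly, here is an untested sketch (it reuses the principal and
keytab from your config) that logs in from the keytab and then forces the same
TGT freshness check the daemons perform:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

// Logs in from the keytab, then forces the TGT check; an expired or
// near-expiry ticket triggers a fresh keytab login here.
public class TgtCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        UserGroupInformation.loginUserFromKeytab(
                "hdfs/hpctest3.realm.com@REALM.COM", "/etc/hadoop/hdfs.keytab");
        UserGroupInformation login = UserGroupInformation.getLoginUser();
        login.checkTGTAndReloginFromKeytab();
        System.out.println("TGT is valid for " + login.getUserName());
    }
}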

Please post any additional log info and I can help.

benoy




On Thu, Sep 12, 2013 at 6:02 AM, Christopher Penney <cp...@gmail.com> wrote:

> Does anyone have any suggestions or resources I might look at to resolve
> this?  The documentation on setting up Kerberos seems pretty light.
>
>    Chris
>
>
>
> On Tue, Sep 10, 2013 at 9:55 AM, Christopher Penney <cp...@gmail.com> wrote:
>
>>
>> Hi,
>>
>> After hosting an insecure Hadoop environment for early testing I'm
>> transitioning to something more secure that would (hopefully) more or less
>> mirror what a production environment might look like.  I've integrated our
>> Hadoop cluster into our Kerberos realm and everything is working ok except
>> for our secondary name node.  When I invoke the secondarynamenode with
>> "-checkpoint force" (when no other secondary name node process is running)
>> I get:
>>
>> 13/09/10 09:44:25 INFO security.UserGroupInformation: Login successful
>> for user hdfs/hpctest3.realm.com@REALM.COM using keytab file
>> /etc/hadoop/hdfs.keytab
>> 13/09/10 09:44:25 INFO mortbay.log: Logging to
>> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
>> org.mortbay.log.Slf4jLog
>> 13/09/10 09:44:25 INFO http.HttpServer: Added global filtersafety
>> (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
>> 13/09/10 09:44:25 INFO http.HttpServer: Adding Kerberos (SPNEGO) filter
>> to getimage
>> 13/09/10 09:44:25 INFO http.HttpServer: Port returned by
>> webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening
>> the listener on 50090
>> 13/09/10 09:44:25 INFO http.HttpServer: listener.getLocalPort() returned
>> 50090 webServer.getConnectors()[0].getLocalPort() returned 50090
>> 13/09/10 09:44:25 INFO http.HttpServer: Jetty bound to port 50090
>> 13/09/10 09:44:25 INFO mortbay.log: jetty-6.1.26
>> 13/09/10 09:44:26 INFO server.KerberosAuthenticationHandler: Login using
>> keytab /etc/hadoop/hdfs.keytab, for principal HTTP/
>> hpctest3.realm.com@REALM.COM
>> 13/09/10 09:44:26 INFO server.KerberosAuthenticationHandler: Initialized,
>> principal [HTTP/hpctest3.realm.com@REALM.COM] from keytab
>> [/etc/hadoop/hdfs.keytab]
>>  13/09/10 09:44:26 WARN server.AuthenticationFilter: 'signature.secret'
>> configuration not set, using a random value as secret
>> 13/09/10 09:44:26 INFO mortbay.log: Started
>> SelectChannelConnector@0.0.0.0:50090
>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Web server init done
>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Secondary Web-server
>> up at: 0.0.0.0:50090
>>  13/09/10 09:44:26 WARN namenode.SecondaryNameNode: Checkpoint Period
>> :3600 secs (60 min)
>> 13/09/10 09:44:26 WARN namenode.SecondaryNameNode: Log Size Trigger
>>  :67108864 bytes (65536 KB)
>> 13/09/10 09:44:26 INFO namenode.TransferFsImage: Opening connection to
>> http://hpctest3.realm.com:50070/getimage?getimage=1
>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Downloaded file
>> fsimage size 110 bytes.
>> 13/09/10 09:44:26 INFO namenode.TransferFsImage: Opening connection to
>> http://hpctest3.realm.com:50070/getimage?getedit=1
>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Downloaded file edits
>> size 40 bytes.
>> 13/09/10 09:44:26 INFO util.GSet: VM type       = 64-bit
>> 13/09/10 09:44:26 INFO util.GSet: 2% max memory = 35.55625 MB
>> 13/09/10 09:44:26 INFO util.GSet: capacity      = 2^22 = 4194304 entries
>> 13/09/10 09:44:26 INFO util.GSet: recommended=4194304, actual=4194304
>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: fsOwner=hdfs/
>> hpctest3.realm.com@REALM.COM
>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: supergroup=supergroup
>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: isPermissionEnabled=true
>> 13/09/10 09:44:26 INFO namenode.FSNamesystem:
>> dfs.block.invalidate.limit=100
>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: isAccessTokenEnabled=true
>> accessKeyUpdateInterval=600 min(s), accessTokenLifetime=600 min(s)
>> 13/09/10 09:44:26 INFO namenode.NameNode: Caching file names occuring
>> more than 10 times
>> 13/09/10 09:44:26 INFO common.Storage: Number of files = 1
>> 13/09/10 09:44:26 INFO common.Storage: Number of files under construction
>> = 0
>> 13/09/10 09:44:26 INFO common.Storage: Edits file
>> /tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits of size 40
>> edits # 2 loaded in 0 seconds.
>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: Number of transactions: 0
>> Total time for transactions(ms): 0Number of transactions batched in Syncs:
>> 0 Number of syncs: 0 SyncTimes(ms): 0
>> 13/09/10 09:44:26 INFO namenode.FSEditLog: closing edit log: position=40,
>> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
>> 13/09/10 09:44:26 INFO namenode.FSEditLog: close success: truncate to 40,
>> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
>> 13/09/10 09:44:26 INFO common.Storage: Image file of size 144 saved in 0
>> seconds.
>> 13/09/10 09:44:26 INFO namenode.FSEditLog: closing edit log: position=4,
>> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
>> 13/09/10 09:44:26 INFO namenode.FSEditLog: close success: truncate to 4,
>> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Posted URL
>> hpctest3.realm.com:50070
>> putimage=1&port=50090&machine=0.0.0.0&token=-32:67504834:0:1378758765000:1378758462746&newChecksum=f6fbd8485835f40b5950ce36c4877504
>> 13/09/10 09:44:26 INFO namenode.TransferFsImage: Opening connection to
>> http://hpctest3.realm.com:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=-32:67504834:0:1378758765000:1378758462746&newChecksum=f6fbd8485835f40b5950ce36c4877504
>> 13/09/10 09:44:26 ERROR namenode.SecondaryNameNode: checkpoint: Exception
>> trying to open authenticated connection to
>> http://hpctest3.realm.com:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=-32:67504834:0:1378758765000:1378758462746&newChecksum=f6fbd8485835f40b5950ce36c4877504
>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: SHUTDOWN_MSG:
>>
>> The name node log file has this in it:
>>
>> 2013-09-10 09:44:26,219 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from <ip
>> removed>
>> 2013-09-10 09:44:26,220 WARN
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Cannot roll edit log,
>> edits.new files already exists in all healthy directories:
>>   /tmp/hadoop/hdfs/name/current/edits.new
>> 2013-09-10 09:44:26,327 INFO
>> org.apache.hadoop.hdfs.server.namenode.GetImageServlet: GetImageServlet
>> allowing: hdfs/hpctest3.realm.com@REALM.COM
>> 2013-09-10 09:44:26,362 INFO
>> org.apache.hadoop.hdfs.server.namenode.GetImageServlet: GetImageServlet
>> allowing: hdfs/hpctest3.realm.com@REALM.COM
>>
>> I've tried googling this and looking in a few books, but I really don't
>> find a lot of good advice.  I'm using Hadoop 1.1.1.  I'm happy to post
>> more info, but I didn't want to dump in a bunch of unnecessary stuff.
>>
>> Thanks,
>>
>>    Chris
>>
>
>

Re: Securing the Secondary Name Node

Posted by Benoy Antony <ba...@gmail.com>.
Chris ,

I think that the error occurs when NN tries to download the fsimage from
SNN.
You can check the NN logs to make sure whether this is true.

There could be different reasons for this.

1. NN fails to do SPNEGO with SNN.
2. NN's TGT expired. Unlikely in your test cluster.

Please post with any additional log info and I can help.

benoy




On Thu, Sep 12, 2013 at 6:02 AM, Christopher Penney <cp...@gmail.com>wrote:

> Does anyone have any suggestions or resources I might look at to resolve
> this?  The documentation on setting up Kerberos seems pretty light.
>
>    Chris
>
>
>
> On Tue, Sep 10, 2013 at 9:55 AM, Christopher Penney <cp...@gmail.com>wrote:
>
>>
>> Hi,
>>
>> After hosting an insecure Hadoop environment for early testing I'm
>> transitioning to something more secure that would (hopefully) more or less
>> mirror what a production environment might look like.  I've integrated our
>> Hadoop cluster into our Kerberos realm and everything is working ok except
>> for our secondary name node.  When I invoke the secondarynamenode with
>> "-checkpoint force" (when no other secondary name node process is running)
>> I get:
>>
>> 13/09/10 09:44:25 INFO security.UserGroupInformation: Login successful
>> for user hdfs/hpctest3.realm.com@REALM.COM using keytab file
>> /etc/hadoop/hdfs.keytab
>> 13/09/10 09:44:25 INFO mortbay.log: Logging to
>> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
>> org.mortbay.log.Slf4jLog
>> 13/09/10 09:44:25 INFO http.HttpServer: Added global filtersafety
>> (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
>> 13/09/10 09:44:25 INFO http.HttpServer: Adding Kerberos (SPNEGO) filter
>> to getimage
>> 13/09/10 09:44:25 INFO http.HttpServer: Port returned by
>> webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening
>> the listener on 50090
>> 13/09/10 09:44:25 INFO http.HttpServer: listener.getLocalPort() returned
>> 50090 webServer.getConnectors()[0].getLocalPort() returned 50090
>> 13/09/10 09:44:25 INFO http.HttpServer: Jetty bound to port 50090
>> 13/09/10 09:44:25 INFO mortbay.log: jetty-6.1.26
>> 13/09/10 09:44:26 INFO server.KerberosAuthenticationHandler: Login using
>> keytab /etc/hadoop/hdfs.keytab, for principal HTTP/
>> hpctest3.realm.com@REALM.COM
>> 13/09/10 09:44:26 INFO server.KerberosAuthenticationHandler: Initialized,
>> principal [HTTP/hpctest3.realm.com@REALM.COM] from keytab
>> [/etc/hadoop/hdfs.keytab]
>>  13/09/10 09:44:26 WARN server.AuthenticationFilter: 'signature.secret'
>> configuration not set, using a random value as secret
>> 13/09/10 09:44:26 INFO mortbay.log: Started
>> SelectChannelConnector@0.0.0.0:50090
>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Web server init done
>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Secondary Web-server
>> up at: 0.0.0.0:50090
>>  13/09/10 09:44:26 WARN namenode.SecondaryNameNode: Checkpoint Period
>> :3600 secs (60 min)
>> 13/09/10 09:44:26 WARN namenode.SecondaryNameNode: Log Size Trigger
>>  :67108864 bytes (65536 KB)
>> 13/09/10 09:44:26 INFO namenode.TransferFsImage: Opening connection to
>> http://hpctest3.realm.com:50070/getimage?getimage=1
>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Downloaded file
>> fsimage size 110 bytes.
>> 13/09/10 09:44:26 INFO namenode.TransferFsImage: Opening connection to
>> http://hpctest3.realm.com:50070/getimage?getedit=1
>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Downloaded file edits
>> size 40 bytes.
>> 13/09/10 09:44:26 INFO util.GSet: VM type       = 64-bit
>> 13/09/10 09:44:26 INFO util.GSet: 2% max memory = 35.55625 MB
>> 13/09/10 09:44:26 INFO util.GSet: capacity      = 2^22 = 4194304 entries
>> 13/09/10 09:44:26 INFO util.GSet: recommended=4194304, actual=4194304
>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: fsOwner=hdfs/
>> hpctest3.realm.com@REALM.COM
>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: supergroup=supergroup
>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: isPermissionEnabled=true
>> 13/09/10 09:44:26 INFO namenode.FSNamesystem:
>> dfs.block.invalidate.limit=100
>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: isAccessTokenEnabled=true
>> accessKeyUpdateInterval=600 min(s), accessTokenLifetime=600 min(s)
>> 13/09/10 09:44:26 INFO namenode.NameNode: Caching file names occuring
>> more than 10 times
>> 13/09/10 09:44:26 INFO common.Storage: Number of files = 1
>> 13/09/10 09:44:26 INFO common.Storage: Number of files under construction
>> = 0
>> 13/09/10 09:44:26 INFO common.Storage: Edits file
>> /tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits of size 40
>> edits # 2 loaded in 0 seconds.
>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: Number of transactions: 0
>> Total time for transactions(ms): 0Number of transactions batched in Syncs:
>> 0 Number of syncs: 0 SyncTimes(ms): 0
>> 13/09/10 09:44:26 INFO namenode.FSEditLog: closing edit log: position=40,
>> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
>> 13/09/10 09:44:26 INFO namenode.FSEditLog: close success: truncate to 40,
>> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
>> 13/09/10 09:44:26 INFO common.Storage: Image file of size 144 saved in 0
>> seconds.
>> 13/09/10 09:44:26 INFO namenode.FSEditLog: closing edit log: position=4,
>> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
>> 13/09/10 09:44:26 INFO namenode.FSEditLog: close success: truncate to 4,
>> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Posted URL
>> hpctest3.realm.com:50070
>> putimage=1&port=50090&machine=0.0.0.0&token=-32:67504834:0:1378758765000:1378758462746&newChecksum=f6fbd8485835f40b5950ce36c4877504
>> 13/09/10 09:44:26 INFO namenode.TransferFsImage: Opening connection to
>> http://hpctest3.realm.com:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=-32:67504834:0:1378758765000:1378758462746&newChecksum=f6fbd8485835f40b5950ce36c4877504
>> 13/09/10 09:44:26 ERROR namenode.SecondaryNameNode: checkpoint: Exception
>> trying to open authenticated connection to
>> http://hpctest3.realm.com:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=-32:67504834:0:1378758765000:1378758462746&newChecksum=f6fbd8485835f40b5950ce36c4877504
>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: SHUTDOWN_MSG:
>>
>> The name node log file has this in it:
>>
>> 2013-09-10 09:44:26,219 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from <ip
>> removed>
>> 2013-09-10 09:44:26,220 WARN
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Cannot roll edit log,
>> edits.new files already exists in all healthy directories:
>>   /tmp/hadoop/hdfs/name/current/edits.new
>> 2013-09-10 09:44:26,327 INFO
>> org.apache.hadoop.hdfs.server.namenode.GetImageServlet: GetImageServlet
>> allowing: hdfs/hpctest3.realm.com@REALM.COM
>> 2013-09-10 09:44:26,362 INFO
>> org.apache.hadoop.hdfs.server.namenode.GetImageServlet: GetImageServlet
>> allowing: hdfs/hpctest3.realm.com@REALM.COM
>>
>> I've tried googling this and looking in a few books, but I really don't
>> find a lot of good advice.  I'm using hadoop 1.1.1.  I'm haoopy to post
>> more info, but I didn't want to dump in a bunch of unnecessary stuff.
>>
>> Thanks,
>>
>>    Chris
>>
>
>

Re: Securing the Secondary Name Node

Posted by Benoy Antony <ba...@gmail.com>.
Chris ,

I think that the error occurs when NN tries to download the fsimage from
SNN.
You can check the NN logs to make sure whether this is true.

There could be different reasons for this.

1. NN fails to do SPNEGO with SNN.
2. NN's TGT expired. Unlikely in your test cluster.

Please post with any additional log info and I can help.

benoy




On Thu, Sep 12, 2013 at 6:02 AM, Christopher Penney <cp...@gmail.com>wrote:

> Does anyone have any suggestions or resources I might look at to resolve
> this?  The documentation on setting up Kerberos seems pretty light.
>
>    Chris
>
>
>
> On Tue, Sep 10, 2013 at 9:55 AM, Christopher Penney <cp...@gmail.com>wrote:
>
>>
>> Hi,
>>
>> After hosting an insecure Hadoop environment for early testing I'm
>> transitioning to something more secure that would (hopefully) more or less
>> mirror what a production environment might look like.  I've integrated our
>> Hadoop cluster into our Kerberos realm and everything is working ok except
>> for our secondary name node.  When I invoke the secondarynamenode with
>> "-checkpoint force" (when no other secondary name node process is running)
>> I get:
>>
>> 13/09/10 09:44:25 INFO security.UserGroupInformation: Login successful
>> for user hdfs/hpctest3.realm.com@REALM.COM using keytab file
>> /etc/hadoop/hdfs.keytab
>> 13/09/10 09:44:25 INFO mortbay.log: Logging to
>> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
>> org.mortbay.log.Slf4jLog
>> 13/09/10 09:44:25 INFO http.HttpServer: Added global filtersafety
>> (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
>> 13/09/10 09:44:25 INFO http.HttpServer: Adding Kerberos (SPNEGO) filter
>> to getimage
>> 13/09/10 09:44:25 INFO http.HttpServer: Port returned by
>> webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening
>> the listener on 50090
>> 13/09/10 09:44:25 INFO http.HttpServer: listener.getLocalPort() returned
>> 50090 webServer.getConnectors()[0].getLocalPort() returned 50090
>> 13/09/10 09:44:25 INFO http.HttpServer: Jetty bound to port 50090
>> 13/09/10 09:44:25 INFO mortbay.log: jetty-6.1.26
>> 13/09/10 09:44:26 INFO server.KerberosAuthenticationHandler: Login using
>> keytab /etc/hadoop/hdfs.keytab, for principal HTTP/
>> hpctest3.realm.com@REALM.COM
>> 13/09/10 09:44:26 INFO server.KerberosAuthenticationHandler: Initialized,
>> principal [HTTP/hpctest3.realm.com@REALM.COM] from keytab
>> [/etc/hadoop/hdfs.keytab]
>>  13/09/10 09:44:26 WARN server.AuthenticationFilter: 'signature.secret'
>> configuration not set, using a random value as secret
>> 13/09/10 09:44:26 INFO mortbay.log: Started
>> SelectChannelConnector@0.0.0.0:50090
>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Web server init done
>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Secondary Web-server
>> up at: 0.0.0.0:50090
>>  13/09/10 09:44:26 WARN namenode.SecondaryNameNode: Checkpoint Period
>> :3600 secs (60 min)
>> 13/09/10 09:44:26 WARN namenode.SecondaryNameNode: Log Size Trigger
>>  :67108864 bytes (65536 KB)
>> 13/09/10 09:44:26 INFO namenode.TransferFsImage: Opening connection to
>> http://hpctest3.realm.com:50070/getimage?getimage=1
>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Downloaded file
>> fsimage size 110 bytes.
>> 13/09/10 09:44:26 INFO namenode.TransferFsImage: Opening connection to
>> http://hpctest3.realm.com:50070/getimage?getedit=1
>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Downloaded file edits
>> size 40 bytes.
>> 13/09/10 09:44:26 INFO util.GSet: VM type       = 64-bit
>> 13/09/10 09:44:26 INFO util.GSet: 2% max memory = 35.55625 MB
>> 13/09/10 09:44:26 INFO util.GSet: capacity      = 2^22 = 4194304 entries
>> 13/09/10 09:44:26 INFO util.GSet: recommended=4194304, actual=4194304
>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: fsOwner=hdfs/
>> hpctest3.realm.com@REALM.COM
>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: supergroup=supergroup
>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: isPermissionEnabled=true
>> 13/09/10 09:44:26 INFO namenode.FSNamesystem:
>> dfs.block.invalidate.limit=100
>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: isAccessTokenEnabled=true
>> accessKeyUpdateInterval=600 min(s), accessTokenLifetime=600 min(s)
>> 13/09/10 09:44:26 INFO namenode.NameNode: Caching file names occuring
>> more than 10 times
>> 13/09/10 09:44:26 INFO common.Storage: Number of files = 1
>> 13/09/10 09:44:26 INFO common.Storage: Number of files under construction
>> = 0
>> 13/09/10 09:44:26 INFO common.Storage: Edits file
>> /tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits of size 40
>> edits # 2 loaded in 0 seconds.
>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: Number of transactions: 0
>> Total time for transactions(ms): 0Number of transactions batched in Syncs:
>> 0 Number of syncs: 0 SyncTimes(ms): 0
>> 13/09/10 09:44:26 INFO namenode.FSEditLog: closing edit log: position=40,
>> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
>> 13/09/10 09:44:26 INFO namenode.FSEditLog: close success: truncate to 40,
>> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
>> 13/09/10 09:44:26 INFO common.Storage: Image file of size 144 saved in 0
>> seconds.
>> 13/09/10 09:44:26 INFO namenode.FSEditLog: closing edit log: position=4,
>> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
>> 13/09/10 09:44:26 INFO namenode.FSEditLog: close success: truncate to 4,
>> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Posted URL
>> hpctest3.realm.com:50070
>> putimage=1&port=50090&machine=0.0.0.0&token=-32:67504834:0:1378758765000:1378758462746&newChecksum=f6fbd8485835f40b5950ce36c4877504
>> 13/09/10 09:44:26 INFO namenode.TransferFsImage: Opening connection to
>> http://hpctest3.realm.com:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=-32:67504834:0:1378758765000:1378758462746&newChecksum=f6fbd8485835f40b5950ce36c4877504
>> 13/09/10 09:44:26 ERROR namenode.SecondaryNameNode: checkpoint: Exception
>> trying to open authenticated connection to
>> http://hpctest3.realm.com:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=-32:67504834:0:1378758765000:1378758462746&newChecksum=f6fbd8485835f40b5950ce36c4877504
>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: SHUTDOWN_MSG:
>>
>> The name node log file has this in it:
>>
>> 2013-09-10 09:44:26,219 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from <ip
>> removed>
>> 2013-09-10 09:44:26,220 WARN
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Cannot roll edit log,
>> edits.new files already exists in all healthy directories:
>>   /tmp/hadoop/hdfs/name/current/edits.new
>> 2013-09-10 09:44:26,327 INFO
>> org.apache.hadoop.hdfs.server.namenode.GetImageServlet: GetImageServlet
>> allowing: hdfs/hpctest3.realm.com@REALM.COM
>> 2013-09-10 09:44:26,362 INFO
>> org.apache.hadoop.hdfs.server.namenode.GetImageServlet: GetImageServlet
>> allowing: hdfs/hpctest3.realm.com@REALM.COM
>>
>> I've tried googling this and looking in a few books, but I really don't
>> find a lot of good advice.  I'm using hadoop 1.1.1.  I'm haoopy to post
>> more info, but I didn't want to dump in a bunch of unnecessary stuff.
>>
>> Thanks,
>>
>>    Chris
>>
>
>

Re: Securing the Secondary Name Node

Posted by Benoy Antony <ba...@gmail.com>.
Chris ,

I think that the error occurs when NN tries to download the fsimage from
SNN.
You can check the NN logs to make sure whether this is true.

There could be different reasons for this.

1. NN fails to do SPNEGO with SNN.
2. NN's TGT expired. Unlikely in your test cluster.

Please post with any additional log info and I can help.

benoy




On Thu, Sep 12, 2013 at 6:02 AM, Christopher Penney <cp...@gmail.com>wrote:

> Does anyone have any suggestions or resources I might look at to resolve
> this?  The documentation on setting up Kerberos seems pretty light.
>
>    Chris
>
>
>
> On Tue, Sep 10, 2013 at 9:55 AM, Christopher Penney <cp...@gmail.com>wrote:
>
>>
>> Hi,
>>
>> After hosting an insecure Hadoop environment for early testing I'm
>> transitioning to something more secure that would (hopefully) more or less
>> mirror what a production environment might look like.  I've integrated our
>> Hadoop cluster into our Kerberos realm and everything is working ok except
>> for our secondary name node.  When I invoke the secondarynamenode with
>> "-checkpoint force" (when no other secondary name node process is running)
>> I get:
>>
>> 13/09/10 09:44:25 INFO security.UserGroupInformation: Login successful
>> for user hdfs/hpctest3.realm.com@REALM.COM using keytab file
>> /etc/hadoop/hdfs.keytab
>> 13/09/10 09:44:25 INFO mortbay.log: Logging to
>> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
>> org.mortbay.log.Slf4jLog
>> 13/09/10 09:44:25 INFO http.HttpServer: Added global filtersafety
>> (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
>> 13/09/10 09:44:25 INFO http.HttpServer: Adding Kerberos (SPNEGO) filter
>> to getimage
>> 13/09/10 09:44:25 INFO http.HttpServer: Port returned by
>> webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening
>> the listener on 50090
>> 13/09/10 09:44:25 INFO http.HttpServer: listener.getLocalPort() returned
>> 50090 webServer.getConnectors()[0].getLocalPort() returned 50090
>> 13/09/10 09:44:25 INFO http.HttpServer: Jetty bound to port 50090
>> 13/09/10 09:44:25 INFO mortbay.log: jetty-6.1.26
>> 13/09/10 09:44:26 INFO server.KerberosAuthenticationHandler: Login using
>> keytab /etc/hadoop/hdfs.keytab, for principal HTTP/
>> hpctest3.realm.com@REALM.COM
>> 13/09/10 09:44:26 INFO server.KerberosAuthenticationHandler: Initialized,
>> principal [HTTP/hpctest3.realm.com@REALM.COM] from keytab
>> [/etc/hadoop/hdfs.keytab]
>>  13/09/10 09:44:26 WARN server.AuthenticationFilter: 'signature.secret'
>> configuration not set, using a random value as secret
>> 13/09/10 09:44:26 INFO mortbay.log: Started
>> SelectChannelConnector@0.0.0.0:50090
>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Web server init done
>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Secondary Web-server
>> up at: 0.0.0.0:50090
>>  13/09/10 09:44:26 WARN namenode.SecondaryNameNode: Checkpoint Period
>> :3600 secs (60 min)
>> 13/09/10 09:44:26 WARN namenode.SecondaryNameNode: Log Size Trigger
>>  :67108864 bytes (65536 KB)
>> 13/09/10 09:44:26 INFO namenode.TransferFsImage: Opening connection to
>> http://hpctest3.realm.com:50070/getimage?getimage=1
>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Downloaded file
>> fsimage size 110 bytes.
>> 13/09/10 09:44:26 INFO namenode.TransferFsImage: Opening connection to
>> http://hpctest3.realm.com:50070/getimage?getedit=1
>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Downloaded file edits
>> size 40 bytes.
>> 13/09/10 09:44:26 INFO util.GSet: VM type       = 64-bit
>> 13/09/10 09:44:26 INFO util.GSet: 2% max memory = 35.55625 MB
>> 13/09/10 09:44:26 INFO util.GSet: capacity      = 2^22 = 4194304 entries
>> 13/09/10 09:44:26 INFO util.GSet: recommended=4194304, actual=4194304
>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: fsOwner=hdfs/
>> hpctest3.realm.com@REALM.COM
>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: supergroup=supergroup
>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: isPermissionEnabled=true
>> 13/09/10 09:44:26 INFO namenode.FSNamesystem:
>> dfs.block.invalidate.limit=100
>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: isAccessTokenEnabled=true
>> accessKeyUpdateInterval=600 min(s), accessTokenLifetime=600 min(s)
>> 13/09/10 09:44:26 INFO namenode.NameNode: Caching file names occuring
>> more than 10 times
>> 13/09/10 09:44:26 INFO common.Storage: Number of files = 1
>> 13/09/10 09:44:26 INFO common.Storage: Number of files under construction
>> = 0
>> 13/09/10 09:44:26 INFO common.Storage: Edits file
>> /tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits of size 40
>> edits # 2 loaded in 0 seconds.
>> 13/09/10 09:44:26 INFO namenode.FSNamesystem: Number of transactions: 0
>> Total time for transactions(ms): 0Number of transactions batched in Syncs:
>> 0 Number of syncs: 0 SyncTimes(ms): 0
>> 13/09/10 09:44:26 INFO namenode.FSEditLog: closing edit log: position=40,
>> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
>> 13/09/10 09:44:26 INFO namenode.FSEditLog: close success: truncate to 40,
>> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
>> 13/09/10 09:44:26 INFO common.Storage: Image file of size 144 saved in 0
>> seconds.
>> 13/09/10 09:44:26 INFO namenode.FSEditLog: closing edit log: position=4,
>> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
>> 13/09/10 09:44:26 INFO namenode.FSEditLog: close success: truncate to 4,
>> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Posted URL
>> hpctest3.realm.com:50070
>> putimage=1&port=50090&machine=0.0.0.0&token=-32:67504834:0:1378758765000:1378758462746&newChecksum=f6fbd8485835f40b5950ce36c4877504
>> 13/09/10 09:44:26 INFO namenode.TransferFsImage: Opening connection to
>> http://hpctest3.realm.com:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=-32:67504834:0:1378758765000:1378758462746&newChecksum=f6fbd8485835f40b5950ce36c4877504
>> 13/09/10 09:44:26 ERROR namenode.SecondaryNameNode: checkpoint: Exception
>> trying to open authenticated connection to
>> http://hpctest3.realm.com:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=-32:67504834:0:1378758765000:1378758462746&newChecksum=f6fbd8485835f40b5950ce36c4877504
>> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: SHUTDOWN_MSG:
>>
>> The name node log file has this in it:
>>
>> 2013-09-10 09:44:26,219 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from <ip
>> removed>
>> 2013-09-10 09:44:26,220 WARN
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Cannot roll edit log,
>> edits.new files already exists in all healthy directories:
>>   /tmp/hadoop/hdfs/name/current/edits.new
>> 2013-09-10 09:44:26,327 INFO
>> org.apache.hadoop.hdfs.server.namenode.GetImageServlet: GetImageServlet
>> allowing: hdfs/hpctest3.realm.com@REALM.COM
>> 2013-09-10 09:44:26,362 INFO
>> org.apache.hadoop.hdfs.server.namenode.GetImageServlet: GetImageServlet
>> allowing: hdfs/hpctest3.realm.com@REALM.COM
>>
>> I've tried Googling this and looking in a few books, but I haven't found
>> much good advice.  I'm using Hadoop 1.1.1.  I'm happy to post more info,
>> but I didn't want to dump in a bunch of unnecessary stuff.
>>
>> Thanks,
>>
>>    Chris
>>
>
>

Re: Securing the Secondary Name Node

Posted by Christopher Penney <cp...@gmail.com>.
Does anyone have any suggestions or resources I might look at to resolve
this?  The documentation on setting up Kerberos seems pretty light.

   Chris


On Tue, Sep 10, 2013 at 9:55 AM, Christopher Penney <cp...@gmail.com> wrote:

>
> Hi,
>
> After hosting an insecure Hadoop environment for early testing I'm
> transitioning to something more secure that would (hopefully) more or less
> mirror what a production environment might look like.  I've integrated our
> Hadoop cluster into our Kerberos realm and everything is working ok except
> for our secondary name node.  When I invoke the secondarynamenode with
> "-checkpoint force" (when no other secondary name node process is running)
> I get:
>
> 13/09/10 09:44:25 INFO security.UserGroupInformation: Login successful for
> user hdfs/hpctest3.realm.com@REALM.COM using keytab file
> /etc/hadoop/hdfs.keytab
> 13/09/10 09:44:25 INFO mortbay.log: Logging to
> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
> org.mortbay.log.Slf4jLog
> 13/09/10 09:44:25 INFO http.HttpServer: Added global filtersafety
> (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
> 13/09/10 09:44:25 INFO http.HttpServer: Adding Kerberos (SPNEGO) filter to
> getimage
> 13/09/10 09:44:25 INFO http.HttpServer: Port returned by
> webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening
> the listener on 50090
> 13/09/10 09:44:25 INFO http.HttpServer: listener.getLocalPort() returned
> 50090 webServer.getConnectors()[0].getLocalPort() returned 50090
> 13/09/10 09:44:25 INFO http.HttpServer: Jetty bound to port 50090
> 13/09/10 09:44:25 INFO mortbay.log: jetty-6.1.26
> 13/09/10 09:44:26 INFO server.KerberosAuthenticationHandler: Login using
> keytab /etc/hadoop/hdfs.keytab, for principal HTTP/
> hpctest3.realm.com@REALM.COM
> 13/09/10 09:44:26 INFO server.KerberosAuthenticationHandler: Initialized,
> principal [HTTP/hpctest3.realm.com@REALM.COM] from keytab
> [/etc/hadoop/hdfs.keytab]
>  13/09/10 09:44:26 WARN server.AuthenticationFilter: 'signature.secret'
> configuration not set, using a random value as secret
> 13/09/10 09:44:26 INFO mortbay.log: Started
> SelectChannelConnector@0.0.0.0:50090
> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Web server init done
> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Secondary Web-server up
> at: 0.0.0.0:50090
>  13/09/10 09:44:26 WARN namenode.SecondaryNameNode: Checkpoint Period
> :3600 secs (60 min)
> 13/09/10 09:44:26 WARN namenode.SecondaryNameNode: Log Size Trigger
>  :67108864 bytes (65536 KB)
> 13/09/10 09:44:26 INFO namenode.TransferFsImage: Opening connection to
> http://hpctest3.realm.com:50070/getimage?getimage=1
> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Downloaded file fsimage
> size 110 bytes.
> 13/09/10 09:44:26 INFO namenode.TransferFsImage: Opening connection to
> http://hpctest3.realm.com:50070/getimage?getedit=1
> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Downloaded file edits
> size 40 bytes.
> 13/09/10 09:44:26 INFO util.GSet: VM type       = 64-bit
> 13/09/10 09:44:26 INFO util.GSet: 2% max memory = 35.55625 MB
> 13/09/10 09:44:26 INFO util.GSet: capacity      = 2^22 = 4194304 entries
> 13/09/10 09:44:26 INFO util.GSet: recommended=4194304, actual=4194304
> 13/09/10 09:44:26 INFO namenode.FSNamesystem: fsOwner=hdfs/
> hpctest3.realm.com@REALM.COM
> 13/09/10 09:44:26 INFO namenode.FSNamesystem: supergroup=supergroup
> 13/09/10 09:44:26 INFO namenode.FSNamesystem: isPermissionEnabled=true
> 13/09/10 09:44:26 INFO namenode.FSNamesystem:
> dfs.block.invalidate.limit=100
> 13/09/10 09:44:26 INFO namenode.FSNamesystem: isAccessTokenEnabled=true
> accessKeyUpdateInterval=600 min(s), accessTokenLifetime=600 min(s)
> 13/09/10 09:44:26 INFO namenode.NameNode: Caching file names occuring more
> than 10 times
> 13/09/10 09:44:26 INFO common.Storage: Number of files = 1
> 13/09/10 09:44:26 INFO common.Storage: Number of files under construction
> = 0
> 13/09/10 09:44:26 INFO common.Storage: Edits file
> /tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits of size 40
> edits # 2 loaded in 0 seconds.
> 13/09/10 09:44:26 INFO namenode.FSNamesystem: Number of transactions: 0
> Total time for transactions(ms): 0Number of transactions batched in Syncs:
> 0 Number of syncs: 0 SyncTimes(ms): 0
> 13/09/10 09:44:26 INFO namenode.FSEditLog: closing edit log: position=40,
> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
> 13/09/10 09:44:26 INFO namenode.FSEditLog: close success: truncate to 40,
> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
> 13/09/10 09:44:26 INFO common.Storage: Image file of size 144 saved in 0
> seconds.
> 13/09/10 09:44:26 INFO namenode.FSEditLog: closing edit log: position=4,
> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
> 13/09/10 09:44:26 INFO namenode.FSEditLog: close success: truncate to 4,
> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Posted URL
> hpctest3.realm.com:50070
> putimage=1&port=50090&machine=0.0.0.0&token=-32:67504834:0:1378758765000:1378758462746&newChecksum=f6fbd8485835f40b5950ce36c4877504
> 13/09/10 09:44:26 INFO namenode.TransferFsImage: Opening connection to
> http://hpctest3.realm.com:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=-32:67504834:0:1378758765000:1378758462746&newChecksum=f6fbd8485835f40b5950ce36c4877504
> 13/09/10 09:44:26 ERROR namenode.SecondaryNameNode: checkpoint: Exception
> trying to open authenticated connection to
> http://hpctest3.realm.com:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=-32:67504834:0:1378758765000:1378758462746&newChecksum=f6fbd8485835f40b5950ce36c4877504
> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: SHUTDOWN_MSG:
>
> The name node log file has this in it:
>
> 2013-09-10 09:44:26,219 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from <ip
> removed>
> 2013-09-10 09:44:26,220 WARN
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Cannot roll edit log,
> edits.new files already exists in all healthy directories:
>   /tmp/hadoop/hdfs/name/current/edits.new
> 2013-09-10 09:44:26,327 INFO
> org.apache.hadoop.hdfs.server.namenode.GetImageServlet: GetImageServlet
> allowing: hdfs/hpctest3.realm.com@REALM.COM
> 2013-09-10 09:44:26,362 INFO
> org.apache.hadoop.hdfs.server.namenode.GetImageServlet: GetImageServlet
> allowing: hdfs/hpctest3.realm.com@REALM.COM
>
> I've tried Googling this and looking in a few books, but I haven't found
> much good advice.  I'm using Hadoop 1.1.1.  I'm happy to post more info,
> but I didn't want to dump in a bunch of unnecessary stuff.
>
> Thanks,
>
>    Chris
>

Re: Securing the Secondary Name Node

Posted by Christopher Penney <cp...@gmail.com>.
Following up on my own post: I tried upgrading to 1.2.1 and the problem
went away.  The 2NN is working fine now, and I'm seeing:

2013-09-20 11:13:31,318 INFO
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Posted URL
hpctest3.realm.com:50070putimage=1&port=50090&machine=hpctest3.realm.com
&token=-41:832852997:1379689519375:1379690010000:1379689644259&newChecksum=9c94140e6eb8c33a66a14f2e73206d29
2013-09-20 11:13:31,318 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
to
http://hpctest3.realm.com:50070/getimage?putimage=1&port=50090&machine=hpctest3.realm.com&token=-41:832852997:1379689519375:1379690010000:1379689644259&newChecksum=9c94140e6eb8c33a66a14f2e73206d29
2013-09-20 11:13:31,494 INFO
org.apache.hadoop.hdfs.server.namenode.GetImageServlet: GetImageServlet
allowing: hdfs/hpctest3.realm.com@REALM.COM
2013-09-20 11:13:31,631 INFO
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Checkpoint done.
New Image Size: 30906
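
In case it helps anyone who finds this thread later, these are the 2NN
entries from our hdfs-site.xml.  The property names are the ones listed in
the Hadoop 1.x security documentation (double-check them against your
release); the keytab path and principal just mirror what is in the logs
above, so treat the values as examples for your own realm:

<!-- Secondary NameNode security settings (Hadoop 1.x property names;
     values copied from the logs above, adjust for your environment) -->
<property>
  <name>dfs.secondary.namenode.keytab.file</name>
  <value>/etc/hadoop/hdfs.keytab</value>
</property>
<property>
  <name>dfs.secondary.namenode.kerberos.principal</name>
  <value>hdfs/hpctest3.realm.com@REALM.COM</value>
</property>

The HTTP/hpctest3.realm.com@REALM.COM principal that the SPNEGO filter
logs in with above also has to be present in the same keytab.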

   Chris


On Tue, Sep 10, 2013 at 9:55 AM, Christopher Penney <cp...@gmail.com> wrote:

>
> Hi,
>
> After hosting an insecure Hadoop environment for early testing I'm
> transitioning to something more secure that would (hopefully) more or less
> mirror what a production environment might look like.  I've integrated our
> Hadoop cluster into our Kerberos realm and everything is working ok except
> for our secondary name node.  When I invoke the secondarynamenode with
> "-checkpoint force" (when no other secondary name node process is running)
> I get:
>
> 13/09/10 09:44:25 INFO security.UserGroupInformation: Login successful for
> user hdfs/hpctest3.realm.com@REALM.COM using keytab file
> /etc/hadoop/hdfs.keytab
> 13/09/10 09:44:25 INFO mortbay.log: Logging to
> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
> org.mortbay.log.Slf4jLog
> 13/09/10 09:44:25 INFO http.HttpServer: Added global filtersafety
> (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
> 13/09/10 09:44:25 INFO http.HttpServer: Adding Kerberos (SPNEGO) filter to
> getimage
> 13/09/10 09:44:25 INFO http.HttpServer: Port returned by
> webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening
> the listener on 50090
> 13/09/10 09:44:25 INFO http.HttpServer: listener.getLocalPort() returned
> 50090 webServer.getConnectors()[0].getLocalPort() returned 50090
> 13/09/10 09:44:25 INFO http.HttpServer: Jetty bound to port 50090
> 13/09/10 09:44:25 INFO mortbay.log: jetty-6.1.26
> 13/09/10 09:44:26 INFO server.KerberosAuthenticationHandler: Login using
> keytab /etc/hadoop/hdfs.keytab, for principal HTTP/
> hpctest3.realm.com@REALM.COM
> 13/09/10 09:44:26 INFO server.KerberosAuthenticationHandler: Initialized,
> principal [HTTP/hpctest3.realm.com@REALM.COM] from keytab
> [/etc/hadoop/hdfs.keytab]
>  13/09/10 09:44:26 WARN server.AuthenticationFilter: 'signature.secret'
> configuration not set, using a random value as secret
> 13/09/10 09:44:26 INFO mortbay.log: Started
> SelectChannelConnector@0.0.0.0:50090
> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Web server init done
> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Secondary Web-server up
> at: 0.0.0.0:50090
>  13/09/10 09:44:26 WARN namenode.SecondaryNameNode: Checkpoint Period
> :3600 secs (60 min)
> 13/09/10 09:44:26 WARN namenode.SecondaryNameNode: Log Size Trigger
>  :67108864 bytes (65536 KB)
> 13/09/10 09:44:26 INFO namenode.TransferFsImage: Opening connection to
> http://hpctest3.realm.com:50070/getimage?getimage=1
> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Downloaded file fsimage
> size 110 bytes.
> 13/09/10 09:44:26 INFO namenode.TransferFsImage: Opening connection to
> http://hpctest3.realm.com:50070/getimage?getedit=1
> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Downloaded file edits
> size 40 bytes.
> 13/09/10 09:44:26 INFO util.GSet: VM type       = 64-bit
> 13/09/10 09:44:26 INFO util.GSet: 2% max memory = 35.55625 MB
> 13/09/10 09:44:26 INFO util.GSet: capacity      = 2^22 = 4194304 entries
> 13/09/10 09:44:26 INFO util.GSet: recommended=4194304, actual=4194304
> 13/09/10 09:44:26 INFO namenode.FSNamesystem: fsOwner=hdfs/
> hpctest3.realm.com@REALM.COM
> 13/09/10 09:44:26 INFO namenode.FSNamesystem: supergroup=supergroup
> 13/09/10 09:44:26 INFO namenode.FSNamesystem: isPermissionEnabled=true
> 13/09/10 09:44:26 INFO namenode.FSNamesystem:
> dfs.block.invalidate.limit=100
> 13/09/10 09:44:26 INFO namenode.FSNamesystem: isAccessTokenEnabled=true
> accessKeyUpdateInterval=600 min(s), accessTokenLifetime=600 min(s)
> 13/09/10 09:44:26 INFO namenode.NameNode: Caching file names occuring more
> than 10 times
> 13/09/10 09:44:26 INFO common.Storage: Number of files = 1
> 13/09/10 09:44:26 INFO common.Storage: Number of files under construction
> = 0
> 13/09/10 09:44:26 INFO common.Storage: Edits file
> /tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits of size 40
> edits # 2 loaded in 0 seconds.
> 13/09/10 09:44:26 INFO namenode.FSNamesystem: Number of transactions: 0
> Total time for transactions(ms): 0Number of transactions batched in Syncs:
> 0 Number of syncs: 0 SyncTimes(ms): 0
> 13/09/10 09:44:26 INFO namenode.FSEditLog: closing edit log: position=40,
> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
> 13/09/10 09:44:26 INFO namenode.FSEditLog: close success: truncate to 40,
> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
> 13/09/10 09:44:26 INFO common.Storage: Image file of size 144 saved in 0
> seconds.
> 13/09/10 09:44:26 INFO namenode.FSEditLog: closing edit log: position=4,
> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
> 13/09/10 09:44:26 INFO namenode.FSEditLog: close success: truncate to 4,
> editlog=/tmp/hadoop/tmp/hadoop-root/dfs/namesecondary/current/edits
> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: Posted URL
> hpctest3.realm.com:50070
> putimage=1&port=50090&machine=0.0.0.0&token=-32:67504834:0:1378758765000:1378758462746&newChecksum=f6fbd8485835f40b5950ce36c4877504
> 13/09/10 09:44:26 INFO namenode.TransferFsImage: Opening connection to
> http://hpctest3.realm.com:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=-32:67504834:0:1378758765000:1378758462746&newChecksum=f6fbd8485835f40b5950ce36c4877504
> 13/09/10 09:44:26 ERROR namenode.SecondaryNameNode: checkpoint: Exception
> trying to open authenticated connection to
> http://hpctest3.realm.com:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=-32:67504834:0:1378758765000:1378758462746&newChecksum=f6fbd8485835f40b5950ce36c4877504
> 13/09/10 09:44:26 INFO namenode.SecondaryNameNode: SHUTDOWN_MSG:
>
> The name node log file has this in it:
>
> 2013-09-10 09:44:26,219 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from <ip
> removed>
> 2013-09-10 09:44:26,220 WARN
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Cannot roll edit log,
> edits.new files already exists in all healthy directories:
>   /tmp/hadoop/hdfs/name/current/edits.new
> 2013-09-10 09:44:26,327 INFO
> org.apache.hadoop.hdfs.server.namenode.GetImageServlet: GetImageServlet
> allowing: hdfs/hpctest3.realm.com@REALM.COM
> 2013-09-10 09:44:26,362 INFO
> org.apache.hadoop.hdfs.server.namenode.GetImageServlet: GetImageServlet
> allowing: hdfs/hpctest3.realm.com@REALM.COM
>
> I've tried Googling this and looking in a few books, but I haven't found
> much good advice.  I'm using Hadoop 1.1.1.  I'm happy to post more info,
> but I didn't want to dump in a bunch of unnecessary stuff.
>
> Thanks,
>
>    Chris
>
