Posted to hdfs-user@hadoop.apache.org by Tyler Coffin <tc...@rim.com> on 2011/01/05 17:32:55 UTC

Secondary Namenode doCheckpoint, FileNotFoundException

I'm getting the exception below on my secondary namenode.
As far as I can tell, the edits log isn't being reconciled as it should be (i.e. edits.new continues to grow) on the namenode.
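For concreteness, this is roughly how I'm watching it on the NN host (the dfs.name.dir path below is illustrative; substitute whatever dfs.name.dir actually points at):

    # On the namenode: edits.new should normally disappear after each
    # successful checkpoint; here it just keeps growing instead.
    ls -lh /path/to/dfs.name.dir/current/edits /path/to/dfs.name.dir/current/edits.new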

I've searched around but haven't turned up any answers so far. Anyone have any ideas?

Hadoop 0.20.2
NN host below is hadoop08
2NN is hadoop07

dfs.http.address is set to hadoop08:50070
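For completeness, the relevant HTTP endpoints in 0.20 live in conf/hdfs-site.xml. A minimal sketch of how that might look for the hosts above (only dfs.http.address was quoted; the dfs.secondary.http.address entry is shown as an assumption for illustration):

    <property>
      <name>dfs.http.address</name>
      <value>hadoop08:50070</value>  <!-- NN web server; the 2NN pulls fsimage/edits from here -->
    </property>
    <property>
      <name>dfs.secondary.http.address</name>
      <value>hadoop07:50090</value>  <!-- 2NN web server; the default is 0.0.0.0:50090 -->
    </property>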

2011-01-05 11:20:47,603 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0 
2011-01-05 11:20:47,605 INFO org.apache.hadoop.hdfs.server.common.Storage: Completing previous checkpoint for storage directory /tmp/hadoop-admin/dfs/namesecondary.
2011-01-05 11:20:47,741 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Downloaded file fsimage size 13174593 bytes.
2011-01-05 11:20:47,744 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Downloaded file edits size 86299 bytes.
2011-01-05 11:20:47,768 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=admin,users,admin
2011-01-05 11:20:47,768 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2011-01-05 11:20:47,768 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2011-01-05 11:20:47,853 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 98313
2011-01-05 11:20:48,534 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 0
2011-01-05 11:20:48,550 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /tmp/hadoop-admin/dfs/namesecondary/current/edits of size 86299 edits # 851 loaded in 0 seconds.
2011-01-05 11:20:48,718 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 13200553 saved in 0 seconds.
2011-01-05 11:20:48,720 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0 
2011-01-05 11:20:48,742 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Posted URL hadoop08:50070putimage=1&port=50090&machine=10.90.37.13&token=-18:130391589:0:1294242715000:1294242381014
2011-01-05 11:20:48,749 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint: 
2011-01-05 11:20:48,749 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.io.FileNotFoundException: http://hadoop08:50070/getimage?putimage=1&port=50090&machine=10.1.1.5&token=-18:130391589:0:1294242715000:1294242381014
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1311)
        at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:151)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.putFSImage(SecondaryNameNode.java:280)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:320)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:225)
        at java.lang.Thread.run(Thread.java:619)


RE: Secondary Namenode doCheckpoint, FileNotFoundException

Posted by Tyler Coffin <tc...@rim.com>.
Now that you mention it, I went back and noticed a warning message on 2NN start-up that webapps/secondary was missing.

The reason it was missing is that I use a git repository to manage our Hadoop deployments (with branches for each hardware config), and webapps/secondary/WEB-INF is an empty directory, so git didn't track it. I added a .keep file and all is well now.
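For anyone who runs into the same thing, a minimal sketch of the workaround (the repository path here is illustrative):

    # git does not track empty directories, so webapps/secondary/WEB-INF was
    # silently dropped from the checkout; a placeholder file keeps it.
    cd /path/to/hadoop-deploy-repo
    mkdir -p webapps/secondary/WEB-INF
    touch webapps/secondary/WEB-INF/.keep
    git add webapps/secondary/WEB-INF/.keep
    git commit -m "Keep empty webapps/secondary/WEB-INF so the 2NN web app deploys"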

Thanks again!


From: suresh srinivas [mailto:srini30005@gmail.com]
Sent: January 6, 2011 18:30
To: hdfs-user@hadoop.apache.org
Subject: Re: Secondary Namenode doCheckpoint, FileNotFoundException

I suspect the namenode cannot access the secondary namenode URL http://10.90.37.13:50090/getimage?getimage=1.
If you can post the secondary namenode log from around the time of the problem, along with the corresponding namenode log, we can confirm that.


Re: Secondary Namenode doCheckpoint, FileNotFoundException

Posted by suresh srinivas <sr...@gmail.com>.
I suspect the namenode cannot access the secondary namenode URL
http://10.90.37.13:50090/getimage?getimage=1.
If you can post the secondary namenode log from around the time of the
problem, along with the corresponding namenode log, we can confirm that.

Re: Secondary Namenode doCheckpoint, FileNotFoundException

Posted by Jakob Homan <jg...@gmail.com>.
Is the 2NN reachable at http://10.1.1.5:50090? This is the address the NN
is being told to grab the merged image from. There can be problems
with VIPs, etc. if this address is not reachable.
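A quick way to test that from the NN host, as a rough sketch (substitute whichever address actually appears in the putimage/getimage URLs):

    # Run from the NN (hadoop08). A 200 means the 2NN web server is reachable
    # and the getimage servlet is deployed; a 503 or a connection error matches
    # the failure in the NN log below.
    curl -s -o /dev/null -w '%{http_code}\n' 'http://10.1.1.5:50090/getimage?getimage=1'
    curl -s -o /dev/null -w '%{http_code}\n' 'http://10.90.37.13:50090/getimage?getimage=1'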

On Thu, Jan 6, 2011 at 12:57 PM, Tyler Coffin <tc...@rim.com> wrote:
> [quoted namenode log and message trimmed; the full text appears in the post below]

RE: Secondary Namenode doCheckpoint, FileNotFoundException

Posted by Tyler Coffin <tc...@rim.com>.
2011-01-06 15:52:00,814 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=admin,users,admin   ip=/10.90.37.10 cmd=open        src=/logs/remote.log.01.20110105.2045.lzo       dst=null        perm=null
2011-01-06 15:52:00,881 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=admin,users,admin   ip=/10.90.37.12 cmd=open        src=/logs/remote.log.01.20110104.1815.lzo       dst=null        perm=null
2011-01-06 15:52:00,891 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=admin,users,admin   ip=/10.90.37.9  cmd=open        src=/logs/remote.log.01.20110105.0015.lzo       dst=null        perm=null
2011-01-06 15:52:01,253 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=admin,users,admin   ip=/10.90.37.7  cmd=open        src=/logs/remote.log.01.20110105.2035.lzo       dst=null        perm=null
2011-01-06 15:52:01,612 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=admin,users,admin   ip=/10.90.37.11 cmd=open        src=/logs/remote.log.01.20110105.2145.lzo       dst=null        perm=null
2011-01-06 15:52:01,715 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=admin,users,admin   ip=/10.90.37.8  cmd=open        src=/logs/remote.log.01.20110104.1915.lzo       dst=null        perm=null
2011-01-06 15:52:02,030 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=admin,users,admin   ip=/10.90.37.13 cmd=open        src=/logs/remote.log.01.20110105.1235.lzo       dst=null        perm=null
2011-01-06 15:52:02,701 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 10.11.138.52:50010 to delete  blk_5975970115065306335_564427 blk_-4283643305091895835_564439
2011-01-06 15:52:02,701 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 10.90.37.8:50010 to delete  blk_7719330829554745618_564439
2011-01-06 15:52:02,701 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 10.11.138.73:50010 to delete  blk_5975970115065306335_564427
2011-01-06 15:52:02,701 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 10.11.138.156:50010 to delete  blk_-4283643305091895835_564439 blk_7719330829554745618_564439
2011-01-06 15:52:04,256 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 10.90.37.13
2011-01-06 15:52:04,494 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=admin,users,admin   ip=/10.90.37.12 cmd=open        src=/logs/remote.log.01.20110105.1120.lzo       dst=null        perm=null
2011-01-06 15:52:04,529 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=admin,users,admin   ip=/10.90.37.7  cmd=open        src=/logs/remote.log.01.20110104.1950.lzo       dst=null        perm=null
2011-01-06 15:52:04,563 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=admin,users,admin   ip=/10.90.37.9  cmd=open        src=/logs/remote.log.01.20110105.0215.lzo       dst=null        perm=null
2011-01-06 15:52:05,255 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=admin,users,admin   ip=/10.90.37.11 cmd=open        src=/logs/remote.log.01.20110105.2245.lzo       dst=null        perm=null
2011-01-06 15:52:05,261 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=admin,users,admin   ip=/10.90.37.10 cmd=open        src=/logs/remote.log.01.20110105.0835.lzo       dst=null        perm=null
2011-01-06 15:52:06,960 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=admin,users,admin   ip=/10.90.37.8  cmd=open        src=/logs/remote.log.01.20110104.0855.lzo       dst=null        perm=null
2011-01-06 15:52:07,147 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed. java.io.IOException: Server returned HTTP response code: 503 for URL: http://10.90.37.13:50090/getimage?getimage=1
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1313)
        at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:151)
        at org.apache.hadoop.hdfs.server.namenode.GetImageServlet.doGet(GetImageServlet.java:58)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
        at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:324)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
        at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)

2011-01-06 15:52:07,522 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=admin,users,admin   ip=/10.90.37.10 cmd=open        src=/logs/remote.log.01.20110104.2125.lzo       dst=null        perm=null

The errors from the 2NN repeat frequently; I grabbed a snippet of the NN log from the most recent occurrence (i.e. it's not the same occurrence as in my original post).



From: suresh srinivas [mailto:srini30005@gmail.com]
Sent: January 6, 2011 15:27
To: hdfs-user@hadoop.apache.org
Subject: Re: Secondary Namenode doCheckpoint, FileNotFoundException

Can you post the namenode log from around this time?


Re: Secondary Namenode doCheckpoint, FileNotFoundException

Posted by suresh srinivas <sr...@gmail.com>.
Can you post the namenode log from around this time?