Posted to common-user@hadoop.apache.org by Jeremy Hansen <je...@skidrow.la> on 2011/09/07 02:26:15 UTC

IMAGE_AND_EDITS Failed

I happened to notice this today, and being fairly new to administering 
Hadoop, I'm not exactly sure how to get out of this situation without 
data loss.

The checkpoint hasn't happened since Sept 2nd.

-rw-r--r-- 1 hdfs hdfs        8889 Sep  2 14:09 edits
-rw-r--r-- 1 hdfs hdfs   195968056 Sep  2 14:09 fsimage
-rw-r--r-- 1 hdfs hdfs   195979439 Sep  2 14:09 fsimage.ckpt
-rw-r--r-- 1 hdfs hdfs           8 Sep  2 14:09 fstime
-rw-r--r-- 1 hdfs hdfs         100 Sep  2 14:09 VERSION

/mnt/data0/dfs/nn/image
-rw-r--r-- 1 hdfs hdfs    157 Sep  2 14:09 fsimage

I'm also seeing this in the NN logs:

2011-09-06 16:48:23,738 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 10.10.10.11
2011-09-06 16:48:23,740 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed. java.lang.NullPointerException
         at org.apache.hadoop.hdfs.server.namenode.FSImage.getImageFile(FSImage.java:219)
         at org.apache.hadoop.hdfs.server.namenode.FSImage.getFsImageName(FSImage.java:1584)
         at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:75)
         at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:70)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:396)
         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
         at org.apache.hadoop.hdfs.server.namenode.GetImageServlet.doGet(GetImageServlet.java:70)
         at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
         at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
         at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
         at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:824)
         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
         at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
         at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
         at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
         at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
         at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
         at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
         at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
         at org.mortbay.jetty.Server.handle(Server.java:326)
         at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
         at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
         at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
         at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
         at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)

On the secondary name node:

2011-09-06 16:51:53,538 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.io.FileNotFoundException: http://ftrr-nam6000.chestermcgee.com:50070/getimage?getimage=1
         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
         at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
         at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1360)
         at java.security.AccessController.doPrivileged(Native Method)
         at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1354)
         at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1008)
         at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:183)
         at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$3.run(SecondaryNameNode.java:348)
         at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$3.run(SecondaryNameNode.java:337)
         at java.security.AccessController.doPrivileged(Native Method)
         at javax.security.auth.Subject.doAs(Subject.java:396)
         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
         at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.downloadCheckpointFiles(SecondaryNameNode.java:337)
         at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:422)
         at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:313)
         at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:276)
         at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.FileNotFoundException: http://ftrr-nam6000.las1.fanops.net:50070/getimage?getimage=1
         at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1303)
         at sun.net.www.protocol.http.HttpURLConnection.getHeaderField(HttpURLConnection.java:2165)
         at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:175)
         ... 10 more

Any help would be very much appreciated.  I'm scared to shut down the NN.  I've tried restarting the 2NN.

Thank You
-jeremy

Re: IMAGE_AND_EDITS Failed

Posted by Ravi Prakash <ra...@gmail.com>.
Can you hexdump the edits file, write something to HDFS, hexdump it again,
and then compare the two hexdumps? Are you sure you're looking at the correct
fsedits file? How many storage directories do you have configured?
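
Something like this would do it (a rough sketch; I'm assuming your
dfs.name.dir is /mnt/data0/dfs/nn based on the listing you posted, and a
checksum is enough if you don't need to see the actual bytes):

    # path assumed from your listing; adjust to your dfs.name.dir
    md5sum /mnt/data0/dfs/nn/current/edits
    # any namespace change should get journaled to the edits file
    hadoop fs -touchz /tmp/edits-probe
    # checksum again; the two sums should now differ
    md5sum /mnt/data0/dfs/nn/current/edits
    # clean up the probe file
    hadoop fs -rm /tmp/edits-probe

If the two checksums are identical even after the write, the edits file
you're looking at really isn't being written to.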


On Wed, Sep 7, 2011 at 11:57 AM, Jeremy Hansen <je...@skidrow.la> wrote:

> Things still work in HDFS, but the edits file is not being updated.
> The timestamp is Sept 2nd.
>
> -jeremy
> [...]

Re: IMAGE_AND_EDITS Failed

Posted by Jeremy Hansen <je...@skidrow.la>.
Things still work in HDFS, but the edits file is not being updated. The timestamp is Sept 2nd.

-jeremy

On Sep 7, 2011, at 9:45 AM, Ravi Prakash <ra...@gmail.com> wrote:

> If your HDFS is still working, the fsimage file won't be getting updated but
> the edits file still should. That's why I asked question 2.
> [...]

Re: IMAGE_AND_EDITS Failed

Posted by Ravi Prakash <ra...@gmail.com>.
If your HDFS is still working, the fsimage file won't be getting updated but
the edits file still should. That's why I asked question 2.

On Wed, Sep 7, 2011 at 11:39 AM, Jeremy Hansen <je...@skidrow.la> wrote:

> The problem is that fsimage and edits are no longer being updated, so…if I
> restart, how could it replay those?
>
> -jeremy
> [...]

Re: IMAGE_AND_EDITS Failed

Posted by Jeremy Hansen <je...@skidrow.la>.
The problem is that fsimage and edits are no longer being updated, so…if I restart, how could it replay those?

-jeremy


On Sep 7, 2011, at 8:48 AM, Ravi Prakash wrote:

> Actually I take that back. Restarting the NN might not result in loss of
> data. It will probably just take longer to start up because it would read
> the fsimage, then apply the fsedits (rather than the SNN doing it).
> [...]


Re: IMAGE_AND_EDITS Failed

Posted by Ravi Prakash <ra...@gmail.com>.
Actually I take that back. Restarting the NN might not result in loss of
data. It will probably just take longer to start up because it would read
the fsimage, then apply the fsedits (rather than the SNN doing it).
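
For reference, a bare restart would look roughly like this on the NN host
(a sketch assuming the stock daemon scripts; on startup the NN loads the
fsimage and then replays the edits log on top of it):

    # run as the user that owns the NN process
    hadoop-daemon.sh stop namenode
    hadoop-daemon.sh start namenode
    # the NN sits in safe mode until enough block reports arrive
    hadoop dfsadmin -safemode get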

On Wed, Sep 7, 2011 at 10:46 AM, Ravi Prakash <ra...@gmail.com> wrote:

> [...]

Re: IMAGE_AND_EDITS Failed

Posted by Ravi Prakash <ra...@gmail.com>.
Hi Jeremy,

A couple of questions:

1. Which version of Hadoop are you using?
2. If you write something into HDFS, can you subsequently read it? (A quick
test is sketched after this list.)
3. Are you sure your secondarynamenode configuration is correct? It seems
like your SNN is telling your NN to roll the edit log (close the current
edits file and start journaling to edits.new), but when it tries to download
the image file, it's not finding it.
4. I wish I could say I haven't ever seen that stack trace in the logs. I
was seeing something similar (not the same, quite far from it actually)
(https://issues.apache.org/jira/browse/HDFS-2011).
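
For question 2, a quick end-to-end test would be something like this (the
file names are just examples):

    # write, read back, and clean up a small test file
    hadoop fs -put /etc/hosts /tmp/nn-rw-test
    hadoop fs -cat /tmp/nn-rw-test
    hadoop fs -rm /tmp/nn-rw-test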

If I were you, and I felt exceptionally brave (mind you, I've only worked
with test systems; no production sys-admin guts for me ;-) ), I would do
everything I could to get the secondarynamenode started properly and make
it checkpoint properly.
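
Once the 2NN is up you can also prod it by hand instead of waiting for the
checkpoint period (syntax for the 0.20-style releases your stack traces
suggest; run it on the 2NN host with the 2NN daemon stopped so its ports
are free):

    # ask the NN how big its current edit log is
    hadoop secondarynamenode -geteditsize
    # run a single checkpoint right now
    hadoop secondarynamenode -checkpoint force

and then watch both the NN and 2NN logs to see whether the GetImage errors
come back.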

Methinks restarting the namenode will most likely result in loss of data.

Hope this helps
Ravi.



On Tue, Sep 6, 2011 at 7:26 PM, Jeremy Hansen <je...@skidrow.la> wrote:

> [...]