Posted to user@hbase.apache.org by Jonathan Bender <jo...@gmail.com> on 2011/04/27 01:53:21 UTC

HDFS reports corrupted blocks after HBase reinstall

Hi all, I'm hitting a strange error that I can't quite figure out.

After wiping my /hbase HDFS directory to do a fresh install, I am getting
"MISSING BLOCKS" reported in this /hbase directory, which causes HDFS to
start up in safe mode.  This doesn't happen until I start my region servers,
so I have a feeling some kind of corrupted metadata is being loaded from
these region servers.

Is there a graceful way to wipe the HBase directory clean?  Are there any
local directories on the region servers, master, or ZK server that I should
be wiping as well?

Cheers,
Jon
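
A minimal sketch of the kind of full wipe being asked about, assuming a
default /hbase root and a CDH3-era layout; the ZooKeeper host, config paths,
and user below are placeholders, not details from this thread:

    # Stop HBase first so nothing is writing to HDFS or ZooKeeper.
    $HBASE_HOME/bin/stop-hbase.sh

    # Remove HBase's root directory in HDFS (hbase.rootdir, /hbase by default).
    hadoop fs -rmr /hbase

    # Clear HBase's state in ZooKeeper (default parent znode is /hbase).
    # rmr is the old recursive delete; newer ZooKeeper CLIs call it deleteall.
    zkCli.sh -server zk-host:2181 rmr /hbase

    # On every node, clear leftover local scratch data (hbase.tmp.dir,
    # which defaults to a /tmp/hbase-<user> directory).
    rm -rf /tmp/hbase-$(whoami)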

Re: HDFS reports corrupted blocks after HBase reinstall

Posted by Jean-Daniel Cryans <jd...@apache.org>.
I don't remember ever seeing this :|

Was your secondary namenode running on a different host or storing its
data in a different folder? Was that wiped out too?

J-D
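
One way to check that, assuming stock 0.20-era property names and the
packaged CDH3 config location (both assumptions, not from the thread):

    # Where does the secondary namenode write its checkpoint?
    grep -A1 'fs.checkpoint.dir' /etc/hadoop/conf/hdfs-site.xml

    # Compare the checkpoint's age against the primary's image. The
    # namesecondary path is the default that would follow from the
    # hadoop.tmp.dir visible in the log later in this thread.
    ls -l /var/lib/hadoop-0.20/cache/hadoop/dfs/namesecondary/current/
    ls -l /var/lib/hadoop-0.20/cache/hadoop/dfs/name/current/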


Re: HDFS reports corrupted blocks after HBase reinstall

Posted by Jonathan Bender <jo...@gmail.com>.
So it's definitely a case of HDFS not being able to recover the image.
Maybe this is better directed toward another list, but has anyone had
issues with this, or any suggestions for resolving it?




2011-04-26 17:15:56,898 INFO org.apache.hadoop.hdfs.server.common.Storage: Recovering storage directory /var/lib/hadoop-0.20/cache/hadoop/dfs/name from failed checkpoint.
2011-04-26 17:15:56,905 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 204
2011-04-26 17:15:57,020 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 0
2011-04-26 17:15:57,021 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 26833 loaded in 0 seconds.
2011-04-26 17:15:57,257 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Invalid opcode, reached end of edit log Number of transactions found 528
2011-04-26 17:15:57,258 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /var/lib/hadoop-0.20/cache/hadoop/dfs/name/current/edits of size 1049092 edits # 528 loaded in 0 seconds.
2011-04-26 17:15:57,265 ERROR org.apache.hadoop.hdfs.server.common.Storage: Unable to save image for /var/lib/hadoop-0.20/cache/hadoop/dfs/name
java.io.IOException: saveLeases found path /hbase/base_tmp/.logs/sv004.my.domain.com,60020,1302882411768/sv004.my.domain.com%3A60020.1302882412951 but no matching entry in namespace.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:5153)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:1071)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.saveCurrent(FSImage.java:1170)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.saveNamespace(FSImage.java:1118)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:347)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:321)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:267)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:461)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1202)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1211)
2011-04-26 17:15:57,273 WARN org.apache.hadoop.hdfs.server.common.Storage: FSImage:processIOError: removing storage: /var/lib/hadoop-0.20/cache/hadoop/dfs/name
2011-04-26 17:15:57,274 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Finished loading FSImage in 1553 msecs
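
The saveLeases error is about leases, i.e. files still open for write: the
namenode believes a region server WAL under /hbase/base_tmp/.logs is open,
but that path no longer exists in the namespace, so it cannot save a
consistent image. A sketch of how to see what HDFS still considers open,
assuming your fsck supports -openforwrite (run against the namenode once
it is up):

    # List files with active leases (open for write), such as
    # region server WALs under /hbase/base_tmp/.logs.
    hadoop fsck /hbase -openforwrite -files

    # The namenode stays in safe mode while blocks are missing.
    hadoop dfsadmin -safemode get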





Re: HDFS reports corrupted blocks after HBase reinstall

Posted by Jonathan Bender <jo...@gmail.com>.
Wow, this is more intense than I thought... as soon as I load HBase again,
my HDFS filesystem essentially reverts to an older snapshot.  That is, I
don't see any of the changes I had made since that time, in the HBase tables
or otherwise.

I'm using CDH3 beta 4, which I believe stores its local HBase data in a
different directory, though I'm not entirely sure where.

I'm not entirely sure what happened to mess this up, but it seems pretty
serious.
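
If it helps, on a packaged CDH3 install the effective locations should be
readable straight from the config; /etc/hbase/conf is the usual CDH path,
but treat it as a guess for your setup:

    # Where HBase keeps its data in HDFS (hbase.rootdir):
    grep -A1 'hbase.rootdir' /etc/hbase/conf/hbase-site.xml

    # Local scratch space (hbase.tmp.dir, default under /tmp/hbase-<user>):
    grep -A1 'hbase.tmp.dir' /etc/hbase/conf/hbase-site.xml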


Re: HDFS reports corrupted blocks after HBase reinstall

Posted by Himanshu Vashishtha <hv...@cs.ualberta.ca>.
Could it be the /tmp/hbase-<userID> directory that is the culprit?
Just a wild guess, though.
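
A quick way to rule that guess in or out (the directory name depends on the
user running HBase; on CDH it may be hbase rather than your login user):

    # Anything left over from the old install?
    ls -lR /tmp/hbase-$(whoami)

    # If so, clear it while HBase is stopped.
    rm -rf /tmp/hbase-$(whoami)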


Re: HDFS reports corrupted blocks after HBase reinstall

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Unless HBase was running when you wiped that out (and even then), I
don't see how this could happen. Could you match those blocks to the
files using fsck and figure out when the files were created and whether
they were part of the old install?

Thx,

J-D
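
A sketch of that fsck pass (the flags are standard on the 0.20 line; the
output file and grep pattern are illustrative):

    # Map every file under /hbase to its blocks and their locations.
    hadoop fsck /hbase -files -blocks -locations > /tmp/hbase-fsck.txt

    # Missing/corrupt blocks are flagged in the report.
    grep -i -B2 'MISSING' /tmp/hbase-fsck.txt

    # Modification times help separate old-install files from new ones.
    hadoop fs -lsr /hbase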
