Posted to user@hbase.apache.org by Andy Sautins <an...@returnpath.net> on 2011/01/27 03:22:36 UTC

EOF Exception trying to read recovered.edits file...

   We had a situation this afternoon that has left our HBase database in a bad state. We re-started a number of nodes and, while HBase kept running, at least one of our tables does not seem to be serving all of its regions.  What I'm seeing in the log is the java.io.EOFException stack trace below, thrown while reading a file in the recovered.edits directory.  I looked around a bit and it seems like this might be related to HBASE-2933, which suggests that if the master dies while splitting a log it can leave invalid logs in recovered.edits.  That seems plausible, since the master may have been one of the nodes that was re-started today.

   My question is: if this is indeed the case, is there a safe way to recover from a situation where replaying the recovered.edits files throws EOF exceptions?  My understanding is that the master splits the logs and places the results in each region's recovered.edits directory. If I remove the files under recovered.edits, would the master re-split the original log file and recover properly, or would I lose data?
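
   For reference, a minimal sketch (Java, Hadoop FileSystem API) of how one might list the edit files a region would try to replay on open. The region directory is copied from the stack trace below; the layout is an assumption based on that path, not anything official, so substitute your own table and encoded region name:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListRecoveredEdits {
      public static void main(String[] args) throws Exception {
        // Region directory layout taken from the stack trace below;
        // substitute your own table and encoded region name.
        Path regionDir = new Path("/user/hbase/emailProperties/"
            + "9171dadec62d81105f0f6022eb51f162");
        Path editsDir = new Path(regionDir, "recovered.edits");
        FileSystem fs = FileSystem.get(new Configuration());
        if (!fs.exists(editsDir)) {
          System.out.println("Nothing to replay for " + regionDir);
          return;
        }
        for (FileStatus stat : fs.listStatus(editsDir)) {
          // A file whose length looks short relative to its peers is a
          // likely candidate for a truncated split.
          System.out.println(stat.getPath() + " len=" + stat.getLen());
        }
      }
    }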

   We are currently running the Cloudera distribution of HBase, hbase-0.89.20100924.

   Any insights on the best way to recover would be much appreciated.

22eb51f162.: java.io.EOFException: hdfs://hdnn.dfs.returnpath.net:54310/user/hbase/emailProperties/9171dadec62d81105f0f6022eb51f162/recovered.edits/0000000000012154417, entryStart=4160964, pos=4161536, end=4161536, edit=1306
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.addFileInfoToException(SequenceFileLogReader.java:186)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:142)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:126)
        at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:1842)
        at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:1817)
        at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:1776)
        at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:342)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateRegion(HRegionServer.java:1503)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1468)
        at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1380)
        at java.lang.Thread.run(Unknown Source)



Re: EOF Exception trying to read recovered.edits file...

Posted by Stack <st...@duboce.net>.
You could move aside the problematic file to get going again.
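
For illustration only, a minimal sketch of what moving the file aside might look like with the Hadoop FileSystem API. The source path is the one reported in the EOFException above; the destination directory is just an assumption, chosen so the file is out of the region's replay path but still kept around:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class MoveAsideRecoveredEdits {
      public static void main(String[] args) throws Exception {
        // Truncated recovered.edits file reported in the EOFException.
        Path src = new Path("hdfs://hdnn.dfs.returnpath.net:54310/user/hbase/"
            + "emailProperties/9171dadec62d81105f0f6022eb51f162/"
            + "recovered.edits/0000000000012154417");
        // Any directory outside the region keeps the file out of replay;
        // this location is illustrative only.
        Path dst = new Path("/user/hbase/moved-aside/0000000000012154417");
        FileSystem fs = FileSystem.get(src.toUri(), new Configuration());
        fs.mkdirs(dst.getParent());
        if (fs.rename(src, dst)) {
          System.out.println("Moved " + src + " to " + dst);
        } else {
          System.out.println("Rename failed; nothing was changed");
        }
      }
    }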

That we are dying on an EOFE certainly warrants more digging, since
we're supposed to handle these on replay.  Will take a look in the
morning (I seem to recall fixes around EOFEs in this area before we
released 0.90.0 -- need to dig them up).

St.Ack
