Posted to dev@hbase.apache.org by ramkrishna vasudevan <ra...@gmail.com> on 2013/05/06 13:26:07 UTC

HLogSplit error with hadoop-2.0.3-alpha and hbase trunk

Hi All

I am getting the following error when I run Trunk with hadoop-2.0.3:
java.io.IOException: Failed read of int length 2
  at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:3002)
  at org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
  at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:41)
  at org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
  at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:137)
  at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
  at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
  at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getNextLogLine(HLogSplitter.java:775)
  at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:459)
  at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:388)
  at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:115)
  at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:278)
  at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:199)
  at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:166)

I am able to reproduce this on the cluster, but not with the test cases, even
when I run with 2.0.3.

Regards
Ram

Re: HLogSplit error with hadoop-2.0.3-alpha and hbase trunk

Posted by Ted Yu <yu...@gmail.com>.
Can you try Hadoop 2.0.4-alpha to see if the problem persists?

Thanks

On May 6, 2013, at 4:26 AM, ramkrishna vasudevan <ra...@gmail.com> wrote:

> Hi All
> 
> I am getting the following error when i run Trunk with hadop-2.0.3.
> java.io.IOException: Failed read of int length 2
> [stack trace snipped]
> 
> Am able to reproduce this with the cluster but not with the testcases even
> when i run with 2.0.3.
> 
> Regards
> Ram

Re: HLogSplit error with hadoop-2.0.3-alpha and hbase trunk

Posted by Jimmy Xiang <jx...@cloudera.com>.
I got the same error today and filed HBASE-8511.  The log splitting gets
stuck there for a while, no matter how many retries.

Thanks,
Jimmy


On Wed, May 8, 2013 at 12:17 PM, Anoop John <an...@gmail.com> wrote:

> I see we have FSDataInputStream in ProtobufLogReader.. We will test with
> the patch tomorow and report back.
>
> -Anoop-
>
> On Thu, May 9, 2013 at 12:30 AM, Anoop John <an...@gmail.com> wrote:
>
> >  >However this doesn't explain why sequencefile has no such problems
> >
> > What I can see is we use FSDataInputStream in case of
> > SequenecFileLogReader (Not FSInputStream) and we use FSDataInputStream
> > #readInt() (WALEdit#read...).  This inturn reads the int from stream by
> > calling in.read() 4 times making sure we read full int (If it is not EOF
> > issue)
> >
> > May be we can try out this way and test tomorow Ram.
> >
> > BTW IOUtils.readFully() is not returning the actual # bytes read it
> looks.
> >
> > -Anoop-
> >
> >  On Thu, May 9, 2013 at 12:07 AM, Sergey Shelukhin <
> sergey@hortonworks.com
> > > wrote:
> >
> >> 589824 is divisible by 4096, default buffer size. Que groundless
> >> suspicions. However this doesn't explain why sequencefile has no such
> >> problems. Do you have any more numbers with this?
> >> I am in meetings now, let me try readFully after I come back from lunch
> >>
> >> On Wed, May 8, 2013 at 11:23 AM, ramkrishna vasudevan <
> >>  ramkrishna.s.vasudevan@gmail.com> wrote:
> >>
> >> > >The length is from read call. This looks pretty suspicious, if the stream
> >> > is not EOF why would it return less bytes?
> >> >
> >> > Infact this happened for an entry in the middle of the HLog.  Write some
> >> > data using LoadTestTool.  Try to use the HLog tool to read the HLog file
> >> > you will end up in this issue. (most of the times).
> >> >
> >> > Regards
> >> > Ram
> >> >
> >> >
> >> >
> >> > On Wed, May 8, 2013 at 11:48 PM, Sergey Shelukhin <
> >> sergey@hortonworks.com
> >> > >wrote:
> >> >
> >> > > if (length != intBytes.length) throw new IOException("Failed read of
> >> > > int length " + length);
> >> > > The length is from read call. This looks pretty suspicious, if the stream
> >> > > is not EOF why would it return less bytes? I will try to repro today.
> >> > >
> >> > > On Wed, May 8, 2013 at 5:46 AM, ramkrishna vasudevan <
> >> > > ramkrishna.s.vasudevan@gmail.com> wrote:
> >> > >
> >> > > > On further debugging found that this issue happens with ProtoBufWriter
> >> > > > and not with sequenceFileWriter. (atleast we could not reproduce it with
> >> > > > different runs)
> >> > > >
> >> > > > We can see that the HLog has more data in it but while reading one of
> >> > > > the lines in the HLog this error happens.  So pretty much sure that it
> >> > > > is not EOF.
> >> > > > Verified DFS logs but could not find any exceptions out there too.
> >> > > >
> >> > > > We will try to figure out more on this tomorrow.
> >> > > >
> >> > > > Regards
> >> > > > Ram
> >> > > >
> >> > > >
> >> > > > On Wed, May 8, 2013 at 11:34 AM, ramkrishna vasudevan <
> >> > > > ramkrishna.s.vasudevan@gmail.com> wrote:
> >> > > >
> >> > > > > Ok so i tried this out with hadoop 2.0.4 and also with Sergey's patch.
> >> > > > > The issue is reproducible in all version of hadoop but not always.
> >> > > > > I am able to get the errors like this
> >> > > > >
> >> > > > > 2013-05-07 17:11:08,827 WARN  [SplitLogWorker-ram.sh.intel.com,60020,1367961009182]
> >> > > > > org.apache.hadoop.hbase.regionserver.SplitLogWorker: log splitting of
> >> > > > > .logs/ram.sh.intel.com,60020,1367960957620-splitting/ram.sh.intel.com%2C60020%2C1367960957620.1367960993389
> >> > > > > failed, returning error
> >> > > > > java.io.IOException: Error while reading 1 WAL KVs; started reading at
> >> > > > > 589822 and read up to 589824
> >> > > > >   at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:162)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getNextLogLine(HLogSplitter.java:775)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:459)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:388)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:115)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:278)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:199)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:166)
> >> > > > >   at java.lang.Thread.run(Thread.java:662)
> >> > > > > Caused by: java.io.IOException: Failed read of int length 2
> >> > > > >   at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:2335)
> >> > > > >   at org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
> >> > > > >   at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:46)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:143)
> >> > > > > ... 10 more
> >> > > > >
> >> > > > >
> >> > > > > and sometimes
> >> > > > > java.io.IOException: Failed read of int length 1
> >> > > > >   at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:2335)
> >> > > > >   at org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
> >> > > > >   at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:41)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:137)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:2837)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:2755)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionStores(HRegion.java:664)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:569)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:540)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4095)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4066)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4016)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3967)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:448)
> >> > > > >   at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:136)
> >> > > > >   at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:130)
> >> > > > >   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >> > > > >   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >> > > > >
> >> > > > >
> >> > > > > I will work on this today and find out the root cause of it.
> >> > > > >
> >> > > > > Regards
> >> > > > > Ram
> >> > > > >
> >> > > > >
> >> > > > > On Tue, May 7, 2013 at 8:12 AM, ramkrishna vasudevan <
> >> > > > > ramkrishna.s.vasudevan@gmail.com> wrote:
> >> > > > >
> >> > > > >> I too think it may be EOF.  But i did not debug it fully.  Today let
> >> > > > >> me check and try applying your patch.
> >> > > > >>
> >> > > > >> Regards
> >> > > > >> Ram
> >> > > > >>
> >> > > > >>
> >> > > > >> On Tue, May 7, 2013 at 4:41 AM, Sergey Shelukhin <
> >> > > > sergey@hortonworks.com>wrote:
> >> > > > >>
> >> > > > >>> Please take a look at the patch in
> >> > > > >>> HBASE-8498 <https://issues.apache.org/jira/browse/HBASE-8498>...
> >> > > > >>> this should make it possible to get more details.
> >> > > > >>>
> >> > > > >>> On Mon, May 6, 2013 at 11:36 AM, Sergey Shelukhin <
> >> > > > >>> sergey@hortonworks.com>wrote:
> >> > > > >>>
> >> > > > >>> > 1) Is there a cause- stack?
> >> > > > >>> > 2) Can you ascertain if WAL is truncated at that place? Exception type
> >> > > > >>> > might have changed/exception might have expanded between Hadoop 1 and 2;
> >> > > > >>> > WAL replay should ignore EOF, so if this is a EOF problem then this
> >> > > > >>> > would be easy to correct, if it's something more serious then it's bad.
> >> > > > >>> > I will add some logging/catching around to add cause (if missing) and
> >> > > > >>> > useful logs.
> >> > > > >>> >
> >> > > > >>> >
> >> > > > >>> > On Mon, May 6, 2013 at 4:26 AM, ramkrishna vasudevan <
> >> > > > >>> > ramkrishna.s.vasudevan@gmail.com> wrote:
> >> > > > >>> >
> >> > > > >>> >> Hi All
> >> > > > >>> >>
> >> > > > >>> >> I am getting the following error when i run Trunk with hadop-2.0.3.
> >> > > > >>> >> java.io.IOException: Failed read of int length 2
> >> > > > >>> >> [stack trace snipped]
> >> > > > >>> >>
> >> > > > >>> >> Am able to reproduce this with the cluster but not with the
> >> > > > >>> >> testcases even when i run with 2.0.3.
> >> > > > >>> >>
> >> > > > >>> >> Regards
> >> > > > >>> >> Ram
> >> > > > >>> >>
> >> > > > >>> >
> >> > > > >>> >
> >> > > > >>>
> >> > > > >>
> >> > > > >>
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>

Re: HLogSplit error with hadoop-2.0.3-alpha and hbase trunk

Posted by Anoop John <an...@gmail.com>.
I see we have FSDataInputStream in ProtobufLogReader. We will test with
the patch tomorrow and report back.

-Anoop-
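
Since the reader already wraps an FSDataInputStream, the looping-read
direction discussed in this thread can be sketched as follows. This is a
hypothetical helper written for illustration only (the class and method
names are invented here); it is not the actual HBase patch. The point is
that a short read() is retried, while a genuine EOF still surfaces as an
exception:

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper illustrating the fix direction discussed in this
// thread: keep calling read() until the buffer is full, instead of
// treating a single short read() as a corrupt record.
public class ReadFullyHelper {
    static void readFully(InputStream in, byte[] buf) throws IOException {
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) {
                // A true EOF mid-record is still an error worth reporting.
                throw new EOFException(
                    "EOF after " + off + " of " + buf.length + " bytes");
            }
            off += n; // short read: loop and ask for the rest
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] buf = new byte[4];
        readFully(new ByteArrayInputStream(new byte[]{0, 0, 1, 42}), buf);
        System.out.println("filled " + buf.length + " bytes despite any short reads");
    }
}
```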

On Thu, May 9, 2013 at 12:30 AM, Anoop John <an...@gmail.com> wrote:

> [quoted thread snipped]

Re: HLogSplit error with hadoop-2.0.3-alpha and hbase trunk

Posted by Anoop John <an...@gmail.com>.
>However this doesn't explain why sequencefile has no such problems

What I can see is that we use FSDataInputStream in the case of
SequenceFileLogReader (not FSInputStream), and we use
FSDataInputStream#readInt() (WALEdit#read...). This in turn reads the int
from the stream by calling in.read() 4 times, making sure we read the full
int (if it is not an EOF issue).

Maybe we can try out this way and test tomorrow, Ram.

BTW, it looks like IOUtils.readFully() does not return the actual number of bytes read.

-Anoop-
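
The read()-vs-readInt() distinction above can be demonstrated outside HBase
with a toy stream (an illustrative sketch; ChoppyStream and the class name
are invented for this example). A single read(byte[]) may legally return
fewer bytes than requested, for instance at a buffer boundary, without being
at EOF, while DataInputStream#readInt() keeps reading until it has all 4 bytes:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Simulates a stream that returns at most 2 bytes per read(byte[]) call,
// the way a buffered stream can stop short at a buffer boundary.
public class ShortReadDemo {
    static class ChoppyStream extends FilterInputStream {
        ChoppyStream(InputStream in) { super(in); }
        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, 2)); // short read, not EOF
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {0, 0, 1, 42}; // the int 298 in big-endian

        // A single read() returns only 2 bytes even though more data is
        // available -- by the InputStream contract this is not an error.
        InputStream in1 = new ChoppyStream(new ByteArrayInputStream(data));
        byte[] buf = new byte[4];
        System.out.println("single read() returned " + in1.read(buf) + " bytes");

        // DataInputStream#readInt() keeps reading until it has 4 bytes,
        // so it still decodes the int correctly.
        DataInputStream in2 =
            new DataInputStream(new ChoppyStream(new ByteArrayInputStream(data)));
        System.out.println("readInt() = " + in2.readInt()); // prints 298
    }
}
```

This mirrors the suspicion in the thread: a length check that treats one
short read() as corruption will fire spuriously, while a looping read does not.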


Re: HLogSplit error with hadoop-2.0.3-alpha and hbase trunk

Posted by Sergey Shelukhin <se...@hortonworks.com>.
589824 is divisible by 4096, the default buffer size. Cue groundless
suspicions. However, this doesn't explain why sequencefile has no such
problems. Do you have any more numbers like this?
I am in meetings now; let me try readFully after I come back from lunch.
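For reference, the alignment can be double-checked with trivial arithmetic (the 4096 figure assumes Hadoop's default io.file.buffer.size; the 65536 multiple is just an extra observation):

```java
public class OffsetAlignment {
    public static void main(String[] args) {
        long offset = 589824;                      // where the failed read ended
        System.out.println(offset % 4096 == 0);    // true: aligned to the 4 KB buffer
        System.out.println(offset / 4096);         // 144 full buffers
        System.out.println(offset % 65536 == 0);   // true: also a 64 KB multiple
    }
}
```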


Re: HLogSplit error with hadoop-2.0.3-alpha and hbase trunk

Posted by Anoop John <an...@gmail.com>.
Ram,
  For you, the issue was not happening with less data, right?

-Anoop-


Re: HLogSplit error with hadoop-2.0.3-alpha and hbase trunk

Posted by Anoop John <an...@gmail.com>.
The issue was not occurring for the same test with the same amount of data
using SequenceFileLogWriter & SequenceFileLogReader.

Can FSInputStream#read(byte[]) read fewer bytes than requested even when it
is not at EOF? I can see we use IOUtils.readFully(in, byte[], int, int);
what is the difference between this and the plain read()? Could there be a
difference between the two when we read at file block boundaries?
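For what it's worth, the difference can be illustrated with a plain
InputStream, outside of HDFS. ChunkedStream below is a made-up stand-in for a
stream whose read() stops short (e.g. at a block boundary) before EOF: a
single read() is allowed to return fewer bytes than requested, while a
readFully-style loop keeps reading until the buffer is filled or the stream
truly ends.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class PartialReadDemo {
    // Returns at most 3 bytes per read() call, mimicking a stream that
    // stops short of the requested length without being at EOF.
    static class ChunkedStream extends ByteArrayInputStream {
        ChunkedStream(byte[] buf) { super(buf); }
        @Override
        public int read(byte[] b, int off, int len) {
            return super.read(b, off, Math.min(len, 3));
        }
    }

    // Equivalent of IOUtils.readFully: loop until the buffer is full,
    // throwing only on a genuine EOF.
    static void readFully(InputStream in, byte[] buf, int off, int len)
            throws IOException {
        while (len > 0) {
            int n = in.read(buf, off, len);
            if (n < 0) throw new IOException("Premature EOF after " + off + " bytes");
            off += n;
            len -= n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[8];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;

        // Single read(): returns 3 bytes here, with 5 more still available.
        byte[] buf1 = new byte[8];
        int n = new ChunkedStream(data).read(buf1, 0, 8);
        System.out.println("single read() returned " + n + " bytes");

        // readFully(): loops internally and fills the whole buffer.
        byte[] buf2 = new byte[8];
        readFully(new ChunkedStream(data), buf2, 0, 8);
        System.out.println("readFully filled all " + buf2.length + " bytes");
    }
}
```

So a caller that treats a short return from a single read() as an error will
fail spuriously on streams that behave like the one above.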

-Anoop-

On Wed, May 8, 2013 at 11:53 PM, ramkrishna vasudevan <
ramkrishna.s.vasudevan@gmail.com> wrote:

> >The length is from read call. This looks pretty suspicious, if the stream
> is not EOF why would it return less bytes?
>
> Infact this happened for an entry in the middle of the HLog.  Write some
> data using LoadTestTool.  Try to use the HLog tool to read the HLog file
> you will end up in this issue. (most of the times).
>
> Regards
> Ram
>
>
>
> On Wed, May 8, 2013 at 11:48 PM, Sergey Shelukhin <sergey@hortonworks.com
> >wrote:
>
> > if (length != intBytes.length) throw new IOException("Failed read of int
> > length " + length);
> > The length is from read call. This looks pretty suspicious, if the stream
> > is not EOF why would it return less bytes? I will try to repro today.
> >
> > On Wed, May 8, 2013 at 5:46 AM, ramkrishna vasudevan <
> > ramkrishna.s.vasudevan@gmail.com> wrote:
> >
> > > On further debugging found that this issue happens with ProtoBufWriter
> > and
> > > not with sequenceFileWriter.(atleast we could not reproduce it with
> > > different runs)
> > >
> > > We can see that the HLog has more data in it but while reading one of
> the
> > > lines in the HLog this error happens.  So pretty much sure that it is
> not
> > > EOF.
> > > Verified DFS logs but could not find any exceptions out there too.
> > >
> > > We will try to figure out more on this tomorrow.
> > >
> > > Regards
> > > Ram
> > >
> > >
> > > On Wed, May 8, 2013 at 11:34 AM, ramkrishna vasudevan <
> > > ramkrishna.s.vasudevan@gmail.com> wrote:
> > >
> > > > Ok so i tried this out with hadoop 2.0.4 and also with Sergey's
> patch.
> > > >  The issue is reproducible in all version of hadoop but not always.
> > > > I am able to get the errors like this
> > > >
>> > > >>> > 2013-05-07 17:11:08,827 WARN  [SplitLogWorker-ram.sh.intel.com
> > > ,60020,1367961009182]
> > > > org.apache.hadoop.hbase.regionserver.SplitLogWorker: log splitting of
> > > .logs/
> > > > ram.sh.intel.com,60020,1367960957620-splitting/ram.sh.intel.com
> > > %2C60020%2C1367960957620.1367960993389
> > > > failed, returning error
> > > > java.io.IOException: Error  while reading 1 WAL KVs; started reading
> at
> > > > 589822 and read up to 589824
> > > > at
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:162)
> > > >  at
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
> > > > at
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
> > > >  at
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getNextLogLine(HLogSplitter.java:775)
> > > > at
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:459)
> > > >  at
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:388)
> > > > at
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:115)
> > > >  at
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:278)
> > > > at
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:199)
> > > >  at
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:166)
> > > > at java.lang.Thread.run(Thread.java:662)
> > > > Caused by: java.io.IOException: Failed read of int length 2
> > > > at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:2335)
> > > > at
> > > >
> > >
> >
> org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
> > > >  at
> > > org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:46)
> > > > at
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
> > > >  at
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:143)
> > > > ... 10 more
> > > >
> > > >
> > > > and sometimes
> > > > java.io.IOException: Failed read of int length 1
> > > > at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:2335)
> > > >  at
> > > >
> > >
> >
> org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
> > > > at
> > org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:41)
> > > >  at
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
> > > > at
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:137)
> > > >  at
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
> > > > at
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
> > > >  at
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:2837)
> > > > at
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:2755)
> > > >  at
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionStores(HRegion.java:664)
> > > > at
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:569)
> > > >  at
> > > >
> > org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:540)
> > > > at
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4095)
> > > >  at
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4066)
> > > > at
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4016)
> > > >  at
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3967)
> > > > at
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:448)
> > > >  at
> > > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:136)
> > > > at
> > >
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:130)
> > > >  at
> > > >
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > > > at
> > > >
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > > >
> > > >
> > > > I will work on this today and find out the root cause of it.
> > > >
> > > > Regards
> > > > Ram
> > > >
> > > >
> > > > On Tue, May 7, 2013 at 8:12 AM, ramkrishna vasudevan <
> > > > ramkrishna.s.vasudevan@gmail.com> wrote:
> > > >
> > > >> I too think it may be EOF.  But i did not debug it fully.  Today let
> > me
> > > >> check and try applying your patch.
> > > >>
> > > >> Regards
> > > >> Ram
> > > >>
> > > >>
> > > >> On Tue, May 7, 2013 at 4:41 AM, Sergey Shelukhin <
> > > sergey@hortonworks.com>wrote:
> > > >>
> > > >>> Please take a look at the patch in
> > > >>> HBASE-8498<https://issues.apache.org/jira/browse/HBASE-8498>...
> > > >>> this should make it possible to get more details.
> > > >>>
> > > >>> On Mon, May 6, 2013 at 11:36 AM, Sergey Shelukhin <
> > > >>> sergey@hortonworks.com>wrote:
> > > >>>
> > > >>> > 1) Is there a cause- stack?
> > > >>> > 2) Can you ascertain if WAL is truncated at that place? Exception
> > > type
> > > >>> > might have changed/exception might have expanded between Hadoop 1
> > and
> > > >>> 2;
> > > >>> > WAL replay should ignore EOF, so if this is a EOF problem then
> this
> > > >>> would
> > > >>> > be easy to correct, if it's something more serious then it's bad.
> > > >>> > I will add some logging/catching around to add cause (if missing)
> > and
> > > >>> > useful logs.
> > > >>> >
> > > >>> >
> > > >>> > On Mon, May 6, 2013 at 4:26 AM, ramkrishna vasudevan <
> > > >>> > ramkrishna.s.vasudevan@gmail.com> wrote:
> > > >>> >
> > > >>> >> Hi All
> > > >>> >>
> > > >>> >> I am getting the following error when i run Trunk with
> > hadop-2.0.3.
> > > >>> >> java.io.IOException: Failed read of int length 2
> > > >>> >> at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:3002)
> > > >>> >> at
> > > >>> >>
> > > >>> >>
> > > >>>
> > >
> >
> org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
> > > >>> >> at
> > > >>>
> > org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:41)
> > > >>> >> at
> > > >>> >>
> > > >>> >>
> > > >>>
> > >
> >
> org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
> > > >>> >> at
> > > >>> >>
> > > >>> >>
> > > >>>
> > >
> >
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:137)
> > > >>> >> at
> > > >>> >>
> > > >>> >>
> > > >>>
> > >
> >
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
> > > >>> >> at
> > > >>> >>
> > > >>> >>
> > > >>>
> > >
> >
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
> > > >>> >> at
> > > >>> >>
> > > >>> >>
> > > >>>
> > >
> >
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getNextLogLine(HLogSplitter.java:775)
> > > >>> >> at
> > > >>> >>
> > > >>> >>
> > > >>>
> > >
> >
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:459)
> > > >>> >> at
> > > >>> >>
> > > >>> >>
> > > >>>
> > >
> >
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:388)
> > > >>> >> at
> > > >>> >>
> > > >>> >>
> > > >>>
> > >
> >
> org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:115)
> > > >>> >> at
> > > >>> >>
> > > >>> >>
> > > >>>
> > >
> >
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:278)
> > > >>> >> at
> > > >>> >>
> > > >>> >>
> > > >>>
> > >
> >
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:199)
> > > >>> >> at
> > > >>> >>
> > > >>> >>
> > > >>>
> > >
> >
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:166)
> > > >>> >>
> > > >>> >> Am able to reproduce this with the cluster but not with the
> > > testcases
> > > >>> even
> > > >>> >> when i run with 2.0.3.
> > > >>> >>
> > > >>> >> Regards
> > > >>> >> Ram
> > > >>> >>
> > > >>> >
> > > >>> >
> > > >>>
> > > >>
> > > >>
> > > >
> > >
> >
>

Re: HLogSplit error with hadoop-2.0.3-alpha and hbase trunk

Posted by ramkrishna vasudevan <ra...@gmail.com>.
>The length is from read call. This looks pretty suspicious, if the stream
is not EOF why would it return less bytes?

In fact, this happened for an entry in the middle of the HLog. Write some
data using LoadTestTool, then try to use the HLog tool to read the HLog
file: you will end up hitting this issue most of the time.

Regards
Ram



On Wed, May 8, 2013 at 11:48 PM, Sergey Shelukhin <se...@hortonworks.com>wrote:

> if (length != intBytes.length) throw new IOException("Failed read of int
> length " + length);
> The length is from read call. This looks pretty suspicious, if the stream
> is not EOF why would it return less bytes? I will try to repro today.
>
> On Wed, May 8, 2013 at 5:46 AM, ramkrishna vasudevan <
> ramkrishna.s.vasudevan@gmail.com> wrote:
>
> > On further debugging found that this issue happens with ProtoBufWriter
> and
> > not with sequenceFileWriter.(atleast we could not reproduce it with
> > different runs)
> >
> > We can see that the HLog has more data in it but while reading one of the
> > lines in the HLog this error happens.  So pretty much sure that it is not
> > EOF.
> > Verified DFS logs but could not find any exceptions out there too.
> >
> > We will try to figure out more on this tomorrow.
> >
> > Regards
> > Ram
> >
> >
> > On Wed, May 8, 2013 at 11:34 AM, ramkrishna vasudevan <
> > ramkrishna.s.vasudevan@gmail.com> wrote:
> >
> > > Ok so i tried this out with hadoop 2.0.4 and also with Sergey's patch.
> > >  The issue is reproducible in all version of hadoop but not always.
> > > I am able to get the errors like this
> > >
> > > 2013-05-07 17:11:08,827 WARN  [SplitLogWorker-ram.sh.intel.com
> > ,60020,1367961009182]
> > > org.apache.hadoop.hbase.regionserver.SplitLogWorker: log splitting of
> > .logs/
> > > ram.sh.intel.com,60020,1367960957620-splitting/ram.sh.intel.com
> > %2C60020%2C1367960957620.1367960993389
> > > failed, returning error
> > > java.io.IOException: Error  while reading 1 WAL KVs; started reading at
> > > 589822 and read up to 589824
> > > at
> > >
> >
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:162)
> > >  at
> > >
> >
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
> > > at
> > >
> >
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
> > >  at
> > >
> >
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getNextLogLine(HLogSplitter.java:775)
> > > at
> > >
> >
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:459)
> > >  at
> > >
> >
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:388)
> > > at
> > >
> >
> org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:115)
> > >  at
> > >
> >
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:278)
> > > at
> > >
> >
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:199)
> > >  at
> > >
> >
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:166)
> > > at java.lang.Thread.run(Thread.java:662)
> > > Caused by: java.io.IOException: Failed read of int length 2
> > > at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:2335)
> > > at
> > >
> >
> org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
> > >  at
> > org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:46)
> > > at
> > >
> >
> org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
> > >  at
> > >
> >
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:143)
> > > ... 10 more
> > >
> > >
> > > and sometimes
> > > java.io.IOException: Failed read of int length 1
> > > at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:2335)
> > >  at
> > >
> >
> org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
> > > at
> org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:41)
> > >  at
> > >
> >
> org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
> > > at
> > >
> >
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:137)
> > >  at
> > >
> >
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
> > > at
> > >
> >
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
> > >  at
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:2837)
> > > at
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:2755)
> > >  at
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionStores(HRegion.java:664)
> > > at
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:569)
> > >  at
> > >
> org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:540)
> > > at
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4095)
> > >  at
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4066)
> > > at
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4016)
> > >  at
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3967)
> > > at
> > >
> >
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:448)
> > >  at
> > >
> >
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:136)
> > > at
> > org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:130)
> > >  at
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > > at
> > >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > >
> > >
> > > I will work on this today and find out the root cause of it.
> > >
> > > Regards
> > > Ram
> > >
> > >
> > > On Tue, May 7, 2013 at 8:12 AM, ramkrishna vasudevan <
> > > ramkrishna.s.vasudevan@gmail.com> wrote:
> > >
> > >> I too think it may be EOF.  But i did not debug it fully.  Today let
> me
> > >> check and try applying your patch.
> > >>
> > >> Regards
> > >> Ram
> > >>
> > >>
> > >> On Tue, May 7, 2013 at 4:41 AM, Sergey Shelukhin <
> > sergey@hortonworks.com>wrote:
> > >>
> > >>> Please take a look at the patch in
> > >>> HBASE-8498<https://issues.apache.org/jira/browse/HBASE-8498>...
> > >>> this should make it possible to get more details.
> > >>>
> > >>> On Mon, May 6, 2013 at 11:36 AM, Sergey Shelukhin <
> > >>> sergey@hortonworks.com>wrote:
> > >>>
> > >>> > 1) Is there a cause- stack?
> > >>> > 2) Can you ascertain if WAL is truncated at that place? Exception
> > type
> > >>> > might have changed/exception might have expanded between Hadoop 1
> and
> > >>> 2;
> > >>> > WAL replay should ignore EOF, so if this is a EOF problem then this
> > >>> would
> > >>> > be easy to correct, if it's something more serious then it's bad.
> > >>> > I will add some logging/catching around to add cause (if missing)
> and
> > >>> > useful logs.
> > >>> >
> > >>> >
> > >>> > On Mon, May 6, 2013 at 4:26 AM, ramkrishna vasudevan <
> > >>> > ramkrishna.s.vasudevan@gmail.com> wrote:
> > >>> >
> > >>> >> Hi All
> > >>> >>
> > >>> >> I am getting the following error when i run Trunk with
> hadop-2.0.3.
> > >>> >> java.io.IOException: Failed read of int length 2
> > >>> >> at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:3002)
> > >>> >> at
> > >>> >>
> > >>> >>
> > >>>
> >
> org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
> > >>> >> at
> > >>>
> org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:41)
> > >>> >> at
> > >>> >>
> > >>> >>
> > >>>
> >
> org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
> > >>> >> at
> > >>> >>
> > >>> >>
> > >>>
> >
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:137)
> > >>> >> at
> > >>> >>
> > >>> >>
> > >>>
> >
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
> > >>> >> at
> > >>> >>
> > >>> >>
> > >>>
> >
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
> > >>> >> at
> > >>> >>
> > >>> >>
> > >>>
> >
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getNextLogLine(HLogSplitter.java:775)
> > >>> >> at
> > >>> >>
> > >>> >>
> > >>>
> >
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:459)
> > >>> >> at
> > >>> >>
> > >>> >>
> > >>>
> >
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:388)
> > >>> >> at
> > >>> >>
> > >>> >>
> > >>>
> >
> org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:115)
> > >>> >> at
> > >>> >>
> > >>> >>
> > >>>
> >
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:278)
> > >>> >> at
> > >>> >>
> > >>> >>
> > >>>
> >
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:199)
> > >>> >> at
> > >>> >>
> > >>> >>
> > >>>
> >
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:166)
> > >>> >>
> > >>> >> Am able to reproduce this with the cluster but not with the
> > testcases
> > >>> even
> > >>> >> when i run with 2.0.3.
> > >>> >>
> > >>> >> Regards
> > >>> >> Ram
> > >>> >>
> > >>> >
> > >>> >
> > >>>
> > >>
> > >>
> > >
> >
>

Re: HLogSplit error with hadoop-2.0.3-alpha and hbase trunk

Posted by Sergey Shelukhin <se...@hortonworks.com>.
if (length != intBytes.length) throw new IOException("Failed read of int
length " + length);
The length comes from the read call. This looks pretty suspicious: if the
stream is not at EOF, why would it return fewer bytes? I will try to repro today.
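To make the suspicion concrete, here is a sketch of the fragile pattern next
to a loop-based alternative. The method and class names are illustrative,
not the actual HBase code; the point is that a length check on a single
read() conflates a benign short read with corruption or EOF.

```java
import java.io.IOException;
import java.io.InputStream;

public class IntReadSketch {
    // Fragile: a single read() may legally return fewer than 4 bytes even
    // when more data is available, so this can throw spuriously.
    static int readIntFragile(InputStream in) throws IOException {
        byte[] intBytes = new byte[4];
        int length = in.read(intBytes);
        if (length != intBytes.length) {
            throw new IOException("Failed read of int length " + length);
        }
        return toInt(intBytes);
    }

    // Robust: keep reading until 4 bytes arrive or the stream truly ends.
    static int readIntRobust(InputStream in) throws IOException {
        byte[] intBytes = new byte[4];
        int off = 0;
        while (off < intBytes.length) {
            int n = in.read(intBytes, off, intBytes.length - off);
            if (n < 0) throw new IOException("EOF after " + off + " bytes");
            off += n;
        }
        return toInt(intBytes);
    }

    // Big-endian reconstruction of the int from its 4 bytes.
    static int toInt(byte[] b) {
        return ((b[0] & 0xff) << 24) | ((b[1] & 0xff) << 16)
             | ((b[2] & 0xff) << 8) | (b[3] & 0xff);
    }
}
```

Against a stream that returns one byte per call, the fragile version throws
"Failed read of int length 1" (the message seen in this thread), while the
robust version decodes the int correctly.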

On Wed, May 8, 2013 at 5:46 AM, ramkrishna vasudevan <
ramkrishna.s.vasudevan@gmail.com> wrote:

> On further debugging found that this issue happens with ProtoBufWriter and
> not with sequenceFileWriter.(atleast we could not reproduce it with
> different runs)
>
> We can see that the HLog has more data in it but while reading one of the
> lines in the HLog this error happens.  So pretty much sure that it is not
> EOF.
> Verified DFS logs but could not find any exceptions out there too.
>
> We will try to figure out more on this tomorrow.
>
> Regards
> Ram
>
>
> On Wed, May 8, 2013 at 11:34 AM, ramkrishna vasudevan <
> ramkrishna.s.vasudevan@gmail.com> wrote:
>
> > Ok so i tried this out with hadoop 2.0.4 and also with Sergey's patch.
> >  The issue is reproducible in all version of hadoop but not always.
> > I am able to get the errors like this
> >
> > 2013-05-07 17:11:08,827 WARN  [SplitLogWorker-ram.sh.intel.com
> ,60020,1367961009182]
> > org.apache.hadoop.hbase.regionserver.SplitLogWorker: log splitting of
> .logs/
> > ram.sh.intel.com,60020,1367960957620-splitting/ram.sh.intel.com
> %2C60020%2C1367960957620.1367960993389
> > failed, returning error
> > java.io.IOException: Error  while reading 1 WAL KVs; started reading at
> > 589822 and read up to 589824
> > at
> >
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:162)
> >  at
> >
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
> > at
> >
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
> >  at
> >
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getNextLogLine(HLogSplitter.java:775)
> > at
> >
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:459)
> >  at
> >
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:388)
> > at
> >
> org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:115)
> >  at
> >
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:278)
> > at
> >
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:199)
> >  at
> >
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:166)
> > at java.lang.Thread.run(Thread.java:662)
> > Caused by: java.io.IOException: Failed read of int length 2
> > at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:2335)
> > at
> >
> org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
> >  at
> org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:46)
> > at
> >
> org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
> >  at
> >
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:143)
> > ... 10 more
> >
> >
> > and sometimes
> > java.io.IOException: Failed read of int length 1
> > at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:2335)
> >  at
> >
> org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
> > at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:41)
> >  at
> >
> org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
> > at
> >
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:137)
> >  at
> >
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
> > at
> >
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
> >  at
> >
> org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:2837)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:2755)
> >  at
> >
> org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionStores(HRegion.java:664)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:569)
> >  at
> > org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:540)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4095)
> >  at
> >
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4066)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4016)
> >  at
> >
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3967)
> > at
> >
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:448)
> >  at
> >
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:136)
> > at
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:130)
> >  at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >
> >
> > I will work on this today and find out the root cause of it.
> >
> > Regards
> > Ram
> >
> >
> > On Tue, May 7, 2013 at 8:12 AM, ramkrishna vasudevan <
> > ramkrishna.s.vasudevan@gmail.com> wrote:
> >
> >> I too think it may be EOF.  But i did not debug it fully.  Today let me
> >> check and try applying your patch.
> >>
> >> Regards
> >> Ram
> >>
> >>
> >> On Tue, May 7, 2013 at 4:41 AM, Sergey Shelukhin <
> sergey@hortonworks.com>wrote:
> >>
> >>> Please take a look at the patch in
> >>> HBASE-8498<https://issues.apache.org/jira/browse/HBASE-8498>...
> >>> this should make it possible to get more details.
> >>>
> >>> On Mon, May 6, 2013 at 11:36 AM, Sergey Shelukhin <
> >>> sergey@hortonworks.com>wrote:
> >>>
> >>> > 1) Is there a cause- stack?
> >>> > 2) Can you ascertain if WAL is truncated at that place? Exception
> type
> >>> > might have changed/exception might have expanded between Hadoop 1 and
> >>> 2;
> >>> > WAL replay should ignore EOF, so if this is a EOF problem then this
> >>> would
> >>> > be easy to correct, if it's something more serious then it's bad.
> >>> > I will add some logging/catching around to add cause (if missing) and
> >>> > useful logs.
> >>> >
> >>> >
> >>> > On Mon, May 6, 2013 at 4:26 AM, ramkrishna vasudevan <
> >>> > ramkrishna.s.vasudevan@gmail.com> wrote:
> >>> >
> >>> >> Hi All
> >>> >>
> >>> >> I am getting the following error when i run Trunk with hadop-2.0.3.
> >>> >> java.io.IOException: Failed read of int length 2
> >>> >> at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:3002)
> >>> >> at
> >>> >>
> >>> >>
> >>>
> org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
> >>> >> at
> >>> org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:41)
> >>> >> at
> >>> >>
> >>> >>
> >>>
> org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
> >>> >> at
> >>> >>
> >>> >>
> >>>
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:137)
> >>> >> at
> >>> >>
> >>> >>
> >>>
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
> >>> >> at
> >>> >>
> >>> >>
> >>>
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
> >>> >> at
> >>> >>
> >>> >>
> >>>
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getNextLogLine(HLogSplitter.java:775)
> >>> >> at
> >>> >>
> >>> >>
> >>>
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:459)
> >>> >> at
> >>> >>
> >>> >>
> >>>
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:388)
> >>> >> at
> >>> >>
> >>> >>
> >>>
> org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:115)
> >>> >> at
> >>> >>
> >>> >>
> >>>
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:278)
> >>> >> at
> >>> >>
> >>> >>
> >>>
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:199)
> >>> >> at
> >>> >>
> >>> >>
> >>>
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:166)
> >>> >>
> >>> >> Am able to reproduce this with the cluster but not with the
> testcases
> >>> even
> >>> >> when i run with 2.0.3.
> >>> >>
> >>> >> Regards
> >>> >> Ram
> >>> >>
> >>> >
> >>> >
> >>>
> >>
> >>
> >
>

Re: HLogSplit error with hadoop-2.0.3-alpha and hbase trunk

Posted by ramkrishna vasudevan <ra...@gmail.com>.
On further debugging, we found that this issue happens with the Protobuf
writer and not with SequenceFileLogWriter (at least we could not reproduce
it across different runs).

We can see that the HLog has more data in it, but this error happens while
reading one of the entries in the middle of the HLog, so we are pretty sure
it is not EOF. We verified the DFS logs but could not find any exceptions
there either.

We will try to figure out more on this tomorrow.

Regards
Ram


On Wed, May 8, 2013 at 11:34 AM, ramkrishna vasudevan <
ramkrishna.s.vasudevan@gmail.com> wrote:

> Ok so i tried this out with hadoop 2.0.4 and also with Sergey's patch.
>  The issue is reproducible in all version of hadoop but not always.
> I am able to get the errors like this
>
> 2013-05-07 17:11:08,827 WARN  [SplitLogWorker-ram.sh.intel.com,60020,1367961009182]
> org.apache.hadoop.hbase.regionserver.SplitLogWorker: log splitting of .logs/
> ram.sh.intel.com,60020,1367960957620-splitting/ram.sh.intel.com%2C60020%2C1367960957620.1367960993389
> failed, returning error
> java.io.IOException: Error  while reading 1 WAL KVs; started reading at
> 589822 and read up to 589824
> at
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:162)
>  at
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
> at
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
>  at
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getNextLogLine(HLogSplitter.java:775)
> at
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:459)
>  at
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:388)
> at
> org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:115)
>  at
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:278)
> at
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:199)
>  at
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:166)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.IOException: Failed read of int length 2
> at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:2335)
> at
> org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
>  at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:46)
> at
> org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
>  at
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:143)
> ... 10 more
>
>
> and sometimes
> java.io.IOException: Failed read of int length 1
> at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:2335)
>  at
> org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
> at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:41)
>  at
> org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
> at
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:137)
>  at
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
> at
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
>  at
> org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:2837)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:2755)
>  at
> org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionStores(HRegion.java:664)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:569)
>  at
> org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:540)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4095)
>  at
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4066)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4016)
>  at
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3967)
> at
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:448)
>  at
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:136)
> at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:130)
>  at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>
>
> I will work on this today and find out the root cause of it.
>
> Regards
> Ram
>
>
> On Tue, May 7, 2013 at 8:12 AM, ramkrishna vasudevan <
> ramkrishna.s.vasudevan@gmail.com> wrote:
>
>> I too think it may be EOF.  But i did not debug it fully.  Today let me
>> check and try applying your patch.
>>
>> Regards
>> Ram
>>
>>
>> On Tue, May 7, 2013 at 4:41 AM, Sergey Shelukhin <se...@hortonworks.com>wrote:
>>
>>> Please take a look at the patch in
>>> HBASE-8498<https://issues.apache.org/jira/browse/HBASE-8498>...
>>> this should make it possible to get more details.
>>>
>>> On Mon, May 6, 2013 at 11:36 AM, Sergey Shelukhin <
>>> sergey@hortonworks.com>wrote:
>>>
>>> > 1) Is there a cause- stack?
>>> > 2) Can you ascertain if WAL is truncated at that place? Exception type
>>> > might have changed/exception might have expanded between Hadoop 1 and
>>> 2;
>>> > WAL replay should ignore EOF, so if this is a EOF problem then this
>>> would
>>> > be easy to correct, if it's something more serious then it's bad.
>>> > I will add some logging/catching around to add cause (if missing) and
>>> > useful logs.
>>> >
>>> >
>>> > On Mon, May 6, 2013 at 4:26 AM, ramkrishna vasudevan <
>>> > ramkrishna.s.vasudevan@gmail.com> wrote:
>>> >
>>> >> Hi All
>>> >>
>>> >> I am getting the following error when i run Trunk with hadop-2.0.3.
>>> >> java.io.IOException: Failed read of int length 2
>>> >> at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:3002)
>>> >> at
>>> >>
>>> >>
>>> org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
>>> >> at
>>> org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:41)
>>> >> at
>>> >>
>>> >>
>>> org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
>>> >> at
>>> >>
>>> >>
>>> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:137)
>>> >> at
>>> >>
>>> >>
>>> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
>>> >> at
>>> >>
>>> >>
>>> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
>>> >> at
>>> >>
>>> >>
>>> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getNextLogLine(HLogSplitter.java:775)
>>> >> at
>>> >>
>>> >>
>>> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:459)
>>> >> at
>>> >>
>>> >>
>>> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:388)
>>> >> at
>>> >>
>>> >>
>>> org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:115)
>>> >> at
>>> >>
>>> >>
>>> org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:278)
>>> >> at
>>> >>
>>> >>
>>> org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:199)
>>> >> at
>>> >>
>>> >>
>>> org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:166)
>>> >>
>>> >> Am able to reproduce this with the cluster but not with the testcases
>>> even
>>> >> when i run with 2.0.3.
>>> >>
>>> >> Regards
>>> >> Ram
>>> >>
>>> >
>>> >
>>>
>>
>>
>

Re: HLogSplit error with hadoop-2.0.3-alpha and hbase trunk

Posted by ramkrishna vasudevan <ra...@gmail.com>.
Ok, so I tried this out with hadoop 2.0.4 and also with Sergey's patch. The
issue is reproducible on all versions of hadoop, but not always.
I am able to get errors like this:

2013-05-07 17:11:08,827 WARN
[SplitLogWorker-ram.sh.intel.com,60020,1367961009182]
org.apache.hadoop.hbase.regionserver.SplitLogWorker: log splitting of .logs/
ram.sh.intel.com,60020,1367960957620-splitting/ram.sh.intel.com%2C60020%2C1367960957620.1367960993389
failed, returning error
java.io.IOException: Error  while reading 1 WAL KVs; started reading at
589822 and read up to 589824
at
org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:162)
at
org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
at
org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
at
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getNextLogLine(HLogSplitter.java:775)
at
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:459)
at
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:388)
at
org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:115)
at
org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:278)
at
org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:199)
at
org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:166)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Failed read of int length 2
at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:2335)
at
org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:46)
at
org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
at
org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:143)
... 10 more


and sometimes
java.io.IOException: Failed read of int length 1
at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:2335)
at
org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:41)
at
org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
at
org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:137)
at
org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
at
org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
at
org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:2837)
at
org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:2755)
at
org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionStores(HRegion.java:664)
at
org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:569)
at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:540)
at
org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4095)
at
org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4066)
at
org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:4016)
at
org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3967)
at
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:448)
at
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:136)
at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:130)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)


I will work on this today and find out the root cause of it.
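For what it's worth, the offsets in the wrapped exception above line up with the
inner message: between 589822 and 589824 only 2 bytes were available, where a
KeyValue's 4-byte length prefix needs 4, which matches "Failed read of int length
2" (and 1 available byte would give the "int length 1" variant). A tiny check
(hypothetical helper, not HBase code) makes the arithmetic explicit; whether this
means a torn write at the tail or mid-file damage depends on whether 589824 is the
end of the readable data:

```java
public class TrailingBytesCheck {
    // How many bytes of a 4-byte length prefix were actually available, given the
    // start/end offsets reported by the wrapped exception.
    static long availablePrefixBytes(long startOffset, long endOffset) {
        return Math.min(4, endOffset - startOffset);
    }

    public static void main(String[] args) {
        // From the log above: started reading at 589822 and read up to 589824.
        long avail = availablePrefixBytes(589822L, 589824L);
        System.out.println("bytes available for length prefix: " + avail); // 2
    }
}
```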

Regards
Ram


On Tue, May 7, 2013 at 8:12 AM, ramkrishna vasudevan <
ramkrishna.s.vasudevan@gmail.com> wrote:

> I too think it may be EOF.  But i did not debug it fully.  Today let me
> check and try applying your patch.
>
> Regards
> Ram
>
>
> On Tue, May 7, 2013 at 4:41 AM, Sergey Shelukhin <se...@hortonworks.com>wrote:
>
>> Please take a look at the patch in
>> HBASE-8498<https://issues.apache.org/jira/browse/HBASE-8498>...
>> this should make it possible to get more details.
>>
>> On Mon, May 6, 2013 at 11:36 AM, Sergey Shelukhin <sergey@hortonworks.com
>> >wrote:
>>
>> > 1) Is there a cause- stack?
>> > 2) Can you ascertain if WAL is truncated at that place? Exception type
>> > might have changed/exception might have expanded between Hadoop 1 and 2;
>> > WAL replay should ignore EOF, so if this is a EOF problem then this
>> would
>> > be easy to correct, if it's something more serious then it's bad.
>> > I will add some logging/catching around to add cause (if missing) and
>> > useful logs.
>> >
>> >
>> > On Mon, May 6, 2013 at 4:26 AM, ramkrishna vasudevan <
>> > ramkrishna.s.vasudevan@gmail.com> wrote:
>> >
>> >> Hi All
>> >>
>> >> I am getting the following error when i run Trunk with hadop-2.0.3.
>> >> java.io.IOException: Failed read of int length 2
>> >> at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:3002)
>> >> at
>> >>
>> >>
>> org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
>> >> at
>> org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:41)
>> >> at
>> >>
>> >>
>> org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
>> >> at
>> >>
>> >>
>> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:137)
>> >> at
>> >>
>> >>
>> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
>> >> at
>> >>
>> >>
>> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
>> >> at
>> >>
>> >>
>> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getNextLogLine(HLogSplitter.java:775)
>> >> at
>> >>
>> >>
>> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:459)
>> >> at
>> >>
>> >>
>> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:388)
>> >> at
>> >>
>> >>
>> org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:115)
>> >> at
>> >>
>> >>
>> org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:278)
>> >> at
>> >>
>> >>
>> org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:199)
>> >> at
>> >>
>> >>
>> org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:166)
>> >>
>> >> Am able to reproduce this with the cluster but not with the testcases
>> even
>> >> when i run with 2.0.3.
>> >>
>> >> Regards
>> >> Ram
>> >>
>> >
>> >
>>
>
>

Re: HLogSplit error with hadoop-2.0.3-alpha and hbase trunk

Posted by ramkrishna vasudevan <ra...@gmail.com>.
I too think it may be an EOF, but I did not debug it fully. Let me check today
and try applying your patch.

Regards
Ram


On Tue, May 7, 2013 at 4:41 AM, Sergey Shelukhin <se...@hortonworks.com>wrote:

> Please take a look at the patch in
> HBASE-8498<https://issues.apache.org/jira/browse/HBASE-8498>...
> this should make it possible to get more details.
>
> On Mon, May 6, 2013 at 11:36 AM, Sergey Shelukhin <sergey@hortonworks.com
> >wrote:
>
> > 1) Is there a cause- stack?
> > 2) Can you ascertain if WAL is truncated at that place? Exception type
> > might have changed/exception might have expanded between Hadoop 1 and 2;
> > WAL replay should ignore EOF, so if this is a EOF problem then this would
> > be easy to correct, if it's something more serious then it's bad.
> > I will add some logging/catching around to add cause (if missing) and
> > useful logs.
> >
> >
> > On Mon, May 6, 2013 at 4:26 AM, ramkrishna vasudevan <
> > ramkrishna.s.vasudevan@gmail.com> wrote:
> >
> >> Hi All
> >>
> >> I am getting the following error when i run Trunk with hadop-2.0.3.
> >> java.io.IOException: Failed read of int length 2
> >> at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:3002)
> >> at
> >>
> >>
> org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
> >> at
> org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:41)
> >> at
> >>
> >>
> org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
> >> at
> >>
> >>
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:137)
> >> at
> >>
> >>
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
> >> at
> >>
> >>
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
> >> at
> >>
> >>
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getNextLogLine(HLogSplitter.java:775)
> >> at
> >>
> >>
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:459)
> >> at
> >>
> >>
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:388)
> >> at
> >>
> >>
> org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:115)
> >> at
> >>
> >>
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:278)
> >> at
> >>
> >>
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:199)
> >> at
> >>
> >>
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:166)
> >>
> >> Am able to reproduce this with the cluster but not with the testcases
> even
> >> when i run with 2.0.3.
> >>
> >> Regards
> >> Ram
> >>
> >
> >
>

Re: HLogSplit error with hadoop-2.0.3-alpha and hbase trunk

Posted by Sergey Shelukhin <se...@hortonworks.com>.
Please take a look at the patch in
HBASE-8498<https://issues.apache.org/jira/browse/HBASE-8498>...
this should make it possible to get more details.

On Mon, May 6, 2013 at 11:36 AM, Sergey Shelukhin <se...@hortonworks.com>wrote:

> 1) Is there a cause- stack?
> 2) Can you ascertain if WAL is truncated at that place? Exception type
> might have changed/exception might have expanded between Hadoop 1 and 2;
> WAL replay should ignore EOF, so if this is a EOF problem then this would
> be easy to correct, if it's something more serious then it's bad.
> I will add some logging/catching around to add cause (if missing) and
> useful logs.
>
>
> On Mon, May 6, 2013 at 4:26 AM, ramkrishna vasudevan <
> ramkrishna.s.vasudevan@gmail.com> wrote:
>
>> Hi All
>>
>> I am getting the following error when i run Trunk with hadop-2.0.3.
>> java.io.IOException: Failed read of int length 2
>> at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:3002)
>> at
>>
>> org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
>> at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:41)
>> at
>>
>> org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
>> at
>>
>> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:137)
>> at
>>
>> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
>> at
>>
>> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
>> at
>>
>> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getNextLogLine(HLogSplitter.java:775)
>> at
>>
>> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:459)
>> at
>>
>> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:388)
>> at
>>
>> org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:115)
>> at
>>
>> org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:278)
>> at
>>
>> org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:199)
>> at
>>
>> org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:166)
>>
>> Am able to reproduce this with the cluster but not with the testcases even
>> when i run with 2.0.3.
>>
>> Regards
>> Ram
>>
>
>

Re: HLogSplit error with hadoop-2.0.3-alpha and hbase trunk

Posted by Sergey Shelukhin <se...@hortonworks.com>.
1) Is there a cause stack?
2) Can you ascertain whether the WAL is truncated at that place? The exception type
might have changed or expanded between Hadoop 1 and 2; WAL replay should ignore
EOF, so if this is an EOF problem then it would be easy to correct; if it's
something more serious then it's bad.
I will add some logging/catching around it to include the cause (if missing) and
useful logs.
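The "WAL replay should ignore EOF" behavior can be sketched like this (a
hypothetical replay loop under assumed framing, not the actual HLogSplitter code):
a log whose tail was cut off by a crash surfaces as an EOFException, which is
swallowed so the edits recovered so far are kept, while any other IOException is
propagated as real corruption:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ReplaySketch {
    // Replays length-prefixed edits until the stream ends. An EOFException means
    // the writer died mid-record, so the torn tail is dropped; other IOExceptions
    // propagate to the caller as genuine errors.
    static List<String> replay(DataInputStream in) throws IOException {
        List<String> edits = new ArrayList<>();
        try {
            while (true) {
                int len = in.readInt();          // EOFException on a truncated prefix
                byte[] buf = new byte[len];
                in.readFully(buf);               // EOFException on a truncated payload
                edits.add(new String(buf, "UTF-8"));
            }
        } catch (EOFException expectedAtTornTail) {
            // A crash-truncated tail is normal: stop and keep what was recovered.
        }
        return edits;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(bos);
        dos.writeInt(3);
        dos.write("abc".getBytes("UTF-8"));
        dos.writeShort(7);                       // a torn, half-written length prefix
        List<String> edits = replay(new DataInputStream(
                new ByteArrayInputStream(bos.toByteArray())));
        System.out.println(edits);               // [abc]
    }
}
```

The hard part in the reported failure is exactly the ambiguity above: the "Failed
read of int length N" IOException is not an EOFException, so replay cannot tell a
harmless torn tail from mid-file corruption without the extra context the patch adds.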

On Mon, May 6, 2013 at 4:26 AM, ramkrishna vasudevan <
ramkrishna.s.vasudevan@gmail.com> wrote:

> Hi All
>
> I am getting the following error when i run Trunk with hadop-2.0.3.
> java.io.IOException: Failed read of int length 2
> at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:3002)
> at
>
> org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
> at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:41)
> at
>
> org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
> at
>
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:137)
> at
>
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
> at
>
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
> at
>
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getNextLogLine(HLogSplitter.java:775)
> at
>
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:459)
> at
>
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:388)
> at
>
> org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:115)
> at
>
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:278)
> at
>
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:199)
> at
>
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:166)
>
> Am able to reproduce this with the cluster but not with the testcases even
> when i run with 2.0.3.
>
> Regards
> Ram
>

Re: HLogSplit error with hadoop-2.0.3-alpha and hbase trunk

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Ram,

Which JDK are you testing with?

JM
On 6 May 2013 at 04:26, "ramkrishna vasudevan" <
ramkrishna.s.vasudevan@gmail.com> wrote:

> Hi All
>
> I am getting the following error when i run Trunk with hadop-2.0.3.
> java.io.IOException: Failed read of int length 2
> at org.apache.hadoop.hbase.KeyValue.iscreate(KeyValue.java:3002)
> at
>
> org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell(KeyValueCodec.java:66)
> at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:41)
> at
>
> org.apache.hadoop.hbase.regionserver.wal.WALEdit.readFromCells(WALEdit.java:199)
> at
>
> org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.readNext(ProtobufLogReader.java:137)
> at
>
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:88)
> at
>
> org.apache.hadoop.hbase.regionserver.wal.ReaderBase.next(ReaderBase.java:75)
> at
>
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getNextLogLine(HLogSplitter.java:775)
> at
>
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:459)
> at
>
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:388)
> at
>
> org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:115)
> at
>
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.grabTask(SplitLogWorker.java:278)
> at
>
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.taskLoop(SplitLogWorker.java:199)
> at
>
> org.apache.hadoop.hbase.regionserver.SplitLogWorker.run(SplitLogWorker.java:166)
>
> Am able to reproduce this with the cluster but not with the testcases even
> when i run with 2.0.3.
>
> Regards
> Ram
>