Posted to user@hbase.apache.org by Stan Barton <ba...@gmail.com> on 2011/08/04 15:54:07 UTC

Possible bug in reading KeyValues from sequence files in HBase 0.90

Hello,

I am encountering a strange error using HBase 0.90.3. The scenario:

I am writing KeyValues in Sequence files as an intermediate input for
further bulk loading using MapReduce. The problem I am facing is that when I
try to read the KeyValues from the sequence file in the Mapper (in order to
emit them for sorting) they are garbled and thus the whole job fails because
of the Exceptions thrown by trying to process the garbled keyvalues. For
instance, when trying to output the KeyValue object I get
java.lang.ArrayIndexOutOfBoundsException or java.lang.RuntimeException:
Unknown code XX. 

I have spent a lot of time tracking down the bug and found that when I
write the SequenceFile of KeyValues with HBase 0.90.3, I cannot read the
content back using the same HBase version jar; however, I am able to read
it without any problems with HBase 0.20.* versions. It is easily
reproducible with this unit test.

As for Hadoop, I am using the Cloudera distribution CDH3 Beta 2.

Thanks for help.

Stan Barton
-- 
View this message in context: http://old.nabble.com/Possible-bug-in-reading-KeyValues-from-sequence-files-in-HBase-0.90-tp32194680p32194680.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: Possible bug in reading KeyValues from sequence files in HBase 0.90

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Hi Stan,

HBase itself stores KVs in SequenceFiles when writing to the
write-ahead log and it's able to read them, so I know for sure that
that mechanism works. Would you mind writing a small unit test to show
us how you trigger your issue?

Thanks,

J-D


Re: Possible bug in reading KeyValues from sequence files in HBase 0.90

Posted by Stan Barton <ba...@gmail.com>.
I have added the file to the issue; hopefully it will make it through.

Regarding the context: I first encountered this problem in a MapReduce job
trying to bulk import the kvs into HBase 0.90.3. I started tracking down the
problem and found that the errors disappear when I use version 0.20.6.
Originally I was getting an exception saying that keys are in the wrong
order during the reduce part of the bulk import. So I dug deeper, and now I
can reproduce the problem at any time without the bulky MapReduce framework,
just by executing the Java class containing the code I previously pasted. (I
did this because I suspected that I was doing something wrong in the
preparation phase of the kvs bulk import, which turned out not to be true:
with the workaround mentioned in my earlier post the bulk import works
fine.)

Stan


stack-5 wrote:
> 
> This could be a pretty serious issue, Stan.  It happens in context only?
> Can you stick the sequence file up in an issue so one of us could take a look?
> 
> Stack
> 
> 
> 
http://old.nabble.com/file/p32409228/myTestFile.seq myTestFile.seq 
-- 
View this message in context: http://old.nabble.com/Possible-bug-in-reading-KeyValues-from-sequence-files-in-HBase-0.90-tp32194680p32409228.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: Possible bug in reading KeyValues from sequence files in HBase 0.90

Posted by Stack <sa...@gmail.com>.
This could be a pretty serious issue, Stan.  It happens in context only?   Can you stick the sequence file up in an issue so one of us could take a look?

Stack





Re: Possible bug in reading KeyValues from sequence files in HBase 0.90

Posted by Stan Barton <ba...@gmail.com>.


stack-3 wrote:
> 
> On Thu, Aug 4, 2011 at 6:54 AM, Stan Barton <ba...@gmail.com> wrote:
>> I have spent a lot of time in order to track down the bug and found out
>> that
>> when I write the SequenceFile of KeyValues with HBase 0.90.3 I cannot
>> read
>> the content back using the same HBase version jar, however I am able to
>> read
>> it without any problems with HBase 0.20.* versions. It is easily
>> reproducible with this unit test.
>>
> 
> Stan:
> 
> You are writing kvs with 0.90 and they are readable with 0.20 but not
> w/ the jar that wrote them?
> 
> Where is the unit test you refer to?  Attachments usually don't make
> it across so you might have to pastebin it.
> 
> St.Ack
> 
> 

Exactly, I create the kvs with any of the v0.90 or newer jars and am not
able to read them back. By digging deeper, I have found a workaround that
solves the problem:

KeyValue kv2 = new KeyValue(kvOrig.getBuffer());

which means that the buffer is read properly by all jars, but somehow in the
new versions it is parsed incorrectly. I have compared the length and offset
values that are read in by class KV in the particular HBase versions:

I took a simple sequence file stored in HDFS containing Longs and kvs. I
then printed the offsets and lengths (as offset-length) of row, key, value,
family and qualifier respectively (plus some other kv-related info; the
whole procedure can be found here http://pastebin.com/kxC5GrtM ):

version 0.20.6:
1-url/content:content/1264692453000/Put/vlen=2-0-39
r:10-3
k:8-29
v:37-2
f:14-7
q:21-7
39:\x00\x00\x00\x1D\x00\x00\x00\x02\x00\x03url\x07contentcontent\x00\x00\x01&u\x8B^\x88\x04\x00\x00
2-url/meta:statusCode/1264692453000/Put/vlen=3-0-40
r:10-3
k:8-29
v:37-3
f:14-4
q:18-10
40:\x00\x00\x00\x1D\x00\x00\x00\x03\x00\x03url\x04metastatusCode\x00\x00\x01&u\x8B^\x88\x04200
3-url/meta:length/1264692453000/Put/vlen=8-0-41
r:10-3
k:8-25
v:33-8
f:14-4
q:18-6
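
For reference, the offset-length pairs in this listing follow from the
serialized layout visible in the byte dumps: a 4-byte key length, a 4-byte
value length, then the key (2-byte row length, row, 1-byte family length,
family, qualifier, 8-byte timestamp, 1-byte type) and finally the value.
Below is a minimal sketch that decodes record 1 from the dump without the
HBase jars. The class name KvLayout and its helper methods are made up for
illustration; this is not the HBase KeyValue class.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative decoder for the serialized layout seen in the dump.
// KvLayout is a hypothetical name, not part of HBase.
final class KvLayout {
    final int keyOffset, keyLength;
    final int rowOffset, rowLength;
    final int familyOffset, familyLength;
    final int qualifierOffset, qualifierLength;
    final int valueOffset, valueLength;
    final long timestamp;
    final byte type;

    KvLayout(byte[] buf) {
        ByteBuffer bb = ByteBuffer.wrap(buf);           // big-endian by default
        keyLength = bb.getInt(0);                        // bytes 0..3
        valueLength = bb.getInt(4);                      // bytes 4..7
        keyOffset = 8;
        rowLength = bb.getShort(keyOffset);              // 2-byte row length
        rowOffset = keyOffset + 2;
        familyLength = bb.get(rowOffset + rowLength) & 0xff;
        familyOffset = rowOffset + rowLength + 1;
        qualifierOffset = familyOffset + familyLength;
        int keyEnd = keyOffset + keyLength;
        // The key ends with an 8-byte timestamp and a 1-byte type.
        qualifierLength = keyEnd - 9 - qualifierOffset;
        timestamp = bb.getLong(keyEnd - 9);
        type = bb.get(keyEnd - 1);
        valueOffset = keyEnd;
    }

    static String text(byte[] buf, int offset, int length) {
        return new String(buf, offset, length, StandardCharsets.US_ASCII);
    }

    // Rebuilds the 39 bytes of record 1 exactly as printed in the dump.
    static byte[] record1() {
        ByteBuffer bb = ByteBuffer.allocate(39);
        bb.putInt(29).putInt(2);
        bb.putShort((short) 3).put("url".getBytes(StandardCharsets.US_ASCII));
        bb.put((byte) 7).put("content".getBytes(StandardCharsets.US_ASCII));
        bb.put("content".getBytes(StandardCharsets.US_ASCII));
        bb.putLong(1264692453000L).put((byte) 4);        // 4 is shown as Put
        bb.put(new byte[] {0, 0});
        return bb.array();
    }

    public static void main(String[] args) {
        byte[] buf = record1();
        KvLayout kv = new KvLayout(buf);
        System.out.println("r:" + kv.rowOffset + "-" + kv.rowLength);             // r:10-3
        System.out.println("k:" + kv.keyOffset + "-" + kv.keyLength);             // k:8-29
        System.out.println("v:" + kv.valueOffset + "-" + kv.valueLength);         // v:37-2
        System.out.println("f:" + kv.familyOffset + "-" + kv.familyLength);       // f:14-7
        System.out.println("q:" + kv.qualifierOffset + "-" + kv.qualifierLength); // q:21-7
        System.out.println(text(buf, kv.rowOffset, kv.rowLength) + "/"
                + text(buf, kv.familyOffset, kv.familyLength) + ":"
                + text(buf, kv.qualifierOffset, kv.qualifierLength) + "/"
                + kv.timestamp + "/type=" + kv.type);
    }
}
```

Every offset after the row depends on the key length read from the first
four bytes, so a wrong key length shifts the qualifier, timestamp, type and
value boundaries all at once.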



version 0.90.3:

1-url/content:content/1264692453000/Put/vlen=2-0-39
r:10-3
k:8-29
v:37-2
f:14-7
q:21-7
39:\x00\x00\x00\x1D\x00\x00\x00\x02\x00\x03url\x07contentcontent\x00\x00\x01&u\x8B^\x88\x04\x00\x00
2-url/meta:statusCode/1264692453000/Put/vlen=3-0-40
r:10-3
k:8-29
v:37-3
f:14-4
q:18-10
40:\x00\x00\x00\x1D\x00\x00\x00\x03\x00\x03url\x04metastatusCode\x00\x00\x01&u\x8B^\x88\x04200
3-url/meta:length\x00\x00\x01&/8469967462476021760/Minimum/vlen=8-0-41
r:10-3
k:8-29
v:37-8
f:14-4
q:18-10


You can see the discrepancy in the third kv read in, namely in the length of
the key as parsed by v0.20.6 (25) and by v0.90 (29). This garbles the
read-in stream. However, I have not found why this is happening.
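
One reading that fits these numbers, offered only as an assumption and not
confirmed anywhere in this thread: the reader reuses a single object across
records, and a key length cached from record 2 (29) is still in effect when
the bytes of record 3 are parsed. The sketch below models that with a
hypothetical class, ReusedRecord, which is not HBase code. The 8 value bytes
of record 3 are not shown in the dump, so zeros are used as placeholders.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a reusable record object. This is NOT the HBase
// KeyValue class; the cached field models the assumption described above.
final class ReusedRecord {
    private byte[] bytes = new byte[0];
    private int cachedKeyLength = 0; // 0 means "not computed yet"

    // Swap in a new buffer, as a reader that reuses one object would.
    void load(byte[] buf, boolean resetCache) {
        bytes = buf;
        if (resetCache) {
            cachedKeyLength = 0;
        }
    }

    int keyLength() {
        if (cachedKeyLength == 0) {
            cachedKeyLength = ByteBuffer.wrap(bytes).getInt(0);
        }
        return cachedKeyLength;
    }

    int valueOffset() {
        return 8 + keyLength();
    }

    long timestamp() {
        // The timestamp occupies the 8 bytes before the final type byte.
        return ByteBuffer.wrap(bytes).getLong(8 + keyLength() - 9);
    }

    int qualifierLength() {
        int rowLength = ByteBuffer.wrap(bytes).getShort(8);
        int familyLength = bytes[10 + rowLength] & 0xff;
        int qualifierOffset = 10 + rowLength + 1 + familyLength;
        return 8 + keyLength() - 9 - qualifierOffset;
    }

    // Serializes a Put (type code 4, as in the dump) in the layout shown.
    static byte[] serialize(String row, String family, String qualifier,
                            long ts, byte[] value) {
        byte[] r = row.getBytes(StandardCharsets.US_ASCII);
        byte[] f = family.getBytes(StandardCharsets.US_ASCII);
        byte[] q = qualifier.getBytes(StandardCharsets.US_ASCII);
        int keyLength = 2 + r.length + 1 + f.length + q.length + 8 + 1;
        ByteBuffer bb = ByteBuffer.allocate(8 + keyLength + value.length);
        bb.putInt(keyLength).putInt(value.length);
        bb.putShort((short) r.length).put(r);
        bb.put((byte) f.length).put(f).put(q);
        bb.putLong(ts).put((byte) 4);
        bb.put(value);
        return bb.array();
    }

    static String describe(ReusedRecord rec) {
        return "k:8-" + rec.keyLength() + " v:" + rec.valueOffset()
                + " qlen:" + rec.qualifierLength() + " ts:" + rec.timestamp();
    }

    public static void main(String[] args) {
        byte[] second = serialize("url", "meta", "statusCode", 1264692453000L,
                "200".getBytes(StandardCharsets.US_ASCII));
        // Placeholder value: the real 8 value bytes are not in the dump.
        byte[] third = serialize("url", "meta", "length", 1264692453000L,
                new byte[8]);

        ReusedRecord reused = new ReusedRecord();
        reused.load(second, false);
        reused.keyLength();              // caches 29 from record 2
        reused.load(third, false);       // cache not reset
        System.out.println("reused: " + describe(reused));

        ReusedRecord fresh = new ReusedRecord();
        fresh.load(third, true);
        System.out.println("fresh:  " + describe(fresh));
    }
}
```

Under that assumption the reused object reports k:8-29, a value offset of 37
and a 10-byte qualifier for record 3, matching the 0.90.3 listing, while a
fresh object reports k:8-25 and a value offset of 33, matching 0.20.6. It
would also explain why building a new object from the same buffer, as in the
workaround above, parses correctly.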

Stan
-- 
View this message in context: http://old.nabble.com/Possible-bug-in-reading-KeyValues-from-sequence-files-in-HBase-0.90-tp32194680p32399356.html
Sent from the HBase User mailing list archive at Nabble.com.


Re: Possible bug in reading KeyValues from sequence files in HBase 0.90

Posted by Stack <st...@duboce.net>.
On Thu, Aug 4, 2011 at 6:54 AM, Stan Barton <ba...@gmail.com> wrote:
> I have spent a lot of time in order to track down the bug and found out that
> when I write the SequenceFile of KeyValues with HBase 0.90.3 I cannot read
> the content back using the same HBase version jar, however I am able to read
> it without any problems with HBase 0.20.* versions. It is easily
> reproducible with this unit test.
>

Stan:

You are writing kvs with 0.90 and they are readable with 0.20 but not
w/ the jar that wrote them?

Where is the unit test you refer to?  Attachments usually don't make
it across so you might have to pastebin it.

St.Ack