Posted to user@hbase.apache.org by Joost Ouwerkerk <jo...@openplaces.org> on 2010/01/22 16:56:44 UTC

Get and Scan return different results in 0.20.2

We're seeing some dangerously inconsistent behaviour in retrieving data from
HBase.  In particular circumstances whose conditions are still unclear, get
and scan (without timestamp params) are returning different versions of a
column.  We are running 0.20.2.  See below for evidence.

hbase(main):006:0> scan 'generated_pages',{STARTROW=>'240:http://com.golflink.www/golf-courses/course.aspx?course=1008656',LIMIT=>2,COLUMNS=>['attribute:url']}
ROW                          COLUMN+CELL
 240:http://com.golflink.www/golf-courses/course.aspx?course=1008656 column=attribute:url, timestamp=*5429280163307928320*, value=\001http://www.golflink.com/golf-courses/course.aspx?course=1008656
2 row(s) in 0.0100 seconds

hbase(main):007:0> get 'generated_pages', '240:http://com.golflink.www/golf-courses/course.aspx?course=1008656', COLUMN=>'attribute:url'
timestamp=*5429243797819101088*, value=\001http://www.golflink.com/golf-courses/course.aspx?course=1008656
1 row(s) in 0.0020 seconds

Any ideas about how this is possible?

joost.

Re: Get and Scan return different results in 0.20.2

Posted by Jean-Daniel Cryans <jd...@apache.org>.
After an offline discussion and some more discussion on IRC, we found
that the problem is similar to
http://issues.apache.org/jira/browse/HBASE-29 and was caused by clock
skew. The fact that they set their own timestamps exacerbates the
problem because the different clients had wildly different dates; if
the region server were setting the timestamps, they would be more
consistent.

The resolution for the user is to fix the clock skew; on the HBase
side, we need to make get behave more like scan.

J-D
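
To make the diagnosis above concrete, here is a minimal toy model in Python. It is not HBase code. It assumes (the thread does not confirm this) that a get consults stores newest-written first and stops at the first store holding the column, while a scan merges every store and orders cells by timestamp. It also assumes that the cell with the larger timestamp was written first by a client whose clock ran ahead. The names Cell, toy_get, and toy_scan are hypothetical; only the two timestamps come from the shell output in the original post.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Cell:
    timestamp: int  # client-supplied, so it reflects the writer's clock
    value: str


# Each inner list is one store; stores are listed newest-written first.
# This layout is an assumption made for illustration only.
Stores = List[List[Cell]]


def toy_get(stores: Stores) -> Optional[Cell]:
    """Return the highest-timestamp cell of the first non-empty store, then stop."""
    for store in stores:
        if store:
            return max(store, key=lambda c: c.timestamp)
    return None


def toy_scan(stores: Stores) -> Optional[Cell]:
    """Merge every store and return the cell with the highest timestamp."""
    merged = [cell for store in stores for cell in store]
    return max(merged, key=lambda c: c.timestamp) if merged else None


# Timestamps are from the thread; the write order and values are assumptions.
written_first = Cell(5429280163307928320, "from a client whose clock ran ahead")
written_later = Cell(5429243797819101088, "from a client whose clock ran behind")

stores = [[written_later], [written_first]]  # newest-written store first

print("get :", toy_get(stores).timestamp)   # 5429243797819101088
print("scan:", toy_scan(stores).timestamp)  # 5429280163307928320
```

Under these assumptions the toy get returns the later write with the smaller timestamp and the toy scan returns the earlier write with the larger one, which is the same disagreement reported in the shell session. If every write took its timestamp from a single clock, write order and timestamp order would agree and the two lookups would return the same cell.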


Re: Get and Scan return different results in 0.20.2

Posted by Joost Ouwerkerk <jo...@openplaces.org>.
We do set an explicit timestamp, and I understand that we may be among the
few in this regard.  We haven't performed any deletes on those rows.  I will
try flushing and let you know...


Re: Get and Scan return different results in 0.20.2

Posted by Stack <st...@duboce.net>.
How were cells inserted?  With explicit timestamp?  Any deletes
floating around?  If you flush the region, does the behavior change?
(See 'tools' in the shell.... do hbase> flush 'regionname'... you'll
have to figure out the region that is hosting the row you are looking
at).  Can you bundle up the region that these cells are in and pass it
to us somehow?
St.Ack
