Posted to dev@hbase.apache.org by Jonathan Hsieh <jo...@cloudera.com> on 2011/10/10 21:52:41 UTC

HBase scanner semantics and inconsistencies.

I've been working on a problem related to the ACID rules for scans, as defined here:
http://hbase.apache.org/acid-semantics.html

In both scenarios no new rowkeys are written -- all writes are
"overwrites" of already-existing rows.  So let's say the table has 3 rows
with values A1, B1, C1, and the writers overwrite them with values A2, B2,
C2 respectively.  I'd expect every scan to return the same number of rows
(3), but with potentially any combination of A1|A2, B1|B2, C1|C2 (since
there are no ACID guarantees across rows).
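To make that expectation concrete, here is a minimal sketch in plain Java -- my own model, not HBase code; the table and values are hypothetical -- of the invariant I'd expect every scan to satisfy: exactly the 3 rowkeys, each carrying either its old or its new value, never a partial or missing row.

```java
import java.util.*;

public class RowAtomicityExpectation {
    // True iff a scan result obeys the expected row-level guarantee:
    // exactly the rowkeys A, B, C, each carrying either its old ("1")
    // or its new ("2") value -- never a partial or missing row.
    static boolean validScan(Map<String, String> scan) {
        if (!scan.keySet().equals(new TreeSet<>(Arrays.asList("A", "B", "C"))))
            return false;
        for (Map.Entry<String, String> e : scan.entrySet()) {
            String v = e.getValue();
            if (!v.equals(e.getKey() + "1") && !v.equals(e.getKey() + "2"))
                return false;
        }
        return true;
    }

    static Map<String, String> scanOf(String a, String b, String c) {
        Map<String, String> m = new TreeMap<>();
        m.put("A", a); m.put("B", b); m.put("C", c);
        return m;
    }

    public static void main(String[] args) {
        // Any mix of old/new values across rows is fine...
        System.out.println(validScan(scanOf("A2", "B1", "C2"))); // true
        // ...but a value from neither version of a row is not.
        System.out.println(validScan(scanOf("A2", "Bx", "C2"))); // false
    }
}
```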

Scenario 1: I have an MR job that does a filtered scan (I've confirmed the
problem happens without the filter as well) over a table, takes each row,
and writes it back to the same row in the same table.  I then run the job
2-3x concurrently on the same table with the same filter.  I believe every
run should report the same number of elements read.  In the case of a
multiple-column-family table, this is not so -- in particular, the MR
counters sometimes report *fewer* than the expected number of input records
(but never more).  This seems wrong.  Agree?

Scenario 2: To duplicate the problem without the MR portions, I wrote two
programs -- one that does a filtered scan and overwrites the existing rows,
and one that does the same filtered scan but just counts the number of rows
read.  I've also used the TableRecordReaderImpl code that the MR
TableInputFormat uses.  In this case, the scan/counting job sometimes
returns *too many* entries -- for some rowkeys it returns two records (but
never fewer).  This probably should not happen either.  Agree?
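For intuition, here is a self-contained sketch in plain Java -- my own model of the symptom, not HBase code -- of what such an overcount implies: if the scanner hands one physical row back as two partial results, a counter that increments once per next() sees 4 "rows" in a 3-row table, while the distinct-rowkey count stays 3.

```java
import java.util.*;

public class OvercountSketch {
    // Model a scan result as (rowkey, column families returned in it).
    static List<Map.Entry<String, Set<String>>> brokenScan() {
        List<Map.Entry<String, Set<String>>> results = new ArrayList<>();
        results.add(result("A", "f0", "f1"));
        // The bug: row B comes back split across two next() calls.
        results.add(result("B", "f0"));
        results.add(result("B", "f1"));
        results.add(result("C", "f0", "f1"));
        return results;
    }

    static Map.Entry<String, Set<String>> result(String row, String... fams) {
        return new AbstractMap.SimpleEntry<>(row, new TreeSet<>(Arrays.asList(fams)));
    }

    static Set<String> distinctRows(List<Map.Entry<String, Set<String>>> rs) {
        Set<String> keys = new TreeSet<>();
        for (Map.Entry<String, Set<String>> r : rs) keys.add(r.getKey());
        return keys;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Set<String>>> rs = brokenScan();
        // A per-next() counter overcounts: 4 results but only 3 rows.
        System.out.println("results: " + rs.size()
                + ", distinct rowkeys: " + distinctRows(rs).size());
    }
}
```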

I've observed this on 0.90.1+patches and 0.90.3+patches.  It was also
claimed that this issue was not seen in a 0.89.x-based HBase.  Thus far I've
been able to reproduce it on multiple-column-family tables.

I believe scenario 1's problem is related to these.  Concur?
https://issues.apache.org/jira/browse/HBASE-2856
https://issues.apache.org/jira/browse/HBASE-3498

I think scenario 2 is probably related, but I'm not sure it is the same issue.

Are there other related JIRAs?

Any hints on where to hunt this down?  (I'm starting to dig into the scan
and write paths on the RegionServer, but this is a fairly large area...)

Thanks
Jon.

-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com

Re: HBase scanner semantics and inconsistencies.

Posted by Jonathan Hsieh <jo...@cloudera.com>.
Just to wrap this up -- HBASE-4570 was filed and more details are there.
There is also a small patch that addresses the symptom described in this
scenario.  It ended up being a race with rowname caching in the KV store.

This isn't related to flushes, which have become the focus of HBASE-4485 and
HBASE-2856's subtasks.  It should, however, be part of making HBASE-2856
(TestAcidGuarantees) pass consistently.

I've tested it on trunk and a version of 0.90.x.

Jon.

On Mon, Oct 10, 2011 at 5:03 PM, Stack <st...@duboce.net> wrote:

> Keep digging Jon.
> St.Ack
>



-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com

Re: HBase scanner semantics and inconsistencies.

Posted by Stack <st...@duboce.net>.
Keep digging Jon.
St.Ack


Re: HBase scanner semantics and inconsistencies.

Posted by Jonathan Hsieh <jo...@cloudera.com>.
Yup, I'm pretty convinced this is a consistency and isolation problem.
Here's the output of my program (expected: 59 rows), along with some output
from a problem row.

Each row has 10 column families ("f0", "f1", ... "f9") with two columns
each.  There are actually two 'row0000021332' results -- items 48 and 49 of
the scan.  Each came from a separate call to next().  Notice how they are
split between column families!  Over many runs this happens many times,
with the "split" falling between different column families.

scan items counted: 59 with 0 null ColumnFamilyMaps and 0 nullRows
scan items counted: 59 with 0 null ColumnFamilyMaps and 0 nullRows
scan items counted: 59 with 0 null ColumnFamilyMaps and 0 nullRows
scan items counted: 60 with 0 null ColumnFamilyMaps and 0 nullRows
Row row0000021332 had time stamps: [48:
keyvalues={row0000021332/f0:data/1318200437341/Put/vlen=1000,
row0000021332/f0:qual/1318200437341/Put/vlen=10,
row0000021332/f1:data/1318200437341/Put/vlen=1000,
row0000021332/f1:qual/1318200437341/Put/vlen=10,
row0000021332/f2:data/1318200437341/Put/vlen=1000,
row0000021332/f2:qual/1318200437341/Put/vlen=10,
row0000021332/f3:data/1318200437341/Put/vlen=1000,
row0000021332/f3:qual/1318200437341/Put/vlen=10,
row0000021332/f4:data/1318200437341/Put/vlen=1000,
row0000021332/f4:qual/1318200437341/Put/vlen=10,
row0000021332/f5:data/1318200437341/Put/vlen=1000,
row0000021332/f5:qual/1318200437341/Put/vlen=10}, 49:
keyvalues={row0000021332/f6:data/1318200437341/Put/vlen=1000,
row0000021332/f6:qual/1318200437341/Put/vlen=10,
row0000021332/f7:data/1318200437341/Put/vlen=1000,
row0000021332/f7:qual/1318200437341/Put/vlen=10,
row0000021332/f8:data/1318200437341/Put/vlen=1000,
row0000021332/f8:qual/1318200437341/Put/vlen=10,
row0000021332/f9:data/1318200437341/Put/vlen=1000,
row0000021332/f9:qual/1318200437341/Put/vlen=10}]
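A quick sanity check on that listing: the two results are exact complements -- together they cover f0..f9 with no overlap, i.e. two halves of one physical row. A sketch in plain Java (the family sets below are copied by hand from the output above):

```java
import java.util.*;

public class SplitRowCheck {
    // Families seen in the two results for row0000021332, per the
    // listing above (items 48 and 49 of the scan).
    static final Set<String> R48 =
            new TreeSet<>(Arrays.asList("f0", "f1", "f2", "f3", "f4", "f5"));
    static final Set<String> R49 =
            new TreeSet<>(Arrays.asList("f6", "f7", "f8", "f9"));

    static Set<String> union() {
        Set<String> u = new TreeSet<>(R48);
        u.addAll(R49);
        return u;
    }

    static Set<String> intersection() {
        Set<String> i = new TreeSet<>(R48);
        i.retainAll(R49);
        return i;
    }

    public static void main(String[] args) {
        // The two results partition the row's 10 families exactly.
        System.out.println(union().size() + " families, overlap "
                + intersection().size()); // -> 10 families, overlap 0
    }
}
```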

Jon.



-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com

Re: HBase scanner semantics and inconsistencies.

Posted by Jonathan Hsieh <jo...@cloudera.com>.
I've modified scenario 2's filter to use setFilterIfMissing(true), which I
had missed in scenario 1, and now get an undercount like in that scenario.
In another experiment, I found that getting the latest version with
Result.getColumnLatest(..) would return null.

This seems like a consistency+isolation violation of the expected row-level
ACID properties.

----
Scan s = new Scan();
...
SingleColumnValueFilter filter = new SingleColumnValueFilter(
    Bytes.toBytes("f1"), Bytes.toBytes("qual"), CompareOp.EQUAL, value);
filter.setFilterIfMissing(true); // this is what drops events
s.setFilter(filter);
----
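For intuition on why this flag interacts badly with the partial-row symptom, here is a self-contained sketch in plain Java -- my own simplified model of SingleColumnValueFilter's filterIfMissing behavior, not HBase code.  If the scanner hands the filter a partial result that happens to lack f1, filterIfMissing=true drops that result outright, so cells (or whole rows) silently vanish from the count; filterIfMissing=false lets both halves through, which matches the overcount instead.

```java
import java.util.*;

public class FilterIfMissingSketch {
    // Simplified model of SingleColumnValueFilter on column family "f1":
    // if the result contains no f1 cell, filterIfMissing decides whether
    // it is kept; otherwise assume the f1 value matches the comparison.
    static boolean passes(Set<String> familiesInResult, boolean filterIfMissing) {
        if (!familiesInResult.contains("f1")) {
            return !filterIfMissing; // column missing: keep only if !filterIfMissing
        }
        return true;
    }

    public static void main(String[] args) {
        // One physical row delivered as two partial results (the bug):
        Set<String> halfWithF1    = new TreeSet<>(Arrays.asList("f0", "f1", "f2"));
        Set<String> halfWithoutF1 = new TreeSet<>(Arrays.asList("f3", "f4", "f5"));

        // filterIfMissing=false: both halves pass, so one row can be
        // counted twice (the overcount seen in scenario 2).
        System.out.println(passes(halfWithF1, false)
                && passes(halfWithoutF1, false)); // true
        // filterIfMissing=true: the f1-less half is dropped outright,
        // so part of the row silently disappears (the undercount
        // seen in scenario 1).
        System.out.println(passes(halfWithoutF1, true)); // false
    }
}
```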

Jon.



-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com
