Posted to user@hbase.apache.org by Ishan Chhabra <ic...@rocketfuel.com> on 2014/01/15 00:26:11 UTC

Interaction of SequenceID and timestamps

I am trying to understand the interaction of sequenceId and timestamps for
KVs, and what the real issue was behind
https://issues.apache.org/jira/browse/HBASE-6590, which says that bulkload
can be used only to update historical data and not current data.

Taking an example:

Let's say I already have a KV (r, c, val1, 10) in HBase, where 10 is the
timestamp.
Now, if I bulkload a KV (r, c, val2, 20) without the patch, will it be
sorted behind the previous KV because the file created has a sequenceID of 0,
or will it correctly be returned as the new value during a scan for (r, c)?

I conducted some experiments myself and concluded that the timestamp takes
priority over the sequenceId, and that the sequenceId is used to break a tie
only when the timestamps are the same, but I need to make sure that my
understanding is correct.
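A minimal Java sketch of the ordering described above, under stated assumptions: cells sort by row and column, then by timestamp with the newest first, and the sequence ID of the file a cell came from is consulted only when timestamps are equal. The `Cell` record, the `latestValue` helper, and the sequence ID 5 for the existing data are hypothetical illustrations, not HBase classes or real values.

```java
import java.util.Comparator;
import java.util.List;

public class KvOrderDemo {
    // Hypothetical simplified cell. seqId stands for the sequence ID of the
    // file the cell was read from; a real HBase KeyValue is not modeled here.
    record Cell(String row, String column, long timestamp, long seqId, String value) {}

    // Assumed ordering: row and column ascending, then timestamp descending
    // (newest first), and only then sequence ID descending as a tie-breaker.
    static final Comparator<Cell> ORDER =
        Comparator.comparing(Cell::row)
            .thenComparing(Cell::column)
            .thenComparing(Comparator.comparingLong((Cell c) -> c.timestamp()).reversed())
            .thenComparing(Comparator.comparingLong((Cell c) -> c.seqId()).reversed());

    // Returns the value that sorts first for (row, column), i.e. what a read sees.
    static String latestValue(List<Cell> cells, String row, String column) {
        return cells.stream()
            .filter(c -> c.row().equals(row) && c.column().equals(column))
            .sorted(ORDER)
            .map(Cell::value)
            .findFirst()
            .orElse(null);
    }

    public static void main(String[] args) {
        // The example from the question: val1 already stored (sequence ID 5 is
        // an arbitrary illustrative value), val2 bulkloaded with sequence ID 0.
        System.out.println(latestValue(List.of(
            new Cell("r", "c", 10, 5, "val1"),
            new Cell("r", "c", 20, 0, "val2")), "r", "c")); // prints val2

        // Same timestamp: the sequence ID now breaks the tie and val1 wins.
        System.out.println(latestValue(List.of(
            new Cell("r", "c", 10, 5, "val1"),
            new Cell("r", "c", 10, 0, "val2")), "r", "c")); // prints val1
    }
}
```

Under these assumptions, the bulkloaded val2 at timestamp 20 is returned even though its file has sequence ID 0; the sequence ID only matters in the second case.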

Thanks!

-- 
*Ishan Chhabra* | Rocket Scientist | RocketFuel Inc.

Re: Interaction of SequenceID and timestamps

Posted by Ishan Chhabra <ic...@rocketfuel.com>.
Thanks for pointing out the code. My understanding is correct.

Thanks!


On Tue, Jan 14, 2014 at 3:40 PM, Ted Yu <yu...@gmail.com> wrote:

> Please take a look at the following method in
> KeyValueHeap#KVScannerComparator:
>
>     public int compare(KeyValueScanner left, KeyValueScanner right) {
>
> Cheers
>



Re: Interaction of SequenceID and timestamps

Posted by Ted Yu <yu...@gmail.com>.
Please take a look at the following method in KeyValueHeap#KVScannerComparator:

    public int compare(KeyValueScanner left, KeyValueScanner right) {

Cheers
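For reference, here is a simplified Java sketch of the idea behind that method, not its actual source: scanners sit in a heap ordered by their current cell, and their sequence IDs are compared only when two cells have identical keys. `Kv`, `FileScanner`, `merge`, and the sequence IDs used are hypothetical; the `peek()`/`getSequenceID()` names are only chosen to loosely echo HBase's `KeyValueScanner`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;
import java.util.PriorityQueue;

public class ScannerHeapDemo {
    // Hypothetical simplified cell: row, column, timestamp and value.
    record Kv(String row, String column, long timestamp, String value) {}

    // Key order ignores the value: row, column, then newest timestamp first.
    static final Comparator<Kv> KEY_ORDER =
        Comparator.comparing(Kv::row)
            .thenComparing(Kv::column)
            .thenComparing(Comparator.comparingLong((Kv kv) -> kv.timestamp()).reversed());

    // Hypothetical stand-in for one store file's scanner: its cells in key
    // order plus the sequence ID assigned to the whole file.
    static final class FileScanner {
        final Deque<Kv> cells;
        final long sequenceId;

        FileScanner(List<Kv> sortedCells, long sequenceId) {
            this.cells = new ArrayDeque<>(sortedCells);
            this.sequenceId = sequenceId;
        }

        Kv peek() { return cells.peekFirst(); }

        long getSequenceID() { return sequenceId; }
    }

    // Compare the scanners' current cells first; only when the keys are
    // identical, prefer the scanner with the higher sequence ID.
    static int compare(FileScanner left, FileScanner right) {
        int comparison = KEY_ORDER.compare(left.peek(), right.peek());
        if (comparison != 0) {
            return comparison;
        }
        return Long.compare(right.getSequenceID(), left.getSequenceID());
    }

    // Merges all scanners through a heap and returns values in read order.
    static List<String> merge(List<FileScanner> scanners) {
        PriorityQueue<FileScanner> heap = new PriorityQueue<>(ScannerHeapDemo::compare);
        for (FileScanner s : scanners) {
            if (s.peek() != null) {
                heap.add(s);
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            FileScanner top = heap.poll();
            out.add(top.cells.pollFirst().value());
            if (top.peek() != null) {
                heap.add(top); // re-insert so it is ordered by its next cell
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Same timestamp: the file with the higher sequence ID (5) wins.
        System.out.println(merge(List.of(
            new FileScanner(List.of(new Kv("r", "c", 10, "val1")), 5),
            new FileScanner(List.of(new Kv("r", "c", 10, "val2")), 0)))); // prints [val1, val2]
        // Newer timestamp: the keys differ, so the sequence ID is never consulted.
        System.out.println(merge(List.of(
            new FileScanner(List.of(new Kv("r", "c", 10, "val1")), 5),
            new FileScanner(List.of(new Kv("r", "c", 20, "val2")), 0)))); // prints [val2, val1]
    }
}
```

In this model, a bulkloaded file with sequence ID 0 can only lose to existing data when both cells carry exactly the same timestamp.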
