Posted to user@hbase.apache.org by Xinan Wu <wu...@gmail.com> on 2009/06/16 21:48:57 UTC

random timestamp insert

I am aware that inserting data into HBase in random timestamp order
results in indeterminate results.

e.g., the comments here:
https://issues.apache.org/jira/browse/HBASE-1249#action_12682369

I've personally experienced indeterminate results before when I insert
in random timestamp order (i.e., multiple versions with the same
timestamp in the same cell, and out-of-order timestamps when getting
multiple versions).

In other words, we shouldn't go back in time when inserting cells.
Deletion is OK. But is updating pretty much the same story as
inserting?

I.e., if I make sure the timestamp already exists in the cell, and then I
_update_ it with that timestamp (and the same value length), sometimes
HBase still just inserts a new version without touching the old one, and
of course the timestamps of this cell become out of order. Even if I
delete all versions in that cell and reinsert them in time order, the
result is still out of order. I assume that if I do a major compaction
between the delete-all and the reinsert it would be OK, but that's not a
good solution. Is there any good way to update a past version of a cell,
or will that simply not work?
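
To make it concrete, here is roughly the pattern, sketched against a much
later Java client API than this thread used (the table "t" and column
"f:q" are invented):

import java.util.List;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class OutOfOrderInsert {
  public static void main(String[] args) throws Exception {
    byte[] row = Bytes.toBytes("row1");
    byte[] fam = Bytes.toBytes("f");   // invented column family
    byte[] qual = Bytes.toBytes("q");  // invented qualifier

    try (Connection conn =
             ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("t"))) {
      // Insert at ts=100 and ts=300, then "go back in time" to ts=200.
      table.put(new Put(row).addColumn(fam, qual, 100L, Bytes.toBytes("v100")));
      table.put(new Put(row).addColumn(fam, qual, 300L, Bytes.toBytes("v300")));
      table.put(new Put(row).addColumn(fam, qual, 200L, Bytes.toBytes("v200")));

      // Read back up to three versions; this is where the out-of-order
      // results show up.
      Result r = table.get(new Get(row).readVersions(3));
      List<Cell> versions = r.getColumnCells(fam, qual);
      for (Cell c : versions) {
        System.out.println(c.getTimestamp() + " -> "
            + Bytes.toString(CellUtil.cloneValue(c)));
      }
    }
  }
}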

Thanks,

Re: random timestamp insert

Posted by Alexandre Jaquet <al...@gmail.com>.
Sorry, I meant: what happens if we have a conflict, not if we are in
non-optimistic mode.


Re: random timestamp insert

Posted by Ryan Rawson <ry...@gmail.com>.
The opposite of optimistic locking is 'pessimistic locking', which means
'explicit locks'.  When you expect the number of concurrent writes to the
same row to be low, optimistic locking gives vastly superior performance.

Generally you can use optimistic locking in nearly all cases.  Even huge
ordering systems use it.  I'm sure you can make it fit your application
needs.

Also remember that HBase is not really a transactional DB; you get row locks
and atomic updates on rows, but that is about it.
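
For example, the optimistic read-check-write loop looks roughly like this
(a sketch against a much later client API than this thread's; the table
and column names are invented):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class OptimisticUpdate {
  public static void main(String[] args) throws Exception {
    byte[] row = Bytes.toBytes("account-42");  // invented row
    byte[] fam = Bytes.toBytes("f");
    byte[] qual = Bytes.toBytes("balance");    // assumed to hold a serialized long

    try (Connection conn =
             ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("t"))) {
      boolean applied = false;
      while (!applied) {
        // Read the current value; this is the version our edit is based on.
        byte[] current =
            table.get(new Get(row).addColumn(fam, qual)).getValue(fam, qual);
        long updated = Bytes.toLong(current) - 100L;

        // The write succeeds only if the cell still holds what we read;
        // if another writer got in first, we loop and retry.
        applied = table.checkAndMutate(row, fam)
                       .qualifier(qual)
                       .ifEquals(current)
                       .thenPut(new Put(row)
                           .addColumn(fam, qual, Bytes.toBytes(updated)));
      }
    }
  }
}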

-ryan


Re: random timestamp insert

Posted by Alexandre Jaquet <al...@gmail.com>.
checkAndSave looks nice, but:

"optimistic concurrency control is based on the assumption that most database
transactions <http://en.wikipedia.org/wiki/Database_transaction> don't
conflict with other transactions"

That holds in most cases, but what happens if we are not in optimistic mode?



Re: random timestamp insert

Posted by Ryan Rawson <ry...@gmail.com>.
The IPC threading can become an issue on a really busy server.  There are by
default 10 IPC listener threads; once you have 10 concurrent operations, you
must wait for one to end before the next can run.  You can raise this if it
ends up becoming a problem (see the config sketch below).  The pool has to be
bounded, or resource consumption will eventually crash the server.
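
If I recall the property name correctly (verify it against your version),
the handler pool is sized in hbase-site.xml:

<property>
  <name>hbase.regionserver.handler.count</name>
  <!-- size of the RPC/IPC handler pool; early releases defaulted to 10 -->
  <value>30</value>
</property>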

The only area where this becomes a problem is explicit row locking: if you
take out a lock in one client, and then a different client comes to get the
same lock, the second client has to wait, and while waiting it consumes an
IPC thread.

But you shouldn't need to use explicit row locking.
- Mutations (puts, deletes) take out a row lock and then release it.
- There is a checkAndSave() which gives you a form of optimistic concurrency.
- You can use the multi-version mechanism to test for optimistic lock failure.
- atomicIncrement allows you to maintain sequences/counters without the use
of locks (sketch below).
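
For instance, the counter mechanism (a sketch against a much later client
API; the row and column names are invented, and an open Table named table
is assumed):

// Lock-free counter: the delta is applied atomically on the server.
long hits = table.incrementColumnValue(
    Bytes.toBytes("page-123"),  // row
    Bytes.toBytes("f"),         // column family
    Bytes.toBytes("hits"),      // qualifier
    1L);                        // amount to add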

I would recommend against designing a schema/application that uses row locks.
Use one of the other excellent mechanisms provided.  If your needs are
really above and beyond those, let's talk in detail.  A column-oriented store
has all sorts of powerful things available to it that an RDBMS doesn't have.


Re: random timestamp insert

Posted by Alexandre Jaquet <al...@gmail.com>.
Thanks Ryan for your explanation,

But as I understand it, IPC calls can generate deadlock through
over-consumption of service threads? What is the exact role of a region
server?

Thanks again.


Re: random timestamp insert

Posted by Ryan Rawson <ry...@gmail.com>.
Hey,

So the issue there was that when you are using the built-in row-lock support,
the waiters for a row lock each use up an IPC responder thread. There are
only so many of them. Then your clients start failing, as regionservers are
busy waiting for locks to be released.

The suggestion there was to use ZooKeeper-based locks.  That suggestion is
still valid.

I don't get your question about whether timestamp is better than "Long
versioning".  A timestamp is a long; its default value is
System.currentTimeMillis(), i.e., the milliseconds since the 1970 epoch,
a slight variation on time_t.
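
In code terms, the two forms look roughly like this (a sketch against a
much later client API; row, fam, qual, and value are assumed to be in
scope):

// Default: the server stamps the cell with currentTimeMillis().
Put withDefaultTs = new Put(row).addColumn(fam, qual, value);
// Explicit: the caller supplies the version timestamp, a plain long.
Put withExplicitTs = new Put(row).addColumn(fam, qual, 1245189537000L, value);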

Generally I would recommend that people avoid setting timestamps unless they
have special needs.  Timestamps order the multiple versions for a given
row/column, so if you 'mess it up', you get wrong data returned.

I personally believe that timestamps are not necessarily the best way to
store time-series data.  While in 0.20 we have better query mechanisms (all
values between X and Y is the general mechanism), you can probably do better
with indexes.
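
One common alternative is to push time into the row key rather than the
cell version, e.g. (a sketch against a much later client API; the metric
name and column are invented, and an open Table named table is assumed):

// Reversed timestamp in the key makes the newest sample sort first; a
// plain range Scan then answers "all values between X and Y".
byte[] key = Bytes.add(Bytes.toBytes("metric-17#"),
                       Bytes.toBytes(Long.MAX_VALUE - System.currentTimeMillis()));
table.put(new Put(key).addColumn(Bytes.toBytes("f"), Bytes.toBytes("v"),
                                 Bytes.toBytes(42.0d)));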

-ryan


Re: random timestamp insert

Posted by Alexandre Jaquet <al...@gmail.com>.
Hello,

I'm also evaluating HBase for some applications and found an old post about
transactions and concurrent access:

http://osdir.com/ml/java.hadoop.hbase.user/2008-05/msg00169.html

Is timestamp versioning really better than Long versioning?

Any workaround?


Re: random timestamp insert

Posted by Xinan Wu <wu...@wuxinan.net>.
Thanks. We want to upgrade the data structure of all old records...
so it seems the best way to do it is probably to build a completely
new table instead of doing things in place.
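
Roughly like this, sketched against a much later client API (the
"old"/"new" table names are invented, and upgrade() stands in for our real
record-format conversion):

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

public class RebuildTable {
  public static void main(String[] args) throws Exception {
    try (Connection conn =
             ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table oldTable = conn.getTable(TableName.valueOf("old"));
         Table newTable = conn.getTable(TableName.valueOf("new"));
         ResultScanner scanner =
             oldTable.getScanner(new Scan().readAllVersions())) {
      for (Result r : scanner) {
        Put put = new Put(r.getRow());
        for (Cell c : r.rawCells()) {
          // Carry the original timestamps over so version history survives.
          put.addColumn(CellUtil.cloneFamily(c), CellUtil.cloneQualifier(c),
              c.getTimestamp(), upgrade(CellUtil.cloneValue(c)));
        }
        newTable.put(put);
      }
    }
  }

  // Placeholder for the real record-format conversion.
  static byte[] upgrade(byte[] oldValue) { return oldValue; }
}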


Re: random timestamp insert

Posted by Ryan Rawson <ry...@gmail.com>.
You don't "update" an old value, instead you insert the new value with a new
timestamp and then readers see that value instead.

Multiple values at the same timestamp will always result in indeterminate
return order. If the timestamps are the same, how do we know which one is newer?

I'd avoid inserting with "random timestamps"; conceptually it doesn't make
sense, since the ts isn't just a 64-bit int but has a specific semantic
interpretation, and you risk getting nonsensical results.
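
I.e., roughly this (a sketch against a much later client API; the names
are invented, and an open Table named table is assumed):

// "Update" the HBase way: write a new version and let readers see it.
Put p = new Put(Bytes.toBytes("row1"));
p.addColumn(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("new-value"));
table.put(p);  // server stamps currentTimeMillis(); this version now wins reads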
