Posted to user@hbase.apache.org by Niels Basjes <Ni...@basjes.nl> on 2013/12/09 09:47:01 UTC

Why does a delete behave like this?

Hi,

When I first started learning about HBase I compared the logic of setting
new values to the way a tool like Subversion works: when you set a new
value you don't overwrite the old one, you simply create a new version.
Just like in Subversion, you can later retrieve the old value, and that way
the situation at an earlier date.

(The only real variation to the SVN model is that HBase only retains the
last N versions of a cell.)
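
The "last N versions" idea can be sketched with a toy Python model. This is not HBase code: the class name, method names and the inclusive `as_of` lookup are assumptions made purely for illustration.

```python
class VersionedCell:
    """Toy stand-in for a single cell that keeps only its newest N versions."""

    def __init__(self, max_versions):
        self.max_versions = max_versions
        self.versions = {}  # timestamp -> value

    def put(self, timestamp, value):
        # A put never overwrites an older version; it adds a new one.
        self.versions[timestamp] = value
        # Drop the oldest timestamps once more than max_versions are stored.
        for ts in sorted(self.versions)[:-self.max_versions]:
            del self.versions[ts]

    def get(self, as_of):
        """Return the newest value with timestamp <= as_of, or None."""
        candidates = [ts for ts in self.versions if ts <= as_of]
        return self.versions[max(candidates)] if candidates else None


cell = VersionedCell(max_versions=2)
cell.put(1000, "One")
cell.put(2000, "Uno")
cell.put(3000, "Eins")
print(cell.get(3500))  # Eins
print(cell.get(2500))  # Uno
print(cell.get(1500))  # None, because the version at 1000 was pruned
```

With `max_versions=2` the oldest value is gone for good, which is the one difference from the Subversion model mentioned above.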

There is however one situation where this comparison really fails: when you
delete a cell.
If you want to retrieve the state of a thing from Subversion and the thing
has been deleted in the current version, you can still get it back.
In HBase, deleting a cell places a tombstone at a specific time, so
internally the older values are still present.

But when you try to retrieve such an older value you still get an
empty result back (i.e. no such cell).
The direct consequence of the currently implemented model is that an
application can never retrieve the correct state of a row at an older
timestamp if a delete on any cell has occurred.

Example:

I create a table with one row:

> create 't1', 'cf'
> put 't1', 'rowid', 'cf:1', 'One', 1000
> put 't1', 'rowid', 'cf:2', 'Two', 2000
> put 't1', 'rowid', 'cf:3', 'Three', 3000
> get 't1', 'rowid' , {TIMERANGE => [0,3500]}

    COLUMN                     CELL
     cf:1                      timestamp=1000, value=One
     cf:2                      timestamp=2000, value=Two
     cf:3                      timestamp=3000, value=Three
    3 row(s) in 0.0150 seconds

Then the delete of a cell at a later timestamp:

> delete 't1', 'rowid', 'cf:1', 4000

Now if I retrieve the row as of time 3500, I would expect to still see the
same values as above.
This is however the reality:

> get 't1', 'rowid' , {TIMERANGE => [0,3500]}

    COLUMN                     CELL
     cf:2                      timestamp=2000, value=Two
     cf:3                      timestamp=3000, value=Three
    2 row(s) in 0.0120 seconds


Why has it been designed/implemented like this?
What is the logic behind this model?

-- 
Best regards / Met vriendelijke groeten,

Niels Basjes

Re: Why does a delete behave like this?

Posted by lars hofhansl <la...@apache.org>.
https://issues.apache.org/jira/browse/HBASE-9005  :)
Just have to do it now.



________________________________
 From: Ted Yu <yu...@gmail.com>
To: "user@hbase.apache.org" <us...@hbase.apache.org>; lars hofhansl <la...@apache.org> 
Sent: Monday, December 9, 2013 8:16 PM
Subject: Re: Why does a delete behave like this?
 


I ran the following shell command to create the table:

hbase(main):001:0> create 't1', {NAME => 'cf', KEEP_DELETED_CELLS => true}


The second get command returns the same result as the first.

Lars:
The refguide doesn't cover such usage. Do you think we should document it?

Cheers



On Mon, Dec 9, 2013 at 2:53 PM, lars hofhansl <la...@apache.org> wrote:

>This is because, by default, a delete marker extends all the way back in time.
>When you set KEEP_DELETED_CELLS for your column family, this behavior is fixed, i.e. you get correct time-range query behavior even with respect to deletes.
>
>
>-- Lars

Re: Why does a delete behave like this?

Posted by Ted Yu <yu...@gmail.com>.
I ran the following shell command to create the table:
hbase(main):001:0> create 't1', {NAME => 'cf', KEEP_DELETED_CELLS => true}

The second get command returns the same result as the first.

Lars:
The refguide doesn't cover such usage. Do you think we should document it?

Cheers


On Mon, Dec 9, 2013 at 2:53 PM, lars hofhansl <la...@apache.org> wrote:

> This is because, by default, a delete marker extends all the way back in
> time. When you set KEEP_DELETED_CELLS for your column family, this behavior
> is fixed, i.e. you get correct time-range query behavior even with respect
> to deletes.
>
>
> -- Lars

Re: Why does a delete behave like this?

Posted by lars hofhansl <la...@apache.org>.
This is because, by default, a delete marker extends all the way back in time.
When you set KEEP_DELETED_CELLS for your column family, this behavior is fixed, i.e. you get correct time-range query behavior even with respect to deletes.
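
The difference between the two modes can be sketched with a toy Python model of a time-range get. This is not HBase code: the function name `get_row`, its parameters, and the rule that a marker at or beyond the end of the queried range is ignored when `keep_deleted_cells` is set are assumptions for illustration, as is treating the time range as half-open.

```python
def get_row(puts, delete_markers, time_range, keep_deleted_cells=False):
    """Toy model of a time-range get on one row.

    puts:           list of (column, timestamp, value)
    delete_markers: list of (column, timestamp); a marker masks every version
                    of that column with a timestamp <= the marker's timestamp
    time_range:     (min_ts, max_ts), treated as half-open [min_ts, max_ts)
    """
    min_ts, max_ts = time_range
    newest = {}  # column -> (timestamp, value)
    for column, ts, value in puts:
        if not (min_ts <= ts < max_ts):
            continue
        masked = any(
            del_column == column
            and ts <= del_ts
            # Assumption: with keep_deleted_cells, a marker that lies at or
            # beyond the end of the queried range is not applied.
            and not (keep_deleted_cells and del_ts >= max_ts)
            for del_column, del_ts in delete_markers
        )
        if masked:
            continue
        if column not in newest or ts > newest[column][0]:
            newest[column] = (ts, value)
    return {column: value for column, (ts, value) in sorted(newest.items())}


puts = [("cf:1", 1000, "One"), ("cf:2", 2000, "Two"), ("cf:3", 3000, "Three")]
deletes = [("cf:1", 4000)]

default_view = get_row(puts, deletes, (0, 3500))
kept_view = get_row(puts, deletes, (0, 3500), keep_deleted_cells=True)
print(default_view)  # {'cf:2': 'Two', 'cf:3': 'Three'}
print(kept_view)     # {'cf:1': 'One', 'cf:2': 'Two', 'cf:3': 'Three'}
```

In this model a query whose range includes the marker, for example `(0, 5000)`, hides `cf:1` in both modes, so the delete still takes effect for reads "after" it.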


-- Lars




Re: Why does a delete behave like this?

Posted by Ted Yu <yu...@gmail.com>.
What's the max versions setting for table 't1'?
When you issue the 'describe' command, you should see something similar to
the following:

VERSIONS => '1'




Re: Why does a delete behave like this?

Posted by Stack <st...@duboce.net>.
Qualification:  My response misleads.  I was responding to delete of an
explicit version and got carried away.  Please see Lars' answer for the
proper response.
St.Ack


On Mon, Dec 9, 2013 at 1:30 PM, Stack <st...@duboce.net> wrote:

> On Mon, Dec 9, 2013 at 4:47 PM, Niels Basjes <Ni...@basjes.nl> wrote:
>
>>
>> Why has it been designed/implemented like this?
>> What is the logic behind this model?
>>
>
> Hey Niels:
>
> It is probably fair to call this an instance of implementation leaking
> into and polluting our data model.  We should fix it.
>
> Currently, deletes always sort before all other types when all other
> coordinates are the same (same row, same column family, same timestamp,
> etc.).  IIRC, it was done this way a long time ago because it made delete
> reasoning 'easier'.  This forced sort ordering is why you see the behavior
> you note in your shell experiments.
>
> Our Sergey recently suggested we stop factoring in 'type' when sorting
> KeyValues/Cells; rather, we would distinguish on sequence id when all else
> matches.  Awkwardly, we'd then have to let the user add a sequence id when
> querying a specific Cell.  This would not be easy to do.  Sequence id is an
> internal, amorphous notion at the moment -- it exists while KeyValues are
> in flight but is (mostly) dropped after KeyValues persist to hfiles -- but
> it looks like it is fast becoming more tangible given some issues that
> arise around WAL replay at recovery time and in corner cases when
> replicating.
>
> What is your thinking on this, Niels?  Does the current implementation get
> in the way of building an app on HBase?
>
> Thanks,
> St.Ack
>

Re: Why does a delete behave like this?

Posted by Stack <st...@duboce.net>.
On Mon, Dec 9, 2013 at 4:47 PM, Niels Basjes <Ni...@basjes.nl> wrote:

>
> Why has it been designed/implemented like this?
> What is the logic behind this model?
>

Hey Niels:

It is probably fair to call this an instance of implementation leaking into
and polluting our data model.  We should fix it.

Currently, deletes always sort before all other types when all other
coordinates are the same (same row, same column family, same timestamp,
etc.).  IIRC, it was done this way a long time ago because it made delete
reasoning 'easier'.  This forced sort ordering is why you see the behavior
you note in your shell experiments.
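
A minimal sketch of the ordering described above, as a toy Python model rather than HBase code. The tuple layout and the numeric type codes are illustrative assumptions; the only property that matters here is that a delete outranks a put when row, family, qualifier and timestamp are all equal.

```python
# Illustrative type codes (assumed values): the higher code sorts first.
PUT, DELETE = 4, 8


def sort_key(cell):
    """Order by row, family, qualifier, then newest timestamp first,
    then highest type code first (so a delete precedes a put)."""
    row, family, qualifier, timestamp, cell_type = cell
    return (row, family, qualifier, -timestamp, -cell_type)


cells = [
    ("rowid", "cf", "1", 1000, PUT),
    ("rowid", "cf", "1", 1000, DELETE),
    ("rowid", "cf", "1", 2000, PUT),
]

ordered = sorted(cells, key=sort_key)
for cell in ordered:
    print(cell)
# ('rowid', 'cf', '1', 2000, 4)
# ('rowid', 'cf', '1', 1000, 8)
# ('rowid', 'cf', '1', 1000, 4)
```

Because the delete at timestamp 1000 always comes before the put at timestamp 1000, a reader scanning in this order meets the delete first, regardless of which of the two was written last.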

Our Sergey recently suggested we stop factoring in 'type' when sorting
KeyValues/Cells; rather, we would distinguish on sequence id when all else
matches.  Awkwardly, we'd then have to let the user add a sequence id when
querying a specific Cell.  This would not be easy to do.  Sequence id is an
internal, amorphous notion at the moment -- it exists while KeyValues are in
flight but is (mostly) dropped after KeyValues persist to hfiles -- but it
looks like it is fast becoming more tangible given some issues that arise
around WAL replay at recovery time and in corner cases when replicating.

What is your thinking on this, Niels?  Does the current implementation get
in the way of building an app on HBase?

Thanks,
St.Ack