Posted to user@hbase.apache.org by Zoltán Tóth-Czifra <zo...@softonic.com> on 2012/07/18 19:19:44 UTC

Table not storing versions after deleteall

Hi,

For a few days now I've been fighting with a very strange behaviour of HBase. I hope you can explain it to me.
In short: add rows with versions, delete all rows, add the same rows again, and the old rows disappear.

I create a table and add some rows, some of them with an explicit timestamp. After this, I can retrieve them with no problem. However, when I delete the cells in these rows (all of them) and add the exact same rows to the table again, the cell with the explicit timestamp is not stored!

I wonder if you can reproduce it and then explain. Here is a script:

---- SCRIPT ----

version

create     'test_delete_1', {NAME => 'f1', VERSIONS => 100}

put        'test_delete_1', 'row_xzy', 'f1', '123', 1342631168000
put        'test_delete_1', 'row_xzy', 'f1', '456'

get        'test_delete_1', 'row_xzy', {TIMERANGE => [1342631167000, 1442631167000], VERSIONS => 10}

deleteall  'test_delete_1', 'row_xzy', 'f1'

put        'test_delete_1', 'row_xzy', 'f1', '123', 1342631168000
put        'test_delete_1', 'row_xzy', 'f1', '456'

get        'test_delete_1', 'row_xzy', {TIMERANGE => [1342631167000, 1442631167000], VERSIONS => 10}

disable 'test_delete_1'
drop    'test_delete_1'

---- END OF SCRIPT ----

...and the output:

---- OUTPUT ----


$ hbase shell test_del.txt
0.90.4-cdh3u3, r, Thu Jan 26 10:13:36 PST 2012

0 row(s) in 0.5080 seconds

0 row(s) in 0.1750 seconds

0 row(s) in 0.0100 seconds

COLUMN                                                               CELL
 f1:                                                                 timestamp=1342631583013, value=456
 f1:                                                                 timestamp=1342631168000, value=123
2 row(s) in 0.0260 seconds

0 row(s) in 0.0130 seconds

0 row(s) in 0.0080 seconds

0 row(s) in 0.0090 seconds

COLUMN                                                               CELL
 f1:                                                                 timestamp=1342631583071, value=456
1 row(s) in 0.0080 seconds

0 row(s) in 2.0400 seconds

0 row(s) in 1.0880 seconds

---- END OF OUTPUT ----

Thanks!

RE: Table not storing versions after deleteall

Posted by Zoltán Tóth-Czifra <zo...@softonic.com>.
Hi,

Thank you very much for the clarification! The JIRA task is especially useful so that I can follow the status of this bug/feature.

According to what you say and what I read in the docs, running a major compaction after the deletes should resolve the problem, but it doesn't :( Even if I wait a while (>10 s) after the major compaction, it doesn't seem to take effect. I'm working on a small devel cluster (3 slaves, 6 nodes in total).

Can someone explain that?

Thank you!
________________________________________
From: Jason Frantz [jfrantz@maprtech.com]
Sent: Wednesday, July 18, 2012 8:33 PM
To: user@hbase.apache.org
Subject: Re: Table not storing versions after deleteall

Zoltan,

It's actually a bit more complicated because the behavior is
non-deterministic. If a compaction happens the delete marker may be
removed, and if you add the rows back after this time they *will* be
visible. See the following for more info:

http://mail-archives.apache.org/mod_mbox/hbase-dev/201201.mbox/%3C1326684100.80142.YahooMailNeo@web121706.mail.ne1.yahoo.com%3E
https://issues.apache.org/jira/browse/HBASE-5241

-Jason

On Wed, Jul 18, 2012 at 10:32 AM, Zoltán Tóth-Czifra <
zoltan.tothczifra@softonic.com> wrote:

> Hi,
>
> Thanks for the quick answer! So I understand it's the expected behavior...?
> For me it doesn't explain why can't I re-add the exact same rows.
> ________________________________________
> From: jdcryans@gmail.com [jdcryans@gmail.com] on behalf of Jean-Daniel
> Cryans [jdcryans@apache.org]
> Sent: Wednesday, July 18, 2012 7:26 PM
> To: user@hbase.apache.org
> Subject: Re: Table not storing versions after deleteall
>
> The deleteall marker will hide everything that comes before its
> timestamp (in your case it's current time), if you just want to delete
> specific values use delete instead.
>
> Hope this helps,
>
> J-D

Re: Table not storing versions after deleteall

Posted by Jason Frantz <jf...@maprtech.com>.
Zoltan,

It's actually a bit more complicated because the behavior is
non-deterministic. If a compaction happens the delete marker may be
removed, and if you add the rows back after this time they *will* be
visible. See the following for more info:

http://mail-archives.apache.org/mod_mbox/hbase-dev/201201.mbox/%3C1326684100.80142.YahooMailNeo@web121706.mail.ne1.yahoo.com%3E
https://issues.apache.org/jira/browse/HBASE-5241

-Jason
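
The order dependence described above can be illustrated with a toy model. The Python sketch below is not HBase code and makes simplifying assumptions: a column is a dict of timestamp -> value, a delete marker is just a timestamp, and a "major compaction" drops both the masked cells and the markers. The function names and the marker timestamp are hypothetical, and the model says nothing about when a real compaction actually runs.

```python
def visible(puts, markers):
    """Return (timestamp, value) pairs not masked by any delete marker, newest first."""
    cutoff = max(markers, default=-1)
    return sorted(((ts, v) for ts, v in puts.items() if ts > cutoff), reverse=True)


def major_compact(puts, markers):
    """Toy major compaction: drop masked cells for good, then drop the markers."""
    survivors = dict(visible(puts, markers))
    return survivors, []


def replay(compact_before_reinsert):
    """Put, deleteall, optionally compact, then re-insert with the old timestamp."""
    puts, markers = {1342631168000: "123"}, []
    markers.append(1342631583050)      # deleteall at a hypothetical "current time"
    if compact_before_reinsert:
        puts, markers = major_compact(puts, markers)
    puts[1342631168000] = "123"        # re-insert the cell with its old timestamp
    return visible(puts, markers)


without_compaction = replay(False)     # marker still present: the cell stays hidden
with_compaction = replay(True)         # marker already removed: the cell is visible
print(without_compaction)
print(with_compaction)
```

In this model the same sequence of client calls gives two different results, depending only on whether the marker was compacted away before the re-insert.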

On Wed, Jul 18, 2012 at 10:32 AM, Zoltán Tóth-Czifra <
zoltan.tothczifra@softonic.com> wrote:

> Hi,
>
> Thanks for the quick answer! So I understand it's the expected behavior...?
> For me it doesn't explain why can't I re-add the exact same rows.
> ________________________________________
> From: jdcryans@gmail.com [jdcryans@gmail.com] on behalf of Jean-Daniel
> Cryans [jdcryans@apache.org]
> Sent: Wednesday, July 18, 2012 7:26 PM
> To: user@hbase.apache.org
> Subject: Re: Table not storing versions after deleteall
>
> The deleteall marker will hide everything that comes before its
> timestamp (in your case it's current time), if you just want to delete
> specific values use delete instead.
>
> Hope this helps,
>
> J-D

Re: Table not storing versions after deleteall

Posted by Jean-Daniel Cryans <jd...@apache.org>.
On Wed, Jul 18, 2012 at 10:32 AM, Zoltán Tóth-Czifra
<zo...@softonic.com> wrote:
> Thanks for the quick answer! So I understand it's the expected behavior...?

Yeah, it's an edge case.

> For me it doesn't explain why can't I re-add the exact same rows.

As I mentioned, they are hidden by the delete marker. Basically, the
command you sent to HBase is "delete everything that comes before the
current time for this row". This also includes cells you insert in the
future but at a previous timestamp (time travel!).

We have a bit of documentation here: http://hbase.apache.org/book.html#delete

J-D
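
The masking rule described above can be sketched as a toy model. This is Python, not HBase code, and it rests on assumptions: a column is modeled as a dict of timestamp -> value, a deleteall is modeled as a marker timestamp that hides every cell at or before it, and nothing is ever physically removed. The class name ToyColumn and the marker timestamp 1342631583050 are hypothetical; the other timestamps come from the output in the original post.

```python
from dataclasses import dataclass, field


@dataclass
class ToyColumn:
    """Toy model of one column: versioned puts plus column-delete markers."""
    puts: dict = field(default_factory=dict)      # timestamp -> value
    markers: list = field(default_factory=list)   # delete-marker timestamps

    def put(self, value, ts):
        self.puts[ts] = value

    def delete_all(self, ts):
        # Nothing is erased; only a marker is recorded.
        self.markers.append(ts)

    def get(self):
        # A cell is visible only if its timestamp is newer than every marker.
        cutoff = max(self.markers, default=-1)
        return sorted(((ts, v) for ts, v in self.puts.items() if ts > cutoff),
                      reverse=True)


col = ToyColumn()
col.put("123", 1342631168000)       # explicit timestamp
col.put("456", 1342631583013)       # server time at the first put
first = col.get()

col.delete_all(1342631583050)       # hypothetical "current time" of the deleteall
col.put("123", 1342631168000)       # older than the marker: stays hidden
col.put("456", 1342631583071)       # newer than the marker: visible
second = col.get()

print(first)
print(second)
```

The second read returns only the cell written after the marker, which matches the single row in the second get of the original output.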

RE: Table not storing versions after deleteall

Posted by Zoltán Tóth-Czifra <zo...@softonic.com>.
Hi,

Thanks for the quick answer! So I understand it's the expected behavior...?
For me it doesn't explain why I can't re-add the exact same rows.
________________________________________
From: jdcryans@gmail.com [jdcryans@gmail.com] on behalf of Jean-Daniel Cryans [jdcryans@apache.org]
Sent: Wednesday, July 18, 2012 7:26 PM
To: user@hbase.apache.org
Subject: Re: Table not storing versions after deleteall

The deleteall marker will hide everything that comes before its
timestamp (in your case it's current time), if you just want to delete
specific values use delete instead.

Hope this helps,

J-D

Re: Table not storing versions after deleteall

Posted by Jean-Daniel Cryans <jd...@apache.org>.
The deleteall marker will hide everything that comes before its
timestamp (in your case, the current time). If you just want to delete
specific values, use delete instead.

Hope this helps,

J-D
