Posted to issues@hbase.apache.org by "Lars Hofhansl (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2012/01/24 09:14:40 UTC

[jira] [Issue Comment Edited] (HBASE-5268) Add delete column prefix delete marker

    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13191997#comment-13191997 ] 

Lars Hofhansl edited comment on HBASE-5268 at 1/24/12 8:14 AM:
---------------------------------------------------------------

Here's a patch. The bulk is testing.

While testing combinations of delete marker types, I found one strange scenario.
Say you
# put columns 123, 1234, 12345
# then delete with prefix 123
# then put column 123 again
# now delete 123 with a normal column marker

The ScanDeleteTracker sees the normal column delete marker first, and then the new put for column 123. It concludes that it is done with all versions of column 123 and seeks ahead to the next column. That seek also skips the prefix marker with prefix 123, so 1234 and 12345 are no longer marked as deleted.
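
To make the ordering issue concrete, here is a toy model of the scenario. It is only a sketch and not the actual ScanDeleteTracker code: the class PrefixDeleteSketch, the Cell type, the scan method, and the timestamps 1 to 4 are all made up for illustration. It assumes cells are sorted by column, then newest timestamp first, and it models the "seek to next column" optimization as skipping every remaining cell of the current column.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; none of these names come from HBase.
class PrefixDeleteSketch {
    enum Type { DELETE_COLUMN, DELETE_PREFIX, PUT }

    static final class Cell {
        final String qualifier;
        final long ts;
        final Type type;
        Cell(String qualifier, long ts, Type type) {
            this.qualifier = qualifier;
            this.ts = ts;
            this.type = type;
        }
    }

    // Assumed sort order: column ascending, newest timestamp first,
    // delete markers before puts at the same timestamp.
    static int compare(Cell a, Cell b) {
        int c = a.qualifier.compareTo(b.qualifier);
        if (c != 0) return c;
        c = Long.compare(b.ts, a.ts);
        if (c != 0) return c;
        return a.type.compareTo(b.type);
    }

    // The four steps of the scenario, with hypothetical timestamps 1..4.
    static List<Cell> scenario() {
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("123", 1, Type.PUT));
        cells.add(new Cell("1234", 1, Type.PUT));
        cells.add(new Cell("12345", 1, Type.PUT));
        cells.add(new Cell("123", 2, Type.DELETE_PREFIX));
        cells.add(new Cell("123", 3, Type.PUT));
        cells.add(new Cell("123", 4, Type.DELETE_COLUMN));
        return cells;
    }

    // Returns the columns a scan would report as visible.
    static List<String> scan(List<Cell> input, boolean seekAfterColumnDelete) {
        List<Cell> cells = new ArrayList<>(input);
        cells.sort(PrefixDeleteSketch::compare);
        List<String> visible = new ArrayList<>();
        String deletedColumn = null;
        long columnDeleteTs = -1;
        String deletedPrefix = null;
        long prefixDeleteTs = -1;
        String skipColumn = null;
        for (Cell c : cells) {
            // Models the seek: the rest of this column is never looked at.
            if (c.qualifier.equals(skipColumn)) continue;
            switch (c.type) {
                case DELETE_COLUMN:
                    deletedColumn = c.qualifier;
                    columnDeleteTs = c.ts;
                    break;
                case DELETE_PREFIX:
                    deletedPrefix = c.qualifier;
                    prefixDeleteTs = c.ts;
                    break;
                case PUT:
                    boolean byColumn = c.qualifier.equals(deletedColumn)
                            && c.ts <= columnDeleteTs;
                    boolean byPrefix = deletedPrefix != null
                            && c.qualifier.startsWith(deletedPrefix)
                            && c.ts <= prefixDeleteTs;
                    if (byColumn && seekAfterColumnDelete) {
                        skipColumn = c.qualifier;
                    } else if (!byColumn && !byPrefix) {
                        visible.add(c.qualifier);
                    }
                    break;
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        System.out.println(scan(scenario(), true));   // prints [1234, 12345]
        System.out.println(scan(scenario(), false));  // prints []
    }
}
```

With the seek enabled, the prefix marker at timestamp 2 sits behind the put at timestamp 3 and is never read, so 1234 and 12345 come back. With the seek disabled (the "de-optimized" variant), every cell is read and both columns stay deleted.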

This only happens in exactly this scenario.

I cannot fix this without de-optimizing column delete markers or adding complicated logic to always sort prefix delete markers before all columns they affect, regardless of timestamp.

I added this scenario as a unit test.
                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
>
>         Attachments: 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently, entire families of a row can be deleted, as can individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> It turns out that this can be achieved with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon; I'm just working on some tests now.
