You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Lars Hofhansl (Created) (JIRA)" <ji...@apache.org> on 2012/01/24 06:43:40 UTC

[jira] [Created] (HBASE-5268) Add delete column prefix delete marker

Add delete column prefix delete marker
--------------------------------------

                 Key: HBASE-5268
                 URL: https://issues.apache.org/jira/browse/HBASE-5268
             Project: HBase
          Issue Type: Improvement
          Components: client, regionserver
            Reporter: Lars Hofhansl
            Assignee: Lars Hofhansl
             Fix For: 0.94.0


This is another part missing in the "wide row challenge".
Currently entire families of a row can be deleted or individual columns or versions.
There is no facility to mark multiple columns for deletion by column prefix.

Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5268) Add delete column prefix delete marker

Posted by "Todd Lipcon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192727#comment-13192727 ] 

Todd Lipcon commented on HBASE-5268:
------------------------------------

bq. Good point. Isn't that the same for a column delete with many versions that span many blocks, though?

Yes, but my guess is that we currently seek to the beginning of the column as well as to the specific timestamp, in the case that a timestamp is requested. Mikhail would know more (I'm just going based on his presentation last week). The distinction here is that seeking to the beginning of a column is "well defined" whereas seeking to the beginning of the prefix is practically impossible (I suppose you could seek to O(length of column key) spots, but hardly reasonable)
                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
>
>         Attachments: 5268-v2.txt, 5268-v3.txt, 5268-v4.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5268) Add delete column prefix delete marker

Posted by "Mikhail Bautin (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192746#comment-13192746 ] 

Mikhail Bautin commented on HBASE-5268:
---------------------------------------

@Lars, Todd: Yes, the column prefix delete will break lazy seek optimization. The lazy seek optimization is only enabled for multi-column queries, though, so the bug would not be seen when only reading one column or the whole row.

More generally, our various types of deletes get in the way of "point queries", e.g. reading a particular column or a particular timestamp range, because we have to check for deletes first. As Todd pointed out, with column prefix deletes we would have to search the entire row for multi-column gets. We could trigger these optimizations using another "Column Prefix Delete" Bloom filter, but that might be an overkill. It might be easier to detect and delete just the required columns at the write path.
                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>         Attachments: 5268-v2.txt, 5268-v3.txt, 5268-v4.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5268) Add delete column prefix delete marker

Posted by "Lars Hofhansl (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5268:
---------------------------------

    Attachment: 5268-v4.txt

One more test. And guard for short column identifiers following longer prefix delete markers.
                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
>
>         Attachments: 5268-v2.txt, 5268-v3.txt, 5268-v4.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Issue Comment Edited] (HBASE-5268) Add delete column prefix delete marker

Posted by "Lars Hofhansl (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13191958#comment-13191958 ] 

Lars Hofhansl edited comment on HBASE-5268 at 1/24/12 8:42 AM:
---------------------------------------------------------------

@Stack... Here's an example.
Say you can columns X100, X101, X102, ..., X199, and X200, X201, X202, ..., X299, X300, ..., in a row R, all in the same column family.
(I'd have this if I modeled rows inside HBase rows by using a prefix to identify the inner rows.)

Now I want to be able to delete all columns that start with X1. Without a prefix delete marker I'd have to place 100 column delete markers. With a prefix marker I can just place one marker for the X1 prefix.
If the prefix is empty, this would degenerate into the same behavior as a family marker (albeit less efficient).

                
      was (Author: lhofhansl):
    @Stack... Here's an example.
Say you can columns X100, X101, X102, ..., X199, and X200, X201, X202, ..., X299, X300, ..., in a row R, all in the same column family.
(I'd have this if I modeled rows inside HBase by using a prefix to identify the inner rows.)

Now I want to be able to delete all columns that start with X1. Without a prefix delete marker I'd have to place 100 column delete markers. With a prefix marker I can just place one marker for the X1 prefix.
If the prefix is empty, this would degenerate into the same behavior as a family marker (albeit less efficient).

                  
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
>
>         Attachments: 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Issue Comment Edited] (HBASE-5268) Add delete column prefix delete marker

Posted by "Lars Hofhansl (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13191997#comment-13191997 ] 

Lars Hofhansl edited comment on HBASE-5268 at 1/24/12 8:14 AM:
---------------------------------------------------------------

Here's a patch. The bulk is testing.

During testing with delete marker types I found one strange scenario:
Say you
# put columns 123, 1234, 12345
# then delete with prefix 123
# then put column 123 again
# now delete 123 with a normal column marker

Now what happens is that the ScanDeleteTracker sees the normal column delete marker first, then it will see the new put for column 123. Now it will conclude that it is done with all versions of column 123, and thus seeks ahead to the next column. During that process the prefix marker with prefix 123 is also skipped. And hence 1234 and 12345 are no longer marked as deleted.

This only happens in exactly this scenario.

I cannot fix this without de-optimizing column delete markers or adding complicated logic to sort prefix delete marker always before all prefixes they affect regardless of the timestamp.

I added this scenario as a unit test.
                
      was (Author: lhofhansl):
    Here's a patch. The bulk is testing.

During testing with deleted marker types I found one strange scenario:
Say you
# put columns 123, 1234, 12345
# then delete with prefix 123
# then put column 123 again
# now delete 123 with a normal column marker

Now what happens is that the ScanDeleteTracker sees the normal column delete marker first, then it will see the new put for column 123. Now it will conclude that it is done with all versions of column 123, and thus seek ahead to the next column. During that process the prefix marker with prefix 123 is also skipped. And hence 1234 and 12345 are no longer marked as deleted.

This only happens in exactly this scenario.

I cannot fix this without de-optimizing column delete markers or adding complicated logic to sort prefix delete marker always before all prefixes they affect regardless of the timestamp.

I added this scenario as a unit test.
                  
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
>
>         Attachments: 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5268) Add delete column prefix delete marker

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192016#comment-13192016 ] 

Lars Hofhansl commented on HBASE-5268:
--------------------------------------

When I muck with KeyValue.KeyComparator to always sort prefix delete marker before all columns with the same prefix regardless of TS, then I see the opposite scenario that in the setup above the column delete marker is ignored and hence the newly put version of column 123 is visible again. This is the because the column marker is considered less specific and hence won't override the column prefix marker. Arghh :)

So ScanDeleteTracker would need to be completely refactored in order to keep prefix marker separate from column and version markers.
The question is: Is that worth the risk, added code entropy, and effort, or can we document that prefix delete markers should not be used together with column delete markers (where the column markers column is identical to the prefix marker's column prefix, which does not make much sense anyway)?

                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
>
>         Attachments: 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5268) Add delete column prefix delete marker

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192755#comment-13192755 ] 

Lars Hofhansl commented on HBASE-5268:
--------------------------------------

@Ted: Definitely. Deletes are documented here: http://hbase.apache.org/book.html#version.delete
Maybe a small section about how the lazy seek interacts with them (although that would be a more a technical discussion).
                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>         Attachments: 5268-v2.txt, 5268-v3.txt, 5268-v4.txt, 5268-v5.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5268) Add delete column prefix delete marker

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197340#comment-13197340 ] 

stack commented on HBASE-5268:
------------------------------

I added a footnote to book in delete section pointing at your blog Lars.
                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>         Attachments: 5268-proof.txt, 5268-v2.txt, 5268-v3.txt, 5268-v4.txt, 5268-v5.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5268) Add delete column prefix delete marker

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192753#comment-13192753 ] 

Lars Hofhansl commented on HBASE-5268:
--------------------------------------

"Column Prefix Delete" Bloom filter *is* overkill. I'll think about this a bit more, but probably mark as "won't fix".
                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>         Attachments: 5268-v2.txt, 5268-v3.txt, 5268-v4.txt, 5268-v5.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5268) Add delete column prefix delete marker

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192874#comment-13192874 ] 

Lars Hofhansl commented on HBASE-5268:
--------------------------------------

I think the top-of-row is good enough. We have to seek there anyway because of potential family delete markers (although I think Mikhail plans to - or has already - optimize this with delete marker bloom filters).

                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>         Attachments: 5268-v2.txt, 5268-v3.txt, 5268-v4.txt, 5268-v5.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5268) Add delete column prefix delete marker

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192541#comment-13192541 ] 

Zhihong Yu commented on HBASE-5268:
-----------------------------------

I think the idea in patch v3 is useful in handling wide rows.

I suggest making the selection of column prefix generic (in place of Bytes.compareTo call).
e.g. user may want to use a regex to select the columns to be deleted.
How hard is it to accommodate the above case ?
                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
>
>         Attachments: 5268-v2.txt, 5268-v3.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5268) Add delete column prefix delete marker

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192643#comment-13192643 ] 

Lars Hofhansl commented on HBASE-5268:
--------------------------------------

Test run looks good.
@Todd are you -1 on this? (I'd understand if you were).
                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
>
>         Attachments: 5268-v2.txt, 5268-v3.txt, 5268-v4.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5268) Add delete column prefix delete marker

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192751#comment-13192751 ] 

Zhihong Yu commented on HBASE-5268:
-----------------------------------

Would a summary of the discussion here be eligible to be included in HBase book ?
                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>         Attachments: 5268-v2.txt, 5268-v3.txt, 5268-v4.txt, 5268-v5.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5268) Add delete column prefix delete marker

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192555#comment-13192555 ] 

Lars Hofhansl commented on HBASE-5268:
--------------------------------------

I think that would difficult, because we could not sort the delete markers correctly. Delete by prefix works, because I can sort those at the right place w.r.t. the cells that they affect (delete by suffix, for example, would not work).

I think what you have in mind would be the work of some kind of "delete filter" that needs to scan all the columns.

The goal I want to reach is that we have the same flexibility for columns that we have for row-keys, so that key schema in HBase is not limited to the row-key.
                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
>
>         Attachments: 5268-v2.txt, 5268-v3.txt, 5268-v4.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5268) Add delete column prefix delete marker

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192725#comment-13192725 ] 

Lars Hofhansl commented on HBASE-5268:
--------------------------------------

Added a test that that insert 100000 kvs, places a prefix marker at the beginning, and then Gets kv #90000, and it is correctly marked as delete.
Do I need to enable Mikhail's optimizations or are they on by default?
                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
>
>         Attachments: 5268-v2.txt, 5268-v3.txt, 5268-v4.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5268) Add delete column prefix delete marker

Posted by "Lars Hofhansl (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5268:
---------------------------------

    Attachment: 5268-v2.txt

This version has no anomalies. And it actually isn't so bad. The only part I do not like is the change to the sorting in KeyValye.KeyComparator.

Please let me know what you think.
                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
>
>         Attachments: 5268-v2.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (HBASE-5268) Add delete column prefix delete marker

Posted by "Lars Hofhansl (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl resolved HBASE-5268.
----------------------------------

    Resolution: Won't Fix
    
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>         Attachments: 5268-proof.txt, 5268-v2.txt, 5268-v3.txt, 5268-v4.txt, 5268-v5.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5268) Add delete column prefix delete marker

Posted by "Matt Corgan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192832#comment-13192832 ] 

Matt Corgan commented on HBASE-5268:
------------------------------------

This is probably way too complicated for a short term solution, but during flush/compaction could you repeat delete markers at the beginning of each block for as long as they are relevant?
                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>         Attachments: 5268-v2.txt, 5268-v3.txt, 5268-v4.txt, 5268-v5.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5268) Add delete column prefix delete marker

Posted by "Lars Hofhansl (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5268:
---------------------------------

    Attachment: 5268-v5.txt

Patch that includes the get test, for reference.
                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>         Attachments: 5268-v2.txt, 5268-v3.txt, 5268-v4.txt, 5268-v5.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5268) Add delete column prefix delete marker

Posted by "Lars Hofhansl (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5268:
---------------------------------

    Attachment: 5268.txt

Here's a patch. The bulk is testing.

During testing with deleted marker types I found one strange scenario:
Say you
# put columns 123, 1234, 12345
# then delete with prefix 123
# then put column 123 again
# now delete 123 with a normal column marker

Now what happens is that the ScanDeleteTracker sees the normal column delete marker first, then it will see the new put for column 123. Now it will conclude that it is done with all versions of column 123, and thus seek ahead to the next column. During that process the prefix marker with prefix 123 is also skipped. And hence 1234 and 12345 are no longer marked as deleted.

This only happens in exactly this scenario.

I cannot fix this without de-optimizing column delete markers or adding complicated logic to sort prefix delete marker always before all prefixes they affect regardless of the timestamp.

I added this scenario as a unit test.
                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
>
>         Attachments: 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5268) Add delete column prefix delete marker

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192741#comment-13192741 ] 

Lars Hofhansl commented on HBASE-5268:
--------------------------------------

Ah yes, now it's failing. And there I thought I'd know that code :)
OK, so these need to be top of row. So not sure now, it makes the code more complicated than is warranted for this feature.

                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
>
>         Attachments: 5268-v2.txt, 5268-v3.txt, 5268-v4.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Issue Comment Edited] (HBASE-5268) Add delete column prefix delete marker

Posted by "Lars Hofhansl (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192255#comment-13192255 ] 

Lars Hofhansl edited comment on HBASE-5268 at 1/24/12 4:27 PM:
---------------------------------------------------------------

I'll work on a fully correct patch today, and then we can decide whether it is worth the extra complexity. I find this first patch compelling, because (1) is very small and easy to verify and (2) it keeps column prefix delete markers sorted (by timestamp) with the KVs they affect.
In the bigger patch, neither will be true anymore.

                
      was (Author: lhofhansl):
    I'll work on a fully correct patch today, and then we can decide whether it is worth the extra complexity. I find this first patch compelling, because (1) is very small and easy to verify and (2) it keeps column prefix delete markers sorted (by timestamp) with the KVs they affect.
In the bigger patch, neither will be try anymore.

                  
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
>
>         Attachments: 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5268) Add delete column prefix delete marker

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192732#comment-13192732 ] 

Lars Hofhansl commented on HBASE-5268:
--------------------------------------

Oh and in this case all columns are names 12300001, 12300002, ..., 12399999. The prefix marker set was 123. The column requested by get was 12390000.
                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
>
>         Attachments: 5268-v2.txt, 5268-v3.txt, 5268-v4.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5268) Add delete column prefix delete marker

Posted by "Lars Hofhansl (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5268:
---------------------------------

    Status: Open  (was: Patch Available)
    
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>         Attachments: 5268-v2.txt, 5268-v3.txt, 5268-v4.txt, 5268-v5.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5268) Add delete column prefix delete marker

Posted by "Todd Lipcon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192667#comment-13192667 ] 

Todd Lipcon commented on HBASE-5268:
------------------------------------

I don't know this code path very well at all... can you clarify for me where a "delete prefix" marker is inserted? Is it at the top of the row, or before the columns starting with that prefix?

eg I have the following row:
cola_1
cola_2
colb_1
colb_2
colc_1

and I insert a "deletePrefix(colb_)"
Does it sort before cola_1 or does it sort before colb_1?
                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
>
>         Attachments: 5268-v2.txt, 5268-v3.txt, 5268-v4.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5268) Add delete column prefix delete marker

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192705#comment-13192705 ] 

Lars Hofhansl commented on HBASE-5268:
--------------------------------------

Hmm... Good point. Isn't that the same for a column delete with many versions that span many blocks, though? In that case the ScanDeleteTracker would need to hold on to all prefix markers.
I.e. there might be a prefix marker for 123 at T and then one for 1234 at T+A and another one for 12345 at T+A+B. They would need to stored and checked every time based on the TS of the rows they affect.

I'll try the Get part.
                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
>
>         Attachments: 5268-v2.txt, 5268-v3.txt, 5268-v4.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5268) Add delete column prefix delete marker

Posted by "Lars Hofhansl (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5268:
---------------------------------

    Fix Version/s:     (was: 0.94.0)
         Assignee:     (was: Lars Hofhansl)

Leaving open for now.
                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>         Attachments: 5268-v2.txt, 5268-v3.txt, 5268-v4.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5268) Add delete column prefix delete marker

Posted by "Lars Hofhansl (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5268:
---------------------------------

    Status: Patch Available  (was: Open)
    
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
>
>         Attachments: 5268-v2.txt, 5268-v3.txt, 5268-v4.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5268) Add delete column prefix delete marker

Posted by "Todd Lipcon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192734#comment-13192734 ] 

Todd Lipcon commented on HBASE-5268:
------------------------------------

What if you'd also inserted 12200000 through 12299999? As is, your prefix marker is probably hitting the top of the row, instead of falling in the middle.
                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
>
>         Attachments: 5268-v2.txt, 5268-v3.txt, 5268-v4.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5268) Add delete column prefix delete marker

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192766#comment-13192766 ] 

Lars Hofhansl commented on HBASE-5268:
--------------------------------------

Just did a quick test to sort these to the top of row. The Get test works fine then. Would need to fix up some other parts of the code.
Could use a (prefix) Trie to store the column prefix delete markers in ScanDeleteTracker. That too seems overkill, though.

                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>         Attachments: 5268-v2.txt, 5268-v3.txt, 5268-v4.txt, 5268-v5.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5268) Add delete column prefix delete marker

Posted by "Todd Lipcon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192686#comment-13192686 ] 

Todd Lipcon commented on HBASE-5268:
------------------------------------

I can't see how that's correct. Let's say instead of just two columns with colb_* we had a few thousand, such that it spanned many blocks. Then we requested a get with only the column colb_2000. The optimizations that Mikhail described at the HUG last week would assumedly want to seek directly to that block, but with the delete marker somewhere "in the middle" we'd be forced to scan forward from the beginning of the whole row looking for potential deletions somewhere along the way. Right?

I think the delete prefix markers would have to go at the very top of the row, alongside the "delete family" markers (since we already force a top-of-row seek to look for delete-family, assuming lack of a deletion bloom).
                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
>
>         Attachments: 5268-v2.txt, 5268-v3.txt, 5268-v4.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5268) Add delete column prefix delete marker

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192757#comment-13192757 ] 

Zhihong Yu commented on HBASE-5268:
-----------------------------------

I expect HBase users to have understanding of HBase internals.
Plus, the very exposure of this detail may give rise to the next breakthrough (in performance optimization).
                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>         Attachments: 5268-v2.txt, 5268-v3.txt, 5268-v4.txt, 5268-v5.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5268) Add delete column prefix delete marker

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192676#comment-13192676 ] 

Lars Hofhansl commented on HBASE-5268:
--------------------------------------

@Ted: I'll send a message to dev@ momentarily.

@Todd: I've changed this code a lot (HBASE-4071 and HBASE-4536)... :)
It is sorted along with the columns. Assuming all KV have the same timestamp it would sort like this:
cola_1
cola_2
colb_ delete marker
colb_1
colb_2
colc_1

If there are multiple versions of a column the delete marker would sort according to the TS as well (the exact same ordering rules used for column delete markers).
This provides for the minimum amount of time a scanner would need to hold on to the delete marker during a scan.
(In v2 I tried to change the ordering w.r.t. TS', but that turned out to be a rather nasty change.)

                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
>
>         Attachments: 5268-v2.txt, 5268-v3.txt, 5268-v4.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5268) Add delete column prefix delete marker

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192477#comment-13192477 ] 

Zhihong Yu commented on HBASE-5268:
-----------------------------------

bq. Delete.deleteColumnsByPrefix now advises against mixing this with deleteColumn for the same qualifiers.
Shall we poll user@ and dev@ for the above constraint ?
                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
>
>         Attachments: 5268-v2.txt, 5268-v3.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5268) Add delete column prefix delete marker

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195432#comment-13195432 ] 

Lars Hofhansl commented on HBASE-5268:
--------------------------------------

I wrote a blog entry about this here: http://hadoop-hbase.blogspot.com/2012/01/scanning-in-hbase.html
Maybe we can take part of that into the HBase book.
                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>         Attachments: 5268-proof.txt, 5268-v2.txt, 5268-v3.txt, 5268-v4.txt, 5268-v5.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5268) Add delete column prefix delete marker

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13191916#comment-13191916 ] 

stack commented on HBASE-5268:
------------------------------

Give an example of what you mean please Lars (I think I know what you are on about....)
                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5268) Add delete column prefix delete marker

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192255#comment-13192255 ] 

Lars Hofhansl commented on HBASE-5268:
--------------------------------------

I'll work on a fully correct patch today, and then we can decide whether it is worth the extra complexity. I find this first patch compelling, because (1) is very small and easy to verify and (2) it keeps column prefix delete markers sorted (by timestamp) with the KVs they affect.
In the bigger patch, neither will be try anymore.

                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
>
>         Attachments: 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5268) Add delete column prefix delete marker

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192681#comment-13192681 ] 

Lars Hofhansl commented on HBASE-5268:
--------------------------------------

Slight correction of wording. The above sort orders happens independently of the columns TS'.
                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
>
>         Attachments: 5268-v2.txt, 5268-v3.txt, 5268-v4.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5268) Add delete column prefix delete marker

Posted by "Todd Lipcon (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192496#comment-13192496 ] 

Todd Lipcon commented on HBASE-5268:
------------------------------------

This seems a little complicated to me... if this is meant for the purpose of building higher-level transactional capabilities, maybe we should start a branch for that exploration? I'm wary that this introduces a lot of complexity around optimizing the read path, for the sake of optimizing a delete path (which in my experience is rarely used)
                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
>
>         Attachments: 5268-v2.txt, 5268-v3.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5268) Add delete column prefix delete marker

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13191958#comment-13191958 ] 

Lars Hofhansl commented on HBASE-5268:
--------------------------------------

@Stack... Here's an example.
Say you can columns X100, X101, X102, ..., X199, and X200, X201, X202, ..., X299, X300, ..., in a row R, all in the same column family.
(I'd have this if I modeled rows inside HBase by using a prefix to identify the inner rows.)

Now I want to be able to delete all columns that start with X1. Without a prefix delete marker I'd have to place 100 column delete markers. With a prefix marker I can just place one marker for the X1 prefix.
If the prefix is empty, this would degenerate into the same behavior as a family marker (albeit less efficient).

                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5268) Add delete column prefix delete marker

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192624#comment-13192624 ] 

Hadoop QA commented on HBASE-5268:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12511736/5268-v4.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -145 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 84 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.replication.TestReplication

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/849//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/849//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/849//console

This message is automatically generated.
                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
>
>         Attachments: 5268-v2.txt, 5268-v3.txt, 5268-v4.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5268) Add delete column prefix delete marker

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192648#comment-13192648 ] 

Zhihong Yu commented on HBASE-5268:
-----------------------------------

I don't seem to get a poll from dev@hbase mailing list about this feature.
                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
>
>         Attachments: 5268-v2.txt, 5268-v3.txt, 5268-v4.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Issue Comment Edited] (HBASE-5268) Add delete column prefix delete marker

Posted by "Lars Hofhansl (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192016#comment-13192016 ] 

Lars Hofhansl edited comment on HBASE-5268 at 1/24/12 8:43 AM:
---------------------------------------------------------------

When I muck with KeyValue.KeyComparator to always sort prefix delete marker before all columns with the same prefix regardless of TS, then I see the opposite scenario that in the setup above the column delete marker is ignored and hence the newly put version of column 123 is visible again. This is the because the column marker is considered less specific and hence won't override the column prefix marker, even though it is newer. Arghh :)

So ScanDeleteTracker would need to be completely refactored in order to keep prefix markers separate from column and version markers.
The question is: Is that worth the risk, added code entropy, and effort, or can we document that prefix delete markers should not be used together with column delete markers (where the column markers column is identical to the prefix marker's column prefix, which does not make much sense anyway)?

                
      was (Author: lhofhansl):
    When I muck with KeyValue.KeyComparator to always sort prefix delete marker before all columns with the same prefix regardless of TS, then I see the opposite scenario that in the setup above the column delete marker is ignored and hence the newly put version of column 123 is visible again. This is the because the column marker is considered less specific and hence won't override the column prefix marker. Arghh :)

So ScanDeleteTracker would need to be completely refactored in order to keep prefix marker separate from column and version markers.
The question is: Is that worth the risk, added code entropy, and effort, or can we document that prefix delete markers should not be used together with column delete markers (where the column markers column is identical to the prefix marker's column prefix, which does not make much sense anyway)?

                  
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
>
>         Attachments: 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5268) Add delete column prefix delete marker

Posted by "Lars Hofhansl (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5268:
---------------------------------

    Attachment: 5268-proof.txt

Just to prove to myself that it can be done... Here's a patch that sorts all column prefix delete markers top-of-row.
ScsnDeleteTracker keeps a map with all delete prefixes and checks all of them, removing them when we scanned past the prefix.

I have no intention to commit this. This is just a proof of concept. It is not efficient (should use a trie of bytes for the prefixes).

Generally I am less exited about this now than I was earlier. A lot of work to optimize a path that is used less frequently than the Put path.
                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>         Attachments: 5268-proof.txt, 5268-v2.txt, 5268-v3.txt, 5268-v4.txt, 5268-v5.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5268) Add delete column prefix delete marker

Posted by "Lars Hofhansl (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5268:
---------------------------------

    Attachment: 5268-v3.txt

Sorting prefix markers first has another problem.
Placing a prefix marker for 123 at T and then another for 1234 at T+1 will ignore the 2nd one even though it is for newer timestamp. To fix that ScanDeleteTracker would need to keep track of delete marker that are prefixes of each other and would increase memory requirements.

So I am back to the first approach. It is simpler, does not much with the sort order, has less memory requirements, and is easier to grok.
Delete.deleteColumnsByPrefix now advises against mixing this with deleteColumn for the *same* qualifiers.

I added more test and propose this for commit.
                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
>
>         Attachments: 5268-v2.txt, 5268-v3.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5268) Add delete column prefix delete marker

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192529#comment-13192529 ] 

Lars Hofhansl commented on HBASE-5268:
--------------------------------------

@Todd: Are you talking about v2 or v3? I agree v2 is too complicated.
v3 just adds this to the read path (in ScanDeleteTracker):
{code}
+        // check whether the delete marker is a prefix
+        if (deleteType == KeyValue.Type.DeleteColumnPrefix.getCode()) {
+          if (Bytes.compareTo(deleteBuffer, deleteOffset, deleteLength, buffer,
+              qualifierOffset, deleteLength) == 0) {
+            if (timestamp <= deleteTimestamp) {
+              return DeleteResult.COLUMN_DELETED;
+            }
+          } else {
+            // past the prefix marker
+            deleteBuffer = null;
+          }
{code}
The rest is indentation (which makes the change to ScanDeleteTracker look big), tests, and two more methods on Delete.java.
If prefix markers are not used there is no performance impact.

@Ted: Sure. Although if we cannot live with this constraint, this becomes too complicated to be useful.

                
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
>
>         Attachments: 5268-v2.txt, 5268-v3.txt, 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Issue Comment Edited] (HBASE-5268) Add delete column prefix delete marker

Posted by "Lars Hofhansl (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13191958#comment-13191958 ] 

Lars Hofhansl edited comment on HBASE-5268 at 1/24/12 8:44 AM:
---------------------------------------------------------------

@Stack... Here's an example.
Say you have columns X100, X101, X102, ..., X199, and X200, X201, X202, ..., X299, X300, ..., in a row R, all in the same column family.
(I'd have this if I modeled rows inside HBase rows by using a prefix to identify the inner rows.)

Now I want to be able to delete all columns that start with X1. Without a prefix delete marker I'd have to place 100 column delete markers. With a prefix marker I can just place one marker for the X1 prefix.
If the prefix is empty, this would degenerate into the same behavior as a family marker (albeit less efficient).

                
      was (Author: lhofhansl):
    @Stack... Here's an example.
Say you can columns X100, X101, X102, ..., X199, and X200, X201, X202, ..., X299, X300, ..., in a row R, all in the same column family.
(I'd have this if I modeled rows inside HBase rows by using a prefix to identify the inner rows.)

Now I want to be able to delete all columns that start with X1. Without a prefix delete marker I'd have to place 100 column delete markers. With a prefix marker I can just place one marker for the X1 prefix.
If the prefix is empty, this would degenerate into the same behavior as a family marker (albeit less efficient).

                  
> Add delete column prefix delete marker
> --------------------------------------
>
>                 Key: HBASE-5268
>                 URL: https://issues.apache.org/jira/browse/HBASE-5268
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
>
>         Attachments: 5268.txt
>
>
> This is another part missing in the "wide row challenge".
> Currently entire families of a row can be deleted or individual columns or versions.
> There is no facility to mark multiple columns for deletion by column prefix.
> Turns out that be achieve with very little code (it's possible that I missed some of the new delete bloom filter code, so please review this thoroughly). I'll attach a patch soon, just working on some tests now.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira