Posted to issues@hbase.apache.org by "Liyin Tang (JIRA)" <ji...@apache.org> on 2011/09/23 19:42:26 UTC

[jira] [Created] (HBASE-4469) Avoid top row seek by looking up bloomfilter

Avoid top row seek by looking up bloomfilter
--------------------------------------------

                 Key: HBASE-4469
                 URL: https://issues.apache.org/jira/browse/HBASE-4469
             Project: HBase
          Issue Type: Improvement
            Reporter: Liyin Tang
            Assignee: Liyin Tang


When seeking to a row/column in an HFile, we go to the top of the row to check for a row delete marker (delete family). However, if a Bloom filter is enabled for the column family, a delete family operation on a row already adds that row to the Bloom filter. We can take advantage of this fact to avoid seeking to the top of the row.
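
The idea above can be sketched with a toy Bloom filter keyed on (row, qualifier). This is only an illustration under stated assumptions, not the code from the patch: the class and method names are hypothetical, the hashing is a stand-in, and it assumes a delete family marker is recorded under the row plus an empty qualifier. Because a Bloom filter has no false negatives, a negative lookup for that key means the file holds no delete family marker for the row, so the seek to the top of the row can be skipped. A positive answer may be a false positive and only costs the usual seek.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/**
 * Illustration only: a toy Bloom filter keyed on (row, qualifier), standing in
 * for a ROWCOL Bloom filter. None of these names are HBase APIs.
 */
final class TopRowSeekSketch {
    private static final int BITS = 1 << 16; // filter size, arbitrary for this sketch
    private static final int HASHES = 3;     // probe positions per key
    private final BitSet bits = new BitSet(BITS);

    /** Assumption: a delete family marker is keyed by the row plus an empty qualifier. */
    static final String EMPTY_QUALIFIER = "";

    /** FNV-style mixing with a different seed per probe; a stand-in hash, not HBase's. */
    private static int probe(String row, String qualifier, int i) {
        byte[] key = (row + '\0' + qualifier).getBytes(StandardCharsets.UTF_8);
        int h = 0x9747b28c + i * 0x5bd1e995;
        for (byte b : key) {
            h = (h ^ (b & 0xff)) * 16777619;
        }
        return Math.floorMod(h, BITS);
    }

    /** Called for every cell written to the file, including delete family markers. */
    void add(String row, String qualifier) {
        for (int i = 0; i < HASHES; i++) {
            bits.set(probe(row, qualifier, i));
        }
    }

    /** False means "definitely not present"; true means "possibly present". */
    boolean mightContain(String row, String qualifier) {
        for (int i = 0; i < HASHES; i++) {
            if (!bits.get(probe(row, qualifier, i))) {
                return false;
            }
        }
        return true;
    }

    /** True if the reader still has to seek to the top of the row to look for a delete family marker. */
    boolean mustSeekToTopOfRow(String row) {
        return mightContain(row, EMPTY_QUALIFIER);
    }

    public static void main(String[] args) {
        TopRowSeekSketch bloom = new TopRowSeekSketch();
        bloom.add("row1", "colA");          // ordinary put
        bloom.add("row2", "colA");          // ordinary put
        bloom.add("row2", EMPTY_QUALIFIER); // delete family on row2
        System.out.println("row1 needs top-of-row seek: " + bloom.mustSeekToTopOfRow("row1"));
        System.out.println("row2 needs top-of-row seek: " + bloom.mustSeekToTopOfRow("row2"));
    }
}
```

In this sketch the lookup for row2 always answers true, while the lookup for row1 answers false unless all of its probe bits happen to collide with bits set by other keys.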



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-4469) Avoid top row seek by looking up ROWCOL bloomfilter

Posted by "Liyin Tang (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-4469:
------------------------------

    Summary: Avoid top row seek by looking up ROWCOL bloomfilter  (was: Avoid top row seek by looking up bloomfilter)
    
> Avoid top row seek by looking up ROWCOL bloomfilter
> ---------------------------------------------------
>
>                 Key: HBASE-4469
>                 URL: https://issues.apache.org/jira/browse/HBASE-4469
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Liyin Tang
>            Assignee: Liyin Tang
>             Fix For: 0.94.0
>
>         Attachments: HBASE-4469_1.patch
>
>
> When seeking to a row/column in an HFile, we go to the top of the row to check for a row delete marker (delete family). However, if a Bloom filter is enabled for the column family, a delete family operation on a row already adds that row to the Bloom filter. We can take advantage of this fact to avoid seeking to the top of the row.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter

Posted by "Jonathan Gray (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127090#comment-13127090 ] 

Jonathan Gray commented on HBASE-4469:
--------------------------------------

(I'm not putting this in the 92 branch because this is a feature.)
                

        

[jira] [Updated] (HBASE-4469) Avoid top row seek by looking up bloomfilter

Posted by "Jonathan Gray (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray updated HBASE-4469:
---------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Got it, thanks Liyin!  Nice work!
                

        

[jira] [Updated] (HBASE-4469) Avoid top row seek by looking up bloomfilter

Posted by "stack (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4469:
-------------------------

    Hadoop Flags: Reviewed
          Status: Patch Available  (was: Open)
    

        

[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter

Posted by "Jonathan Gray (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127059#comment-13127059 ] 

Jonathan Gray commented on HBASE-4469:
--------------------------------------

Liyin, can you post the final patch to this JIRA?  I will commit.  Thanks!
                

        

[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter

Posted by "Jonathan Gray (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127071#comment-13127071 ] 

Jonathan Gray commented on HBASE-4469:
--------------------------------------

Thanks Liyin.  Unfortunately, because the Review Board (RB) integration isn't very tight, Apache protocol requires you to attach the patch to the JIRA and select the radio button that assigns it to Apache.

This also helps to ensure that there's no confusion about which version was committed and that we don't have a hard dependency on RB in any way.

It'll all be second nature before you know it :)
                

        

[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter

Posted by "Liyin Tang (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126760#comment-13126760 ] 

Liyin Tang commented on HBASE-4469:
-----------------------------------

@stack: HBASE-4469 optimizes the top row seek if the ROWCOL Bloom filter is enabled,
and HBASE-4532 will optimize the top row seek if the ROW Bloom filter is enabled or no Bloom filter (NONE) is used.
So HBASE-4469 + HBASE-4532 will cover all the cases.

It is necessary to commit this one first :)

                

        

[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122088#comment-13122088 ] 

Ted Yu commented on HBASE-4469:
-------------------------------

I don't see TestBlocksRead in the latest review.
                

        

[jira] [Updated] (HBASE-4469) Avoid top row seek by looking up bloomfilter

Posted by "Liyin Tang (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-4469:
------------------------------

    Attachment: HBASE-4469_1.patch
    

        

[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122075#comment-13122075 ] 

jiraposter@reviews.apache.org commented on HBASE-4469:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2235/
-----------------------------------------------------------

Review request for hbase.


Summary
-------

When seeking to a row/column in an HFile, we go to the top of the row to check for a row delete marker (delete family).
However, if a Bloom filter is enabled for the column family, a delete family operation on a row already adds that row to the Bloom filter.
We can take advantage of this fact to avoid seeking to the top of the row.

Also, update the TestBlocksRead unit tests, since most of the block read counts have dropped.

Evaluation:
In TestSeekingOptimization, the optimization previously saved 31.60% of seek operations.
Now it saves about 41.82% of seek operations,
about 10 percentage points more.

======================
Before this diff:
For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1714 (68.40%), savings: 31.60%

=====================
Apply this diff:
For bloom=ROWCOL, compr=GZ total seeks without optimization: 2506, with optimization: 1458 (58.18%), savings: 41.82%
=====================
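
The percentages in the two result lines can be rechecked from the raw seek counts. The following is a small sketch of that arithmetic only; the class and method names are made up for illustration and are not part of TestSeekingOptimization.

```java
import java.util.Locale;

/** Recomputes the savings percentages from the seek counts quoted above. */
final class SeekSavings {
    /** Percentage of seeks avoided, given the unoptimized and optimized seek counts. */
    static double savingsPercent(int withoutOptimization, int withOptimization) {
        return 100.0 * (withoutOptimization - withOptimization) / withoutOptimization;
    }

    public static void main(String[] args) {
        int baseline = 2506; // total seeks without optimization
        int before = 1714;   // optimized seeks before this diff
        int after = 1458;    // optimized seeks with this diff applied
        double savedBefore = savingsPercent(baseline, before);
        double savedAfter = savingsPercent(baseline, after);
        System.out.println(String.format(Locale.ROOT,
            "before: %.2f%%, after: %.2f%%, gain: %.2f points",
            savedBefore, savedAfter, savedAfter - savedBefore));
    }
}
```

The gain works out to 256 fewer seeks out of 2506, which is where the "about 10%" figure comes from.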

Thanks to Mikhail and Kannan for their help and discussion.


This addresses bug HBASE-4469.
    https://issues.apache.org/jira/browse/HBASE-4469


Diffs
-----

  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 7b0b9e6 
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 8dd8a68 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java abccea4 

Diff: https://reviews.apache.org/r/2235/diff


Testing
-------

Ran all the unit tests.
Two unit tests fail both with and without my change:
TestDistributedLogSplitting
TestHTablePool


Thanks,

Liyin


                

        

[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13123378#comment-13123378 ] 

Ted Yu commented on HBASE-4469:
-------------------------------

+1 on patch.
Nice job.
                

        

[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126172#comment-13126172 ] 

jiraposter@reviews.apache.org commented on HBASE-4469:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2235/#review2541
-----------------------------------------------------------

Ship it!


Patch looks good.  Small.  Only works if bloom filters are already on?

- Michael


On 2011-10-06 17:17:23, Liyin wrote:
bq.  (Updated 2011-10-06 17:17:23)
bq.  
bq.  Review request for hbase.
bq.  [...]


                

        

[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter

Posted by "Jonathan Gray (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127089#comment-13127089 ] 

Jonathan Gray commented on HBASE-4469:
--------------------------------------

What is the protocol now?  This needs to go into the fb-89 branch, so do I keep this JIRA open until that happens, or should we just add some fb-89-pending tag or something?
                

        

[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter

Posted by "Liyin Tang (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127066#comment-13127066 ] 

Liyin Tang commented on HBASE-4469:
-----------------------------------

Cool, I just downloaded the patch from Review Board (https://reviews.apache.org/r/2235/) and attached it here :)
Thanks Jonathan.

                

        

[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122289#comment-13122289 ] 

jiraposter@reviews.apache.org commented on HBASE-4469:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2235/#review2417
-----------------------------------------------------------


+1. 

Nice optimization Liyin. Changes look good.  [This is running nicely on our internal branch.]

- Kannan




                

        

[jira] [Updated] (HBASE-4469) Avoid top row seek by looking up bloomfilter

Posted by "Jonathan Gray (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray updated HBASE-4469:
---------------------------------

    Fix Version/s: 0.94.0

Committed to trunk.
                
> Avoid top row seek by looking up bloomfilter
> --------------------------------------------
>
>                 Key: HBASE-4469
>                 URL: https://issues.apache.org/jira/browse/HBASE-4469
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Liyin Tang
>            Assignee: Liyin Tang
>             Fix For: 0.94.0
>
>         Attachments: HBASE-4469_1.patch
>
>
> The problem is that when seeking for the row/col in the hfile, we will go to top of the row in order to check for row delete marker (delete family). However, if the bloomfilter is enabled for the column family, then if a delete family operation is done on a row, the row is already being added to bloomfilter. We can take advantage of this factor to avoid seeking to the top of row.


        

[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter

Posted by "Jonathan Gray (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126175#comment-13126175 ] 

Jonathan Gray commented on HBASE-4469:
--------------------------------------

@stack, yeah, this version only works if you have rowcol blooms enabled.  The generic version is going to be implemented over in HBASE-4532.
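
To make the idea in the issue description concrete, here is a minimal Java sketch. It is not the code from HBASE-4469_1.patch. It assumes that a delete family marker is recorded in the ROWCOL Bloom filter under the row with an empty qualifier; the class TopRowSeekSketch, the toy RowColBloom, and mustSeekToTopOfRow are hypothetical names invented for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Toy sketch, not HBase code: every name in this file is hypothetical. */
public class TopRowSeekSketch {

    /** Minimal Bloom filter keyed on (row, qualifier): false positives possible, false negatives not. */
    static final class RowColBloom {
        private final BitSet bits;
        private final int size;

        RowColBloom(int size) {
            this.size = size;
            this.bits = new BitSet(size);
        }

        /** Derives two bit positions from the combined key with two simple polynomial hashes. */
        private int[] positions(String row, String qualifier) {
            byte[] key = (row + '\u0000' + qualifier).getBytes(StandardCharsets.UTF_8);
            int h1 = 17;
            int h2 = 31;
            for (byte b : key) {
                h1 = h1 * 31 + b;
                h2 = h2 * 131 + b;
            }
            return new int[] { Math.floorMod(h1, size), Math.floorMod(h2, size) };
        }

        void add(String row, String qualifier) {
            for (int p : positions(row, qualifier)) {
                bits.set(p);
            }
        }

        boolean mightContain(String row, String qualifier) {
            for (int p : positions(row, qualifier)) {
                if (!bits.get(p)) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Assumption: a delete family marker is stored under the row with an empty qualifier. */
    static final String DELETE_FAMILY_QUALIFIER = "";

    /** Returns false only when the file definitely holds no delete family marker for the row. */
    static boolean mustSeekToTopOfRow(RowColBloom bloom, String row) {
        return bloom.mightContain(row, DELETE_FAMILY_QUALIFIER);
    }

    public static void main(String[] args) {
        RowColBloom bloom = new RowColBloom(1 << 16);
        bloom.add("row1", "colA");                   // ordinary put
        bloom.add("row2", DELETE_FAMILY_QUALIFIER);  // delete family on row2
        bloom.add("row2", "colA");
        // row2 carries a delete family marker, so the top-of-row seek is still needed.
        System.out.println("row2 needs top-of-row seek: " + mustSeekToTopOfRow(bloom, "row2"));
    }
}
```

Because a Bloom filter can return false positives but never false negatives, a false answer means the top-of-row seek can be skipped safely, while a true answer only means the seek is still required.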
                
> Avoid top row seek by looking up bloomfilter
> --------------------------------------------
>
>                 Key: HBASE-4469
>                 URL: https://issues.apache.org/jira/browse/HBASE-4469
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Liyin Tang
>            Assignee: Liyin Tang
>
> The problem is that when seeking for the row/col in the hfile, we will go to top of the row in order to check for row delete marker (delete family). However, if the bloomfilter is enabled for the column family, then if a delete family operation is done on a row, the row is already being added to bloomfilter. We can take advantage of this factor to avoid seeking to the top of row.


        

[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter

Posted by "Liyin Tang (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126336#comment-13126336 ] 

Liyin Tang commented on HBASE-4469:
-----------------------------------

HBASE-4532 will enable the delete family Bloom filter only when the Row or None Bloom filter type is configured, because if there is a delete family marker in the store file, the RowCol Bloom filter already contains this information.
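
The rule described in this comment can be sketched as a small predicate. This is an illustration under stated assumptions, not the HBASE-4532 implementation: the enum and the method name needsSeparateDeleteFamilyBloom are hypothetical and are defined locally so the example is self-contained.

```java
/** Toy sketch, not HBase code: the enum and method names are hypothetical. */
public class DeleteFamilyBloomSketch {

    /** Local stand-in for the Bloom filter type configured on a column family. */
    enum BloomType { NONE, ROW, ROWCOL }

    /**
     * A dedicated delete family Bloom filter is only useful when the configured
     * Bloom filter cannot already answer "does this row have a delete family marker?".
     * Per the comment above, a ROWCOL Bloom filter already holds that information.
     */
    static boolean needsSeparateDeleteFamilyBloom(BloomType type) {
        return type != BloomType.ROWCOL;
    }

    public static void main(String[] args) {
        for (BloomType type : BloomType.values()) {
            System.out.println(type + " -> " + needsSeparateDeleteFamilyBloom(type));
        }
        // prints:
        // NONE -> true
        // ROW -> true
        // ROWCOL -> false
    }
}
```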

                
> Avoid top row seek by looking up bloomfilter
> --------------------------------------------
>
>                 Key: HBASE-4469
>                 URL: https://issues.apache.org/jira/browse/HBASE-4469
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Liyin Tang
>            Assignee: Liyin Tang
>
> The problem is that when seeking for the row/col in the hfile, we will go to top of the row in order to check for row delete marker (delete family). However, if the bloomfilter is enabled for the column family, then if a delete family operation is done on a row, the row is already being added to bloomfilter. We can take advantage of this factor to avoid seeking to the top of row.


        

[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126651#comment-13126651 ] 

stack commented on HBASE-4469:
------------------------------

OK. I was confused. I'm +0 on this patch (since I am not familiar with what is going on here – it looks innocuous enough on review). Jon, are you going to commit?
                
> Avoid top row seek by looking up bloomfilter
> --------------------------------------------
>
>                 Key: HBASE-4469
>                 URL: https://issues.apache.org/jira/browse/HBASE-4469
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Liyin Tang
>            Assignee: Liyin Tang
>
> The problem is that when seeking for the row/col in the hfile, we will go to top of the row in order to check for row delete marker (delete family). However, if the bloomfilter is enabled for the column family, then if a delete family operation is done on a row, the row is already being added to bloomfilter. We can take advantage of this factor to avoid seeking to the top of row.


       

[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter

Posted by "Nicolas Spiegelberg (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122232#comment-13122232 ] 

Nicolas Spiegelberg commented on HBASE-4469:
--------------------------------------------

+1. lgtm
                
> Avoid top row seek by looking up bloomfilter
> --------------------------------------------
>
>                 Key: HBASE-4469
>                 URL: https://issues.apache.org/jira/browse/HBASE-4469
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Liyin Tang
>            Assignee: Liyin Tang
>
> The problem is that when seeking for the row/col in the hfile, we will go to top of the row in order to check for row delete marker (delete family). However, if the bloomfilter is enabled for the column family, then if a delete family operation is done on a row, the row is already being added to bloomfilter. We can take advantage of this factor to avoid seeking to the top of row.


        

[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter

Posted by "Liyin Tang (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127114#comment-13127114 ] 

Liyin Tang commented on HBASE-4469:
-----------------------------------

@Jonathan, 
For this jira specifically, it was committed to the internal 89-fb branch before the public 89-fb branch was cut.
So it should already be in the public 89-fb branch right now.



                
> Avoid top row seek by looking up bloomfilter
> --------------------------------------------
>
>                 Key: HBASE-4469
>                 URL: https://issues.apache.org/jira/browse/HBASE-4469
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Liyin Tang
>            Assignee: Liyin Tang
>             Fix For: 0.94.0
>
>         Attachments: HBASE-4469_1.patch
>
>
> The problem is that when seeking for the row/col in the hfile, we will go to top of the row in order to check for row delete marker (delete family). However, if the bloomfilter is enabled for the column family, then if a delete family operation is done on a row, the row is already being added to bloomfilter. We can take advantage of this factor to avoid seeking to the top of row.


        

[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128383#comment-13128383 ] 

Hudson commented on HBASE-4469:
-------------------------------

Integrated in HBase-TRUNK #2325 (See [https://builds.apache.org/job/HBase-TRUNK/2325/])
    HBASE-4469  Avoid top row seek by looking up bloomfilter (liyin via jgray)

jgray : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java

                
> Avoid top row seek by looking up bloomfilter
> --------------------------------------------
>
>                 Key: HBASE-4469
>                 URL: https://issues.apache.org/jira/browse/HBASE-4469
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Liyin Tang
>            Assignee: Liyin Tang
>             Fix For: 0.94.0
>
>         Attachments: HBASE-4469_1.patch
>
>
> The problem is that when seeking for the row/col in the hfile, we will go to top of the row in order to check for row delete marker (delete family). However, if the bloomfilter is enabled for the column family, then if a delete family operation is done on a row, the row is already being added to bloomfilter. We can take advantage of this factor to avoid seeking to the top of row.


        

[jira] [Commented] (HBASE-4469) Avoid top row seek by looking up bloomfilter

Posted by "Liyin Tang (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122161#comment-13122161 ] 

Liyin Tang commented on HBASE-4469:
-----------------------------------

Yes, I didn't change the unit test TestBlocksRead, and it passes successfully.

                
> Avoid top row seek by looking up bloomfilter
> --------------------------------------------
>
>                 Key: HBASE-4469
>                 URL: https://issues.apache.org/jira/browse/HBASE-4469
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Liyin Tang
>            Assignee: Liyin Tang
>
> The problem is that when seeking for the row/col in the hfile, we will go to top of the row in order to check for row delete marker (delete family). However, if the bloomfilter is enabled for the column family, then if a delete family operation is done on a row, the row is already being added to bloomfilter. We can take advantage of this factor to avoid seeking to the top of row.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-4469) Avoid top row seek by looking up bloomfilter

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kannan Muthukkaruppan updated HBASE-4469:
-----------------------------------------

    Description: 
The problem is that when seeking for the row/col in the hfile, we will go to top of the row in order to check for row delete marker (delete family). However, if the bloomfilter is enabled for the column family, then if a delete family operation is done on a row, the row is already being added to bloomfilter. We can take advantage of this factor to avoid seeking to the top of row.



  was:
The problem is that when seeking for the row/col in the hfile, we will go to top of the row in order to check for row delete marker (delete family). However, if the bloomfilter is enabled for the column family, then if a delete family operation is done on a row, the row will be added to bloomfilter. We can take advantage of this factor to avoid seeking to the top of row.




> Avoid top row seek by looking up bloomfilter
> --------------------------------------------
>
>                 Key: HBASE-4469
>                 URL: https://issues.apache.org/jira/browse/HBASE-4469
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Liyin Tang
>            Assignee: Liyin Tang
>
> The problem is that when seeking for the row/col in the hfile, we will go to top of the row in order to check for row delete marker (delete family). However, if the bloomfilter is enabled for the column family, then if a delete family operation is done on a row, the row is already being added to bloomfilter. We can take advantage of this factor to avoid seeking to the top of row.
