
Posted to issues@hbase.apache.org by "mikhail (JIRA)" <ji...@apache.org> on 2011/04/10 23:41:05 UTC

[jira] [Created] (HBASE-3763) Splitting Bloom filters into multiple meta blocks and loading those blocks on demand to avoid blocking on large Bloom filter loads at read time

Splitting Bloom filters into multiple meta blocks and loading those blocks on demand to avoid blocking on large Bloom filter loads at read time
-----------------------------------------------------------------------------------------------------------------------------------------------

                 Key: HBASE-3763
                 URL: https://issues.apache.org/jira/browse/HBASE-3763
             Project: HBase
          Issue Type: Improvement
          Components: regionserver
    Affects Versions: 0.90.2, 0.90.1, 0.90.0, 0.89.20100924
            Reporter: mikhail
            Priority: Minor
             Fix For: 0.89.20100924


Adding a way to save HBase Bloom filters into an array of Meta blocks instead of one big Meta block, and load only the blocks required to answer a query. This behavior is controlled by the io.storefile.bloom.lazy configuration option, which is set to false by default. Existing StoreFiles with single-block Bloom filters are handled the same way as before.
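The idea described above can be sketched roughly as follows. This is only an illustrative Python model, not HBase code: the class names, the chunk sizes, the SHA-256 hashing, and the in-memory list standing in for on-disk Meta blocks are all assumptions made for the example. It shows a Bloom filter built as several small chunks over sorted keys, with a small index of first keys used to pick the one chunk a query needs.

```python
import bisect
import hashlib


class BloomChunk:
    """A small fixed-size Bloom filter covering one contiguous key range."""

    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests
        # (an arbitrary choice for this sketch).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


class ChunkedBloomFilter:
    """Bloom filter stored as many chunks plus an index of first keys."""

    def __init__(self, sorted_keys, keys_per_chunk=1000):
        self.first_keys = []  # small index, always held in memory
        self._stored = []     # hypothetical stand-in for Meta blocks on disk
        self.loads = 0        # counts simulated chunk reads
        for start in range(0, len(sorted_keys), keys_per_chunk):
            batch = sorted_keys[start:start + keys_per_chunk]
            chunk = BloomChunk()
            for key in batch:
                chunk.add(key)
            self.first_keys.append(batch[0])
            self._stored.append(chunk)

    def _load_chunk(self, idx):
        self.loads += 1
        return self._stored[idx]

    def might_contain(self, key):
        # Find the last chunk whose first key is <= the query key.
        idx = bisect.bisect_right(self.first_keys, key) - 1
        if idx < 0:
            return False  # key sorts before every stored key
        return self._load_chunk(idx).might_contain(key)
```

With 5000 sorted keys and 1000 keys per chunk, a single lookup touches one of five chunks instead of the whole filter; the `loads` counter makes that visible.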

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (HBASE-3763) Add Bloom Block Index Support

Posted by "Mikhail Bautin (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail Bautin resolved HBASE-3763.
-----------------------------------

       Resolution: Fixed
    Fix Version/s: 0.94.0
                   0.92.0

Resolved as HBASE-3857 has been committed.

> Add Bloom Block Index Support
> -----------------------------
>
>                 Key: HBASE-3763
>                 URL: https://issues.apache.org/jira/browse/HBASE-3763
>             Project: HBase
>          Issue Type: Improvement
>          Components: io, regionserver
>    Affects Versions: 0.89.20100924, 0.90.0, 0.90.1, 0.90.2
>            Reporter: Mikhail Bautin
>            Assignee: Mikhail Bautin
>            Priority: Minor
>              Labels: hbase, performance
>             Fix For: 0.92.0, 0.94.0, 0.89.20100924
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> Add a way to save HBase Bloom filters into an array of Meta blocks instead of one big Meta block, and load only the blocks required to answer a query.  This will allow us faster bloom load times for large StoreFiles & pave the path for adding Bloom Filter support to HFileOutputFormat bulk load.


[jira] [Commented] (HBASE-3763) Add Bloom Block Index Support

Posted by "Mikhail Bautin (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036706#comment-13036706 ] 

Mikhail Bautin commented on HBASE-3763:
---------------------------------------

@stack, Joydeep: we thought it would be good to keep the core HFile format v2 changes separate from the two features that depend on it (multi-level block indexes and compound Bloom filters), so that even though we have one design doc we can still have three separate JIRAs.

Regarding the question about keeping the Bloom filter in memory: in our current design/implementation it will be cached and kept in memory as long as there is enough room in the block cache. The Bloom filter index is loaded at open time, but the individual chunks are loaded and cached as needed. However, we are adding separate configuration settings to cache Bloom filter chunks (and block index chunks) at write time, extending the existing cache-on-write setting for data blocks, so that the effect will be exactly as Joydeep described.
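A rough way to picture the difference between loading chunks on demand and caching them at write time is the Python sketch below. It is not HBase code: `ChunkCache`, `BloomChunkStore`, the `cache_on_write` flag name, and the dictionary standing in for disk are all hypothetical names chosen for the illustration.

```python
from collections import OrderedDict


class ChunkCache:
    """Tiny LRU cache standing in for the block cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used


class BloomChunkStore:
    """Writes chunks to a simulated disk and reads them through the cache."""

    def __init__(self, cache, cache_on_write=False):
        self.cache = cache
        self.cache_on_write = cache_on_write
        self._disk = {}       # hypothetical stand-in for the file
        self.disk_reads = 0   # counts reads that missed the cache

    def write_chunk(self, chunk_id, data):
        self._disk[chunk_id] = data
        if self.cache_on_write:
            # The chunk is already in memory, so keep it there.
            self.cache.put(chunk_id, data)

    def read_chunk(self, chunk_id):
        cached = self.cache.get(chunk_id)
        if cached is not None:
            return cached
        self.disk_reads += 1
        data = self._disk[chunk_id]
        self.cache.put(chunk_id, data)
        return data
```

In this model, a store created with `cache_on_write=True` serves the first read of a freshly written chunk without touching disk, while the default store pays one disk read and then hits the cache. In both cases a chunk stays cached only while the cache has room for it.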



[jira] [Commented] (HBASE-3763) Add Bloom Block Index Support

Posted by "Nicolas Spiegelberg (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019223#comment-13019223 ] 

Nicolas Spiegelberg commented on HBASE-3763:
--------------------------------------------

@stack: we ran into a problem where our bloom sizes were getting quite substantial (100 MB; believe it or not, blooms still make sense here). When a bloom is not in the LRU cache, read requests stall until the entire bloom is loaded into memory. Sometimes this can be a non-local read. If we can do a block index for blooms and only have to load a 64 KB shard, our read stalls will diminish substantially.
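To put the two sizes mentioned above side by side, here is a small back-of-the-envelope calculation in Python. It assumes binary units (1 MB = 1024 KB) and that a lookup on a cold cache needs exactly one shard; neither assumption is stated in the thread.

```python
KB = 1024
MB = 1024 * KB

bloom_size = 100 * MB  # whole Bloom filter, as quoted above
shard_size = 64 * KB   # one shard, as quoted above

# Number of shards the filter would be split into.
num_shards = bloom_size // shard_size

# Fraction of the filter read for one cold lookup (one shard).
fraction_read = shard_size / bloom_size

print(num_shards)     # 1600
print(fraction_read)  # 0.000625
```

Under those assumptions a cold lookup reads one shard out of 1600 rather than the full 100 MB.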



[jira] [Commented] (HBASE-3763) Add Bloom Block Index Support

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036683#comment-13036683 ] 

stack commented on HBASE-3763:
------------------------------

@Mikhail Do you want to close this issue?  Your hfile2 subsumes this one?  I don't recall your design making note of Joydeep's second suggestion?  It seems like a nice little optimization.



[jira] [Commented] (HBASE-3763) Add Bloom Block Index Support

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030242#comment-13030242 ] 

Joydeep Sen Sarma commented on HBASE-3763:
------------------------------------------

Dhruba pointed me to some of these jiras.

one quick comment is that _if_ the intention is to keep the filters pinned in memory - then we can convert the load at read time to:
- load at startup time as quickly as possible
- keep the filter pinned in memory when writing out new hfile (never have to read it in)

this would also take out filter reads from client read path.



[jira] [Updated] (HBASE-3763) Add Bloom Block Index Support

Posted by "Nicolas Spiegelberg (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicolas Spiegelberg updated HBASE-3763:
---------------------------------------

    Component/s: io
    Description: Add a way to save HBase Bloom filters into an array of Meta blocks instead of one big Meta block, and load only the blocks required to answer a query.  This will allow us faster bloom load times for large StoreFiles & pave the path for adding Bloom Filter support to HFileOutputFormat bulk load.  (was: Adding a way to save HBase Bloom filters into an array of Meta blocks instead of one big Meta block, and load only the blocks required to answer a query. This behavior is controlled by the io.storefile.bloom.lazy configuration option, which is set to false by default. Existing StoreFiles with single-block Bloom filters are handled the same way as before.)
       Assignee: mikhail
        Summary: Add Bloom Block Index Support  (was: Splitting Bloom filters into multiple meta blocks and loading those blocks on demand to avoid blocking on large Bloom filter loads at read time)



[jira] [Commented] (HBASE-3763) Add Bloom Block Index Support

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019176#comment-13019176 ] 

stack commented on HBASE-3763:
------------------------------

So we'd load and unload blooms as we went?

