Posted to issues@hbase.apache.org by "Matt Corgan (JIRA)" <ji...@apache.org> on 2011/05/26 20:35:47 UTC

[jira] [Created] (HBASE-3927) display total uncompressed byte size of a region in web UI

display total uncompressed byte size of a region in web UI
----------------------------------------------------------

                 Key: HBASE-3927
                 URL: https://issues.apache.org/jira/browse/HBASE-3927
             Project: HBase
          Issue Type: Improvement
          Components: metrics
            Reporter: Matt Corgan
            Priority: Minor


The decision to split data blocks when flushing and compacting is based on the uncompressed data size, which can often lead to compressed disk blocks that are a fraction of the intended 64 KB (default).  This often leads to a larger number of blocks and index entries than expected and can cause block indexes to take up gigabytes of memory.

There is already a "long totalUncompressedBytes" written to the HFileTrailer.  It would be nice to expose this in the web UI to make it easier to calculate the compression ratio and then raise the block size appropriately (not necessarily to get it back to 64K).
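
As a rough sketch of the calculation described above: given a store's on-disk (compressed) size and its totalUncompressedBytes, one can estimate the compression ratio and the block size setting that would yield a chosen on-disk block size. The class and method names here are hypothetical and not part of HBase, and the target on-disk size is the user's choice (not necessarily 64K).

```java
/** Hypothetical helper for illustration; not part of HBase. */
final class CompressionRatioSketch {

    /** Ratio of uncompressed bytes to on-disk bytes, e.g. 10.0 for 10x compression. */
    static double compressionRatio(long totalUncompressedBytes, long onDiskBytes) {
        if (onDiskBytes <= 0) {
            return 1.0; // assumption: treat an empty store as uncompressed
        }
        return (double) totalUncompressedBytes / onDiskBytes;
    }

    /**
     * Block size (measured on uncompressed data) that would give roughly
     * targetOnDiskBlockBytes per block once compression is applied.
     */
    static long suggestedBlockSize(long targetOnDiskBlockBytes, double ratio) {
        return Math.round(targetOnDiskBlockBytes * ratio);
    }

    public static void main(String[] args) {
        // Example numbers only: 10 GB uncompressed stored as 1 GB on disk.
        double ratio = compressionRatio(10L * 1024 * 1024 * 1024, 1024L * 1024 * 1024);
        System.out.println("compression ratio: " + ratio);
        System.out.println("block size for ~64 KB on disk: "
                + suggestedBlockSize(64 * 1024, ratio));
    }
}
```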

This should probably be added wherever the other HFile metrics are: RegionLoad.createRegions(..), and HServerLoad.  HServerLoad is a Writable, so it may break serialization.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-3927) display total uncompressed byte size of a region in web UI

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-3927:
--------------------------

    Attachment: 3927.txt

This version adds compression ratio in HServerLoad.toString()


[jira] [Updated] (HBASE-3927) display total uncompressed byte size of a region in web UI

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-3927:
--------------------------

    Attachment:     (was: 3927.txt)


[jira] [Commented] (HBASE-3927) display total uncompressed byte size of a region in web UI

Posted by "Matt Corgan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040881#comment-13040881 ] 

Matt Corgan commented on HBASE-3927:
------------------------------------

This looks great Ted.  Thanks.

Should we take the opportunity to change storefileIndexSizeMB to storefileIndexSizeKB?  Indexes are very often under 1MB, so it's not very useful as is.  


[jira] [Updated] (HBASE-3927) display total uncompressed byte size of a region in web UI

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-3927:
--------------------------

    Attachment:     (was: 3927.txt)


[jira] [Updated] (HBASE-3927) display total uncompressed byte size of a region in web UI

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-3927:
--------------------------

    Attachment:     (was: 3927.txt)


[jira] [Updated] (HBASE-3927) display total uncompressed byte size of a region in web UI

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-3927:
--------------------------

    Attachment: 3927.txt

This version limits the compression ratio to 4 digits after the decimal point.
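
For illustration only, one way to limit a ratio to four decimal places in Java is `String.format`. This is a sketch with a hypothetical class name; the attached patch may format the value differently.

```java
import java.util.Locale;

/** Hypothetical helper for illustration; not taken from the patch. */
final class RatioFormatSketch {

    /** Formats a ratio with exactly four digits after the decimal point. */
    static String formatRatio(double ratio) {
        // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM default locale.
        return String.format(Locale.ROOT, "%.4f", ratio);
    }

    public static void main(String[] args) {
        System.out.println(formatRatio(0.123456));
    }
}
```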


[jira] [Updated] (HBASE-3927) display total uncompressed byte size of a region in web UI

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-3927:
--------------------------

    Attachment: 3927.txt

This patch is based on trunk codebase.


[jira] [Assigned] (HBASE-3927) display total uncompressed byte size of a region in web UI

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu reassigned HBASE-3927:
-----------------------------

    Assignee: Ted Yu


[jira] [Updated] (HBASE-3927) display total uncompressed byte size of a region in web UI

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-3927:
--------------------------

    Attachment: regionserver-showing-compression-ratio.png


[jira] [Commented] (HBASE-3927) display total uncompressed byte size of a region in web UI

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040346#comment-13040346 ] 

Ted Yu commented on HBASE-3927:
-------------------------------

I discussed this with Karthick.
In Store.java, we maintain storeSize:
{code}
        this.storeSize += r.length();
{code}
The call to length() is delegated to HFile#Reader#length().
We can add a new method, e.g. HFile#Reader#getTotalUncompressedBytes, which exposes HFile#Reader#trailer.totalUncompressedBytes.
This way we can maintain both measures in Store.
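
A minimal sketch of that idea, using hypothetical stand-in types rather than the real HFile.Reader and Store classes (only length() and the proposed getTotalUncompressedBytes() are modeled):

```java
import java.util.List;

/** Hypothetical stand-in for the HFile reader; only the two size accessors are modeled. */
interface ReaderSketch {
    long length();                    // on-disk (compressed) size in bytes
    long getTotalUncompressedBytes(); // proposed accessor for the trailer field
}

/** Hypothetical stand-in for Store, accumulating both measures side by side. */
final class StoreSketch {
    private long storeSize;
    private long totalUncompressedBytes;

    void addReaders(List<ReaderSketch> readers) {
        for (ReaderSketch r : readers) {
            this.storeSize += r.length();
            this.totalUncompressedBytes += r.getTotalUncompressedBytes();
        }
    }

    long getStoreSize() {
        return storeSize;
    }

    long getTotalUncompressedBytes() {
        return totalUncompressedBytes;
    }
}
```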

@Matt:
Please elaborate on the description in your first paragraph.
Thanks


[jira] [Commented] (HBASE-3927) display total uncompressed byte size of a region in web UI

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041404#comment-13041404 ] 

stack commented on HBASE-3927:
------------------------------

This is for 0.92 only?  If so, should be fine changing the serialization of HServerLoad?  Otherwise +1 on patch.  Commit it Ted (you might have to wait a few more days on your commit bit).


[jira] [Commented] (HBASE-3927) display total uncompressed byte size of a region in web UI

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041643#comment-13041643 ] 

Ted Yu commented on HBASE-3927:
-------------------------------

Thanks for the hint, Stack.
Looking at RegionLoad.readFields(), I am a little confused by the handling of RegionLoad.VERSION.
getVersion() just returns the static VERSION, making the subsequent version check almost meaningless.
I plan to increase RegionLoad.VERSION and persist the version byte for TRUNK.


[jira] [Commented] (HBASE-3927) display total uncompressed byte size of a region in web UI

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041646#comment-13041646 ] 

Ted Yu commented on HBASE-3927:
-------------------------------

HServerLoad.VERSION isn't serialized, either.


[jira] [Commented] (HBASE-3927) display total uncompressed byte size of a region in web UI

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040405#comment-13040405 ] 

Ted Yu commented on HBASE-3927:
-------------------------------

I will wait for other developers' comments before producing a patch.


[jira] [Commented] (HBASE-3927) display total uncompressed byte size of a region in web UI

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041625#comment-13041625 ] 

stack commented on HBASE-3927:
------------------------------

But it changes the serialization of HServerLoad in a way that does not self-migrate.  Is HSL versioned?  If so, can you make it so HSL can deserialize based off the HSL version?   Else, we can just commit this to 0.92?


[jira] [Commented] (HBASE-3927) display total uncompressed byte size of a region in web UI

Posted by "Matt Corgan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040404#comment-13040404 ] 

Matt Corgan commented on HBASE-3927:
------------------------------------

Ted - I think the problem I'm most often seeing on the user list is that people want the default 64K block size, but after they enable compression they don't raise the block size to compensate for the compression.  In many cases it's easy to obtain compression of 10x or better, so the blocks on disk are ~6K, which is smaller than anyone wants.

It's also true that data with large keys and small values (like an inverted index) tends to compress well.  Those big keys also necessitate relatively large block cache entries.  Because the block index has an entry for every block, it can get overly large when a user has large keys and small compressed blocks.
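
To make the effect concrete, here is a back-of-the-envelope estimate. It assumes one index entry per block holding a key plus a fixed per-entry overhead; the overhead value and all names are assumptions for illustration, not HBase internals.

```java
/** Hypothetical estimator for illustration; not part of HBase. */
final class BlockIndexEstimateSketch {

    /**
     * Number of blocks for a given on-disk size. Blocks are cut on uncompressed
     * size, so the block count scales with the compression ratio.
     */
    static long estimateBlocks(long onDiskBytes, double compressionRatio, long blockSizeBytes) {
        double uncompressedBytes = onDiskBytes * compressionRatio;
        return (long) Math.ceil(uncompressedBytes / blockSizeBytes);
    }

    /** Rough index footprint: one entry per block, each holding a key plus overhead. */
    static long estimateIndexBytes(long blocks, int avgKeyBytes, int perEntryOverheadBytes) {
        return blocks * (avgKeyBytes + perEntryOverheadBytes);
    }

    public static void main(String[] args) {
        // Example numbers only: 1 GB on disk, 10x compression, 64 KB block size,
        // 100-byte keys, and an assumed 12 bytes of per-entry overhead.
        long blocks = estimateBlocks(1L << 30, 10.0, 64 * 1024);
        System.out.println("blocks: " + blocks);
        System.out.println("index bytes: " + estimateIndexBytes(blocks, 100, 12));
    }
}
```

With the same on-disk size, a higher compression ratio means proportionally more blocks and index entries unless the block size is raised to match.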

Exposing this metric is just a way to remind unsuspecting users that block size is calculated based on uncompressed size, rather than the compressed disk size, which drives region splits.  It should also make it easier to figure out how effective different compression algorithms are, how big your compressed block size is, what percent of your data you can fit in block cache, etc.

> display total uncompressed byte size of a region in web UI
> ----------------------------------------------------------
>
>                 Key: HBASE-3927
>                 URL: https://issues.apache.org/jira/browse/HBASE-3927
>             Project: HBase
>          Issue Type: Improvement
>          Components: metrics
>            Reporter: Matt Corgan
>            Priority: Minor
>
> The decision to split data blocks when flushing and compacting is made based on the uncompressed data size which can often lead to compressed disk blocks that are a fraction of the intended 64 KB (default).  This often leads to a larger number of blocks and index entries than expected and can cause block indexes to take up GB of memory.
> There is already a "long totalUncompressedBytes" written to the HFileTrailer.  It would be nice to expose this in the web UI to make it easier to calculate the compression ratio and then raise the block size appropriately (not necessarily to get it back to 64K).
> This should probably be added wherever the other HFile metrics are: RegionLoad.createRegions(..), and HServerLoad.  HServerLoad is a Writable, so it may break serialization.


[jira] [Updated] (HBASE-3927) display total uncompressed byte size of a region in web UI

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-3927:
--------------------------

    Attachment: 3927-v2.txt

Second version serializes/deserializes version numbers for HServerLoad (HSL) and RegionLoad.

> display total uncompressed byte size of a region in web UI
> ----------------------------------------------------------
>
>                 Key: HBASE-3927
>                 URL: https://issues.apache.org/jira/browse/HBASE-3927
>             Project: HBase
>          Issue Type: Improvement
>          Components: metrics
>            Reporter: Matt Corgan
>            Assignee: Ted Yu
>            Priority: Minor
>         Attachments: 3927-v2.txt, 3927.txt, regionserver-showing-compression-ratio.png
>
>
> The decision to split data blocks when flushing and compacting is made based on the uncompressed data size which can often lead to compressed disk blocks that are a fraction of the intended 64 KB (default).  This often leads to a larger number of blocks and index entries than expected and can cause block indexes to take up GB of memory.
> There is already a "long totalUncompressedBytes" written to the HFileTrailer.  It would be nice to expose this in the web UI to make it easier to calculate the compression ratio and then raise the block size appropriately (not necessarily to get it back to 64K).
> This should probably be added wherever the other HFile metrics are: RegionLoad.createRegions(..), and HServerLoad.  HServerLoad is a Writable, so it may break serialization.


[jira] [Commented] (HBASE-3927) display total uncompressed byte size of a region in web UI

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040854#comment-13040854 ] 

Andrew Purtell commented on HBASE-3927:
---------------------------------------

+1

Minor nit, I think the compression ratio shows too many insignificant digits. 
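
For illustration only, one way to trim the digits is a fixed two-decimal format.  The helper below is hypothetical and is not taken from the attached patch.

```java
import java.util.Locale;

/** Hypothetical helper for displaying a compression ratio; not part of the patch. */
final class RatioFormat {

    /** Formats a ratio with two decimal places, e.g. 0.1234567 becomes "0.12". */
    static String formatRatio(double ratio) {
        // Locale.ROOT keeps the decimal separator a '.' regardless of server locale.
        return String.format(Locale.ROOT, "%.2f", ratio);
    }
}
```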

> display total uncompressed byte size of a region in web UI
> ----------------------------------------------------------
>
>                 Key: HBASE-3927
>                 URL: https://issues.apache.org/jira/browse/HBASE-3927
>             Project: HBase
>          Issue Type: Improvement
>          Components: metrics
>            Reporter: Matt Corgan
>            Assignee: Ted Yu
>            Priority: Minor
>         Attachments: 3927.txt, regionserver-showing-compression-ratio.png
>
>
> The decision to split data blocks when flushing and compacting is made based on the uncompressed data size which can often lead to compressed disk blocks that are a fraction of the intended 64 KB (default).  This often leads to a larger number of blocks and index entries than expected and can cause block indexes to take up GB of memory.
> There is already a "long totalUncompressedBytes" written to the HFileTrailer.  It would be nice to expose this in the web UI to make it easier to calculate the compression ratio and then raise the block size appropriately (not necessarily to get it back to 64K).
> This should probably be added wherever the other HFile metrics are: RegionLoad.createRegions(..), and HServerLoad.  HServerLoad is a Writable, so it may break serialization.


[jira] [Commented] (HBASE-3927) display total uncompressed byte size of a region in web UI

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040884#comment-13040884 ] 

Ted Yu commented on HBASE-3927:
-------------------------------

I tend to agree with Matt.
But Facebook may have large indexes. I think that is part of the reason for HBASE-3856.
Also, getStorefileIndexSizeMB() is currently used by AvroUtil and StorageClusterStatusResource (in rest package).

Shall we change the display for HServerLoad.toString() to KB and keep the other references the same?

> display total uncompressed byte size of a region in web UI
> ----------------------------------------------------------
>
>                 Key: HBASE-3927
>                 URL: https://issues.apache.org/jira/browse/HBASE-3927
>             Project: HBase
>          Issue Type: Improvement
>          Components: metrics
>            Reporter: Matt Corgan
>            Assignee: Ted Yu
>            Priority: Minor
>         Attachments: 3927.txt, regionserver-showing-compression-ratio.png
>
>
> The decision to split data blocks when flushing and compacting is made based on the uncompressed data size which can often lead to compressed disk blocks that are a fraction of the intended 64 KB (default).  This often leads to a larger number of blocks and index entries than expected and can cause block indexes to take up GB of memory.
> There is already a "long totalUncompressedBytes" written to the HFileTrailer.  It would be nice to expose this in the web UI to make it easier to calculate the compression ratio and then raise the block size appropriately (not necessarily to get it back to 64K).
> This should probably be added wherever the other HFile metrics are: RegionLoad.createRegions(..), and HServerLoad.  HServerLoad is a Writable, so it may break serialization.


[jira] [Commented] (HBASE-3927) display total uncompressed byte size of a region in web UI

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041406#comment-13041406 ] 

Ted Yu commented on HBASE-3927:
-------------------------------

The attached image was produced based on 0.90.3

> display total uncompressed byte size of a region in web UI
> ----------------------------------------------------------
>
>                 Key: HBASE-3927
>                 URL: https://issues.apache.org/jira/browse/HBASE-3927
>             Project: HBase
>          Issue Type: Improvement
>          Components: metrics
>            Reporter: Matt Corgan
>            Assignee: Ted Yu
>            Priority: Minor
>         Attachments: 3927.txt, regionserver-showing-compression-ratio.png
>
>
> The decision to split data blocks when flushing and compacting is made based on the uncompressed data size which can often lead to compressed disk blocks that are a fraction of the intended 64 KB (default).  This often leads to a larger number of blocks and index entries than expected and can cause block indexes to take up GB of memory.
> There is already a "long totalUncompressedBytes" written to the HFileTrailer.  It would be nice to expose this in the web UI to make it easier to calculate the compression ratio and then raise the block size appropriately (not necessarily to get it back to 64K).
> This should probably be added wherever the other HFile metrics are: RegionLoad.createRegions(..), and HServerLoad.  HServerLoad is a Writable, so it may break serialization.


[jira] [Commented] (HBASE-3927) display total uncompressed byte size of a region in web UI

Posted by "Karthick Sankarachary (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040519#comment-13040519 ] 

Karthick Sankarachary commented on HBASE-3927:
----------------------------------------------

More often than not, the uncompressed bytes should be equal to the "hfile.min.blocksize.size" setting, if I understand it correctly. Typically, the {{HFile#Writer}} will close a block if its {{checkBlockBoundary}} method throws an exception, which happens when the current block's size goes over. I believe the only hfile block that can potentially have fewer (uncompressed) bytes is the last one (which was current at the time of close). If so, it would be nicer to expose the compression ratio (along with the total compressed bytes) in the web UI.
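
A simplified sketch of that behaviour, assuming the writer checks the open block's uncompressed size before each append and starts a new block once the threshold is reached.  This is a toy model with made-up names, not the actual {{HFile#Writer}} code.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of cutting blocks on uncompressed size. Illustrative only. */
final class BlockBoundarySketch {

    /** Returns the uncompressed size of each block produced for the given entry sizes. */
    static List<Integer> cutBlocks(int[] entrySizes, int blockSize) {
        List<Integer> blocks = new ArrayList<>();
        int current = 0;
        for (int size : entrySizes) {
            // Boundary check before appending: once the open block has reached
            // the threshold, close it and start a new one.
            if (current >= blockSize) {
                blocks.add(current);
                current = 0;
            }
            current += size;
        }
        if (current > 0) {
            blocks.add(current); // only the last block can end up under the threshold
        }
        return blocks;
    }
}
```

In this model every block except the last is at or slightly over the configured size when measured uncompressed, so the size each block takes on disk depends entirely on how well it compresses.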

> display total uncompressed byte size of a region in web UI
> ----------------------------------------------------------
>
>                 Key: HBASE-3927
>                 URL: https://issues.apache.org/jira/browse/HBASE-3927
>             Project: HBase
>          Issue Type: Improvement
>          Components: metrics
>            Reporter: Matt Corgan
>            Assignee: Ted Yu
>            Priority: Minor
>         Attachments: 3927.txt
>
>
> The decision to split data blocks when flushing and compacting is made based on the uncompressed data size which can often lead to compressed disk blocks that are a fraction of the intended 64 KB (default).  This often leads to a larger number of blocks and index entries than expected and can cause block indexes to take up GB of memory.
> There is already a "long totalUncompressedBytes" written to the HFileTrailer.  It would be nice to expose this in the web UI to make it easier to calculate the compression ratio and then raise the block size appropriately (not necessarily to get it back to 64K).
> This should probably be added wherever the other HFile metrics are: RegionLoad.createRegions(..), and HServerLoad.  HServerLoad is a Writable, so it may break serialization.


[jira] [Commented] (HBASE-3927) display total uncompressed byte size of a region in web UI

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040910#comment-13040910 ] 

Andrew Purtell commented on HBASE-3927:
---------------------------------------

@Ted: I agree with Matt also. However, commit what you have here and then address Matt's comment with another JIRA? And change all MB to KB?

> display total uncompressed byte size of a region in web UI
> ----------------------------------------------------------
>
>                 Key: HBASE-3927
>                 URL: https://issues.apache.org/jira/browse/HBASE-3927
>             Project: HBase
>          Issue Type: Improvement
>          Components: metrics
>            Reporter: Matt Corgan
>            Assignee: Ted Yu
>            Priority: Minor
>         Attachments: 3927.txt, regionserver-showing-compression-ratio.png
>
>
> The decision to split data blocks when flushing and compacting is made based on the uncompressed data size which can often lead to compressed disk blocks that are a fraction of the intended 64 KB (default).  This often leads to a larger number of blocks and index entries than expected and can cause block indexes to take up GB of memory.
> There is already a "long totalUncompressedBytes" written to the HFileTrailer.  It would be nice to expose this in the web UI to make it easier to calculate the compression ratio and then raise the block size appropriately (not necessarily to get it back to 64K).
> This should probably be added wherever the other HFile metrics are: RegionLoad.createRegions(..), and HServerLoad.  HServerLoad is a Writable, so it may break serialization.


[jira] [Resolved] (HBASE-3927) display total uncompressed byte size of a region in web UI

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-3927.
--------------------------

       Resolution: Fixed
    Fix Version/s: 0.92.0
     Hadoop Flags: [Reviewed]

Ran tests and they are passing.  Committed to TRUNK.  Thanks Ted.  (The way VERSION works is that you might encounter a serialized object of a version older than yours.  When deserializing such an object, you should not expect to find fields that were added in your current version.)
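
The VERSION handling described here can be sketched as follows.  This is a stand-alone illustration using plain java.io rather than Hadoop's Writable, the class and field names are invented, and it assumes the older format already wrote a leading version byte; it is not the committed patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Illustrative versioned serialization; names and fields are hypothetical. */
final class VersionedLoadSketch {
    static final byte VERSION = 1;

    int storefileSizeMB;         // present since version 0
    int storeUncompressedSizeMB; // added in version 1 (hypothetical field)

    void write(DataOutput out) throws IOException {
        out.writeByte(VERSION); // always written first so readers can branch on it
        out.writeInt(storefileSizeMB);
        out.writeInt(storeUncompressedSizeMB);
    }

    void readFields(DataInput in) throws IOException {
        byte version = in.readByte();
        if (version > VERSION) {
            throw new IOException("Unsupported version " + version);
        }
        storefileSizeMB = in.readInt();
        // An older writer never wrote this field, so only read it when present.
        storeUncompressedSizeMB = (version >= 1) ? in.readInt() : 0;
    }

    byte[] toBytes() {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            write(new DataOutputStream(buf));
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static VersionedLoadSketch fromBytes(byte[] bytes) {
        try {
            VersionedLoadSketch load = new VersionedLoadSketch();
            load.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return load;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Simulates what a version 0 writer would have produced. */
    static byte[] legacyBytes(int storefileSizeMB) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeByte(0);
            out.writeInt(storefileSizeMB);
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Reading the output of legacyBytes(..) with fromBytes(..) leaves the newer field at its default of 0 instead of running past the end of the stream.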

> display total uncompressed byte size of a region in web UI
> ----------------------------------------------------------
>
>                 Key: HBASE-3927
>                 URL: https://issues.apache.org/jira/browse/HBASE-3927
>             Project: HBase
>          Issue Type: Improvement
>          Components: metrics
>            Reporter: Matt Corgan
>            Assignee: Ted Yu
>            Priority: Minor
>             Fix For: 0.92.0
>
>         Attachments: 3927-v2.txt, 3927.txt, regionserver-showing-compression-ratio.png
>
>
> The decision to split data blocks when flushing and compacting is made based on the uncompressed data size which can often lead to compressed disk blocks that are a fraction of the intended 64 KB (default).  This often leads to a larger number of blocks and index entries than expected and can cause block indexes to take up GB of memory.
> There is already a "long totalUncompressedBytes" written to the HFileTrailer.  It would be nice to expose this in the web UI to make it easier to calculate the compression ratio and then raise the block size appropriately (not necessarily to get it back to 64K).
> This should probably be added wherever the other HFile metrics are: RegionLoad.createRegions(..), and HServerLoad.  HServerLoad is a Writable, so it may break serialization.


[jira] [Commented] (HBASE-3927) display total uncompressed byte size of a region in web UI

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049879#comment-13049879 ] 

Jean-Daniel Cryans commented on HBASE-3927:
-------------------------------------------

+1 if it doesn't break any tests, also it should be targeted for 0.92

> display total uncompressed byte size of a region in web UI
> ----------------------------------------------------------
>
>                 Key: HBASE-3927
>                 URL: https://issues.apache.org/jira/browse/HBASE-3927
>             Project: HBase
>          Issue Type: Improvement
>          Components: metrics
>            Reporter: Matt Corgan
>            Assignee: Ted Yu
>            Priority: Minor
>         Attachments: 3927-v2.txt, 3927.txt, regionserver-showing-compression-ratio.png
>
>
> The decision to split data blocks when flushing and compacting is made based on the uncompressed data size which can often lead to compressed disk blocks that are a fraction of the intended 64 KB (default).  This often leads to a larger number of blocks and index entries than expected and can cause block indexes to take up GB of memory.
> There is already a "long totalUncompressedBytes" written to the HFileTrailer.  It would be nice to expose this in the web UI to make it easier to calculate the compression ratio and then raise the block size appropriately (not necessarily to get it back to 64K).
> This should probably be added wherever the other HFile metrics are: RegionLoad.createRegions(..), and HServerLoad.  HServerLoad is a Writable, so it may break serialization.


[jira] [Updated] (HBASE-3927) display total uncompressed byte size of a region in web UI

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-3927:
--------------------------

    Attachment: 3927.txt

This version adjusts Store.FIXED_OVERHEAD

> display total uncompressed byte size of a region in web UI
> ----------------------------------------------------------
>
>                 Key: HBASE-3927
>                 URL: https://issues.apache.org/jira/browse/HBASE-3927
>             Project: HBase
>          Issue Type: Improvement
>          Components: metrics
>            Reporter: Matt Corgan
>            Assignee: Ted Yu
>            Priority: Minor
>         Attachments: 3927.txt, regionserver-showing-compression-ratio.png
>
>
> The decision to split data blocks when flushing and compacting is made based on the uncompressed data size which can often lead to compressed disk blocks that are a fraction of the intended 64 KB (default).  This often leads to a larger number of blocks and index entries than expected and can cause block indexes to take up GB of memory.
> There is already a "long totalUncompressedBytes" written to the HFileTrailer.  It would be nice to expose this in the web UI to make it easier to calculate the compression ratio and then raise the block size appropriately (not necessarily to get it back to 64K).
> This should probably be added wherever the other HFile metrics are: RegionLoad.createRegions(..), and HServerLoad.  HServerLoad is a Writable, so it may break serialization.


[jira] [Commented] (HBASE-3927) display total uncompressed byte size of a region in web UI

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040972#comment-13040972 ] 

Ted Yu commented on HBASE-3927:
-------------------------------

@Andrew:
That's fine. 

> display total uncompressed byte size of a region in web UI
> ----------------------------------------------------------
>
>                 Key: HBASE-3927
>                 URL: https://issues.apache.org/jira/browse/HBASE-3927
>             Project: HBase
>          Issue Type: Improvement
>          Components: metrics
>            Reporter: Matt Corgan
>            Assignee: Ted Yu
>            Priority: Minor
>         Attachments: 3927.txt, regionserver-showing-compression-ratio.png
>
>
> The decision to split data blocks when flushing and compacting is made based on the uncompressed data size which can often lead to compressed disk blocks that are a fraction of the intended 64 KB (default).  This often leads to a larger number of blocks and index entries than expected and can cause block indexes to take up GB of memory.
> There is already a "long totalUncompressedBytes" written to the HFileTrailer.  It would be nice to expose this in the web UI to make it easier to calculate the compression ratio and then raise the block size appropriately (not necessarily to get it back to 64K).
> This should probably be added wherever the other HFile metrics are: RegionLoad.createRegions(..), and HServerLoad.  HServerLoad is a Writable, so it may break serialization.
