You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org> on 2011/03/22 00:29:05 UTC

[jira] [Created] (HBASE-3680) Publish more metrics about mslab

Publish more metrics about mslab
--------------------------------

                 Key: HBASE-3680
                 URL: https://issues.apache.org/jira/browse/HBASE-3680
             Project: HBase
          Issue Type: Improvement
    Affects Versions: 0.90.1
            Reporter: Jean-Daniel Cryans
             Fix For: 0.92.0


We have been using mslab on all our clusters for a while now and it seems it tends to OOME or send us into GC loops of death a lot more than it used to. For example, one RS with mslab enabled and 7GB of heap died out of OOME this afternoon; it had .55GB in the block cache and 2.03GB in the memstores which doesn't account for much... but it could be that because of mslab a lot of space was lost in those incomplete 2MB blocks and without metrics we can't really tell. Compactions were running at the time of the OOME and I see block cache activity. The average load on that cluster is 531.

We should at least publish the total size of all those blocks and maybe even take actions based on that (like force flushing).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-3680) Publish more metrics about mslab

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141489#comment-13141489 ] 

stack commented on HBASE-3680:
------------------------------

Forget it.  I reverted this patch from TRUNK.  It doesn't apply to 0.92.  Metrics have changed there in a way that is not similar to TRUNK.   Will look into this later.  REVERT in the time being.
                
> Publish more metrics about mslab
> --------------------------------
>
>                 Key: HBASE-3680
>                 URL: https://issues.apache.org/jira/browse/HBASE-3680
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.1
>            Reporter: Jean-Daniel Cryans
>            Assignee: Todd Lipcon
>             Fix For: 0.92.0
>
>         Attachments: hbase-3680.txt, hbase-3680.txt
>
>
> We have been using mslab on all our clusters for a while now and it seems it tends to OOME or send us into GC loops of death a lot more than it used to. For example, one RS with mslab enabled and 7GB of heap died out of OOME this afternoon; it had .55GB in the block cache and 2.03GB in the memstores which doesn't account for much... but it could be that because of mslab a lot of space was lost in those incomplete 2MB blocks and without metrics we can't really tell. Compactions were running at the time of the OOME and I see block cache activity. The average load on that cluster is 531.
> We should at least publish the total size of all those blocks and maybe even take actions based on that (like force flushing).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-3680) Publish more metrics about mslab

Posted by "stack (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3680:
-------------------------

    Attachment: hbase-3680.txt

Reuploading same patch so can get patch build to work.
                
> Publish more metrics about mslab
> --------------------------------
>
>                 Key: HBASE-3680
>                 URL: https://issues.apache.org/jira/browse/HBASE-3680
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.1
>            Reporter: Jean-Daniel Cryans
>            Assignee: Todd Lipcon
>             Fix For: 0.92.0
>
>         Attachments: hbase-3680.txt, hbase-3680.txt
>
>
> We have been using mslab on all our clusters for a while now and it seems it tends to OOME or send us into GC loops of death a lot more than it used to. For example, one RS with mslab enabled and 7GB of heap died out of OOME this afternoon; it had .55GB in the block cache and 2.03GB in the memstores which doesn't account for much... but it could be that because of mslab a lot of space was lost in those incomplete 2MB blocks and without metrics we can't really tell. Compactions were running at the time of the OOME and I see block cache activity. The average load on that cluster is 531.
> We should at least publish the total size of all those blocks and maybe even take actions based on that (like force flushing).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-3680) Publish more metrics about mslab

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3680:
-------------------------

    Fix Version/s:     (was: 0.92.2)
                   0.92.3
    
> Publish more metrics about mslab
> --------------------------------
>
>                 Key: HBASE-3680
>                 URL: https://issues.apache.org/jira/browse/HBASE-3680
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.1
>            Reporter: Jean-Daniel Cryans
>            Assignee: Todd Lipcon
>             Fix For: 0.92.3
>
>         Attachments: hbase-3680.txt, hbase-3680.txt
>
>
> We have been using mslab on all our clusters for a while now and it seems it tends to OOME or send us into GC loops of death a lot more than it used to. For example, one RS with mslab enabled and 7GB of heap died out of OOME this afternoon; it had .55GB in the block cache and 2.03GB in the memstores which doesn't account for much... but it could be that because of mslab a lot of space was lost in those incomplete 2MB blocks and without metrics we can't really tell. Compactions were running at the time of the OOME and I see block cache activity. The average load on that cluster is 531.
> We should at least publish the total size of all those blocks and maybe even take actions based on that (like force flushing).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-3680) Publish more metrics about mslab

Posted by "stack (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3680:
-------------------------

    Status: Open  (was: Patch Available)
    
> Publish more metrics about mslab
> --------------------------------
>
>                 Key: HBASE-3680
>                 URL: https://issues.apache.org/jira/browse/HBASE-3680
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.1
>            Reporter: Jean-Daniel Cryans
>            Assignee: Todd Lipcon
>             Fix For: 0.92.0
>
>         Attachments: hbase-3680.txt, hbase-3680.txt
>
>
> We have been using mslab on all our clusters for a while now and it seems it tends to OOME or send us into GC loops of death a lot more than it used to. For example, one RS with mslab enabled and 7GB of heap died out of OOME this afternoon; it had .55GB in the block cache and 2.03GB in the memstores which doesn't account for much... but it could be that because of mslab a lot of space was lost in those incomplete 2MB blocks and without metrics we can't really tell. Compactions were running at the time of the OOME and I see block cache activity. The average load on that cluster is 531.
> We should at least publish the total size of all those blocks and maybe even take actions based on that (like force flushing).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-3680) Publish more metrics about mslab

Posted by "Todd Lipcon (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HBASE-3680:
-------------------------------

    Attachment: hbase-3680.txt

Here's a patch which adds the metric. I tested by firing up a standalone instance and using jconsole to check that it showed the right value in JMX.
                
> Publish more metrics about mslab
> --------------------------------
>
>                 Key: HBASE-3680
>                 URL: https://issues.apache.org/jira/browse/HBASE-3680
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.1
>            Reporter: Jean-Daniel Cryans
>            Assignee: Todd Lipcon
>             Fix For: 0.92.0
>
>         Attachments: hbase-3680.txt
>
>
> We have been using mslab on all our clusters for a while now and it seems it tends to OOME or send us into GC loops of death a lot more than it used to. For example, one RS with mslab enabled and 7GB of heap died out of OOME this afternoon; it had .55GB in the block cache and 2.03GB in the memstores which doesn't account for much... but it could be that because of mslab a lot of space was lost in those incomplete 2MB blocks and without metrics we can't really tell. Compactions were running at the time of the OOME and I see block cache activity. The average load on that cluster is 531.
> We should at least publish the total size of all those blocks and maybe even take actions based on that (like force flushing).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (HBASE-3680) Publish more metrics about mslab

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans reassigned HBASE-3680:
-----------------------------------------

    Assignee: Todd Lipcon

Hoping Todd can take a quick look.

> Publish more metrics about mslab
> --------------------------------
>
>                 Key: HBASE-3680
>                 URL: https://issues.apache.org/jira/browse/HBASE-3680
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.1
>            Reporter: Jean-Daniel Cryans
>            Assignee: Todd Lipcon
>             Fix For: 0.92.0
>
>
> We have been using mslab on all our clusters for a while now and it seems it tends to OOME or send us into GC loops of death a lot more than it used to. For example, one RS with mslab enabled and 7GB of heap died out of OOME this afternoon; it had .55GB in the block cache and 2.03GB in the memstores which doesn't account for much... but it could be that because of mslab a lot of space was lost in those incomplete 2MB blocks and without metrics we can't really tell. Compactions were running at the time of the OOME and I see block cache activity. The average load on that cluster is 531.
> We should at least publish the total size of all those blocks and maybe even take actions based on that (like force flushing).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-3680) Publish more metrics about mslab

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141482#comment-13141482 ] 

stack commented on HBASE-3680:
------------------------------

I committed this patch already by mistake as part of this commit:

{code}
------------------------------------------------------------------------
r1195833 | stack | 2011-10-31 22:36:12 -0700 (Mon, 31 Oct 2011) | 1 line

HBASE-4703 Improvements in tests
{code}

I did all but the change to MemStore.java.  Let me commit that now.
                
> Publish more metrics about mslab
> --------------------------------
>
>                 Key: HBASE-3680
>                 URL: https://issues.apache.org/jira/browse/HBASE-3680
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.1
>            Reporter: Jean-Daniel Cryans
>            Assignee: Todd Lipcon
>             Fix For: 0.92.0
>
>         Attachments: hbase-3680.txt, hbase-3680.txt
>
>
> We have been using mslab on all our clusters for a while now and it seems it tends to OOME or send us into GC loops of death a lot more than it used to. For example, one RS with mslab enabled and 7GB of heap died out of OOME this afternoon; it had .55GB in the block cache and 2.03GB in the memstores which doesn't account for much... but it could be that because of mslab a lot of space was lost in those incomplete 2MB blocks and without metrics we can't really tell. Compactions were running at the time of the OOME and I see block cache activity. The average load on that cluster is 531.
> We should at least publish the total size of all those blocks and maybe even take actions based on that (like force flushing).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-3680) Publish more metrics about mslab

Posted by "Todd Lipcon (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HBASE-3680:
-------------------------------

    Fix Version/s: 0.92.0
           Status: Patch Available  (was: Open)
    
> Publish more metrics about mslab
> --------------------------------
>
>                 Key: HBASE-3680
>                 URL: https://issues.apache.org/jira/browse/HBASE-3680
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.1
>            Reporter: Jean-Daniel Cryans
>            Assignee: Todd Lipcon
>             Fix For: 0.92.0
>
>         Attachments: hbase-3680.txt
>
>
> We have been using mslab on all our clusters for a while now and it seems it tends to OOME or send us into GC loops of death a lot more than it used to. For example, one RS with mslab enabled and 7GB of heap died out of OOME this afternoon; it had .55GB in the block cache and 2.03GB in the memstores which doesn't account for much... but it could be that because of mslab a lot of space was lost in those incomplete 2MB blocks and without metrics we can't really tell. Compactions were running at the time of the OOME and I see block cache activity. The average load on that cluster is 531.
> We should at least publish the total size of all those blocks and maybe even take actions based on that (like force flushing).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-3680) Publish more metrics about mslab

Posted by "stack (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3680:
-------------------------

    Status: Patch Available  (was: Open)

Trying patch build.
                
> Publish more metrics about mslab
> --------------------------------
>
>                 Key: HBASE-3680
>                 URL: https://issues.apache.org/jira/browse/HBASE-3680
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.1
>            Reporter: Jean-Daniel Cryans
>            Assignee: Todd Lipcon
>             Fix For: 0.92.0
>
>         Attachments: hbase-3680.txt
>
>
> We have been using mslab on all our clusters for a while now and it seems it tends to OOME or send us into GC loops of death a lot more than it used to. For example, one RS with mslab enabled and 7GB of heap died out of OOME this afternoon; it had .55GB in the block cache and 2.03GB in the memstores which doesn't account for much... but it could be that because of mslab a lot of space was lost in those incomplete 2MB blocks and without metrics we can't really tell. Compactions were running at the time of the OOME and I see block cache activity. The average load on that cluster is 531.
> We should at least publish the total size of all those blocks and maybe even take actions based on that (like force flushing).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3680) Publish more metrics about mslab

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141014#comment-13141014 ] 

Hadoop QA commented on HBASE-3680:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12501741/hbase-3680.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/120//console

This message is automatically generated.
                
> Publish more metrics about mslab
> --------------------------------
>
>                 Key: HBASE-3680
>                 URL: https://issues.apache.org/jira/browse/HBASE-3680
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.1
>            Reporter: Jean-Daniel Cryans
>            Assignee: Todd Lipcon
>             Fix For: 0.92.0
>
>         Attachments: hbase-3680.txt, hbase-3680.txt
>
>
> We have been using mslab on all our clusters for a while now and it seems it tends to OOME or send us into GC loops of death a lot more than it used to. For example, one RS with mslab enabled and 7GB of heap died out of OOME this afternoon; it had .55GB in the block cache and 2.03GB in the memstores which doesn't account for much... but it could be that because of mslab a lot of space was lost in those incomplete 2MB blocks and without metrics we can't really tell. Compactions were running at the time of the OOME and I see block cache activity. The average load on that cluster is 531.
> We should at least publish the total size of all those blocks and maybe even take actions based on that (like force flushing).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3680) Publish more metrics about mslab

Posted by "Chris Tarnas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13018919#comment-13018919 ] 

Chris Tarnas commented on HBASE-3680:
-------------------------------------

I have been seeing similar problems with MSLAB enabled. When under heavy read/write workloads we've seen regionservers OOME and some, what appear to be, long GC pauses. When I get a chance i'll do more testing with GC logging re-enabled and MSLAB enabled. 

> Publish more metrics about mslab
> --------------------------------
>
>                 Key: HBASE-3680
>                 URL: https://issues.apache.org/jira/browse/HBASE-3680
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.1
>            Reporter: Jean-Daniel Cryans
>            Assignee: Todd Lipcon
>             Fix For: 0.92.0
>
>
> We have been using mslab on all our clusters for a while now and it seems it tends to OOME or send us into GC loops of death a lot more than it used to. For example, one RS with mslab enabled and 7GB of heap died out of OOME this afternoon; it had .55GB in the block cache and 2.03GB in the memstores which doesn't account for much... but it could be that because of mslab a lot of space was lost in those incomplete 2MB blocks and without metrics we can't really tell. Compactions were running at the time of the OOME and I see block cache activity. The average load on that cluster is 531.
> We should at least publish the total size of all those blocks and maybe even take actions based on that (like force flushing).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-3680) Publish more metrics about mslab

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141928#comment-13141928 ] 

Hudson commented on HBASE-3680:
-------------------------------

Integrated in HBase-TRUNK #2398 (See [https://builds.apache.org/job/HBase-TRUNK/2398/])
    HBASE-3680 Publish more metrics about mslab; REVERT

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java

                
> Publish more metrics about mslab
> --------------------------------
>
>                 Key: HBASE-3680
>                 URL: https://issues.apache.org/jira/browse/HBASE-3680
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.1
>            Reporter: Jean-Daniel Cryans
>            Assignee: Todd Lipcon
>             Fix For: 0.92.0
>
>         Attachments: hbase-3680.txt, hbase-3680.txt
>
>
> We have been using mslab on all our clusters for a while now and it seems it tends to OOME or send us into GC loops of death a lot more than it used to. For example, one RS with mslab enabled and 7GB of heap died out of OOME this afternoon; it had .55GB in the block cache and 2.03GB in the memstores which doesn't account for much... but it could be that because of mslab a lot of space was lost in those incomplete 2MB blocks and without metrics we can't really tell. Compactions were running at the time of the OOME and I see block cache activity. The average load on that cluster is 531.
> We should at least publish the total size of all those blocks and maybe even take actions based on that (like force flushing).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3680) Publish more metrics about mslab

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13139768#comment-13139768 ] 

stack commented on HBASE-3680:
------------------------------

+1
                
> Publish more metrics about mslab
> --------------------------------
>
>                 Key: HBASE-3680
>                 URL: https://issues.apache.org/jira/browse/HBASE-3680
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.1
>            Reporter: Jean-Daniel Cryans
>            Assignee: Todd Lipcon
>             Fix For: 0.92.0
>
>         Attachments: hbase-3680.txt
>
>
> We have been using mslab on all our clusters for a while now and it seems it tends to OOME or send us into GC loops of death a lot more than it used to. For example, one RS with mslab enabled and 7GB of heap died out of OOME this afternoon; it had .55GB in the block cache and 2.03GB in the memstores which doesn't account for much... but it could be that because of mslab a lot of space was lost in those incomplete 2MB blocks and without metrics we can't really tell. Compactions were running at the time of the OOME and I see block cache activity. The average load on that cluster is 531.
> We should at least publish the total size of all those blocks and maybe even take actions based on that (like force flushing).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-3680) Publish more metrics about mslab

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141751#comment-13141751 ] 

Hudson commented on HBASE-3680:
-------------------------------

Integrated in HBase-TRUNK #2397 (See [https://builds.apache.org/job/HBase-TRUNK/2397/])
    HBASE-3680 Publish more metrics about mslab; REVERTED
HBASE-3680 Publish more metrics about mslab
HBASE-3680 Publish more metrics about mslab

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStoreLAB.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/RegionServerMetrics.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMemStoreLAB.java

stack : 
Files : 
* /hbase/trunk/CHANGES.txt

stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java

                
> Publish more metrics about mslab
> --------------------------------
>
>                 Key: HBASE-3680
>                 URL: https://issues.apache.org/jira/browse/HBASE-3680
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.1
>            Reporter: Jean-Daniel Cryans
>            Assignee: Todd Lipcon
>             Fix For: 0.92.0
>
>         Attachments: hbase-3680.txt, hbase-3680.txt
>
>
> We have been using mslab on all our clusters for a while now and it seems it tends to OOME or send us into GC loops of death a lot more than it used to. For example, one RS with mslab enabled and 7GB of heap died out of OOME this afternoon; it had .55GB in the block cache and 2.03GB in the memstores which doesn't account for much... but it could be that because of mslab a lot of space was lost in those incomplete 2MB blocks and without metrics we can't really tell. Compactions were running at the time of the OOME and I see block cache activity. The average load on that cluster is 531.
> We should at least publish the total size of all those blocks and maybe even take actions based on that (like force flushing).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-3680) Publish more metrics about mslab

Posted by "stack (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3680:
-------------------------

    Status: Patch Available  (was: Open)
    
> Publish more metrics about mslab
> --------------------------------
>
>                 Key: HBASE-3680
>                 URL: https://issues.apache.org/jira/browse/HBASE-3680
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.1
>            Reporter: Jean-Daniel Cryans
>            Assignee: Todd Lipcon
>             Fix For: 0.92.0
>
>         Attachments: hbase-3680.txt, hbase-3680.txt
>
>
> We have been using mslab on all our clusters for a while now and it seems it tends to OOME or send us into GC loops of death a lot more than it used to. For example, one RS with mslab enabled and 7GB of heap died out of OOME this afternoon; it had .55GB in the block cache and 2.03GB in the memstores which doesn't account for much... but it could be that because of mslab a lot of space was lost in those incomplete 2MB blocks and without metrics we can't really tell. Compactions were running at the time of the OOME and I see block cache activity. The average load on that cluster is 531.
> We should at least publish the total size of all those blocks and maybe even take actions based on that (like force flushing).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-3680) Publish more metrics about mslab

Posted by "stack (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3680:
-------------------------

    Status: Open  (was: Patch Available)
    
> Publish more metrics about mslab
> --------------------------------
>
>                 Key: HBASE-3680
>                 URL: https://issues.apache.org/jira/browse/HBASE-3680
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.1
>            Reporter: Jean-Daniel Cryans
>            Assignee: Todd Lipcon
>             Fix For: 0.92.0
>
>         Attachments: hbase-3680.txt
>
>
> We have been using mslab on all our clusters for a while now and it seems it tends to OOME or send us into GC loops of death a lot more than it used to. For example, one RS with mslab enabled and 7GB of heap died out of OOME this afternoon; it had .55GB in the block cache and 2.03GB in the memstores which doesn't account for much... but it could be that because of mslab a lot of space was lost in those incomplete 2MB blocks and without metrics we can't really tell. Compactions were running at the time of the OOME and I see block cache activity. The average load on that cluster is 531.
> We should at least publish the total size of all those blocks and maybe even take actions based on that (like force flushing).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira