Posted to dev@hbase.apache.org by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org> on 2010/04/13 23:55:51 UTC

[jira] Created: (HBASE-2439) HBase can get stuck if updates to META are blocked

HBase can get stuck if updates to META are blocked
--------------------------------------------------

                 Key: HBASE-2439
                 URL: https://issues.apache.org/jira/browse/HBASE-2439
             Project: Hadoop HBase
          Issue Type: Bug
            Reporter: Kannan Muthukkaruppan


(We noticed this on an import-style test in a small test cluster.)

If compactions are running slow and we are doing a lot of region splits, then, since META has a much smaller hard-coded memstore flush size (16KB), it quickly accumulates lots of store files. Once the number of store files exceeds "hbase.hstore.blockingStoreFiles", flushes to META become no-ops. This causes META's memstore footprint to grow. Once that footprint exceeds "hbase.hregion.memstore.block.multiplier * 16KB", we block further updates to META.

In my test setup:
  hbase.hregion.memstore.block.multiplier = 4.
and,
  hbase.hstore.blockingStoreFiles = 15.

And we saw messages of the form:

{code}
2010-04-09 18:37:39,539 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 23 on 60020' on region .META.,,1: memstore size 64.2k is >= than blocking 64.0k size
{code}
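
The 64.0k blocking size in that message follows from the settings above: the multiplier (4) times META's 16KB flush size. A minimal sketch of the arithmetic, in Python for brevity; the function and constant names are illustrative only and are not HBase code:

```python
# Illustrative arithmetic only; these names are hypothetical, not HBase APIs.
META_FLUSH_SIZE = 16 * 1024   # hard-coded META memstore flush size (16KB)
BLOCK_MULTIPLIER = 4          # hbase.hregion.memstore.block.multiplier in the test setup


def blocking_size(flush_size: int, multiplier: int) -> int:
    """Memstore size (bytes) at which updates to the region are blocked."""
    return flush_size * multiplier


def updates_blocked(memstore_size: int, flush_size: int, multiplier: int) -> bool:
    """True once the memstore has reached the blocking size."""
    return memstore_size >= blocking_size(flush_size, multiplier)


limit = blocking_size(META_FLUSH_SIZE, BLOCK_MULTIPLIER)
print(limit / 1024)  # 64.0
```

With a memstore of roughly 64.2k, as in the log line, `updates_blocked` returns True under these assumptions.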

Now suppose that, around the same time, the CompactSplitThread does a compaction and determines it is going to split the region. As part of finishing the split, it wants to update META about the daughter regions.

It'll end up waiting for the META to become unblocked. The single CompactSplitThread is now held up, and no further compactions can proceed.  META's compaction request is itself blocked because the compaction queue will never get cleared.

This essentially creates a deadlock, and the region server is not able to progress any further. Eventually, each region server's CompactSplitThread ends up in the same state.
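
The sequence described above can be illustrated with a toy model. This is only a sketch, in Python for brevity, and not HBase code: the function name, the request tuples, and the rule that a META compaction unblocks updates are simplifying assumptions made for illustration. It models a single worker draining a FIFO queue without real threads, so it reports the stall instead of hanging.

```python
from collections import deque

# Toy model, not HBase code: one worker drains a FIFO queue of
# (kind, region) requests, mirroring the single CompactSplitThread.


def run_single_worker(requests, meta_blocked):
    """Process requests in order; return (completed, stuck_on).

    A "split" must write to META, so it cannot finish while META is blocked.
    A "compact" of META is assumed to be what unblocks META updates.
    """
    queue = deque(requests)
    completed = []
    while queue:
        kind, region = queue[0]
        if kind == "split" and meta_blocked:
            # The only worker waits here; nothing queued behind it can run.
            return completed, (kind, region)
        queue.popleft()
        if kind == "compact" and region == "META":
            meta_blocked = False  # assumption: compaction lets flushes work again
        completed.append((kind, region))
    return completed, None


# The split is ahead of META's compaction, so the worker never reaches it.
stuck = run_single_worker([("split", "r1"), ("compact", "META")], meta_blocked=True)
# If META's compaction happened to run first, everything would complete.
ok = run_single_worker([("compact", "META"), ("split", "r1")], meta_blocked=True)
```

In the first call the worker stalls on the split with nothing completed; in the second, both requests finish. The difference is only the ordering in the single queue, which is why a lone thread cannot recover once it blocks on META.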



-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Resolved: (HBASE-2439) HBase can get stuck if updates to META are blocked

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-2439.
--------------------------

     Hadoop Flags: [Reviewed]
    Fix Version/s: 0.20.4
                   0.20.5
                   0.21.0
       Resolution: Fixed

Applied to 0.20 branch as is.  Applied to 0.20_pre_durability and TRUNK w/o the change to table descriptor, as per Todd's suggestion (in the former to minimize change, and in the latter because the patch failed since the limit had already been removed).  Thanks for the patch, Kannan.

> HBase can get stuck if updates to META are blocked
> --------------------------------------------------
>
>                 Key: HBASE-2439
>                 URL: https://issues.apache.org/jira/browse/HBASE-2439
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Kannan Muthukkaruppan
>             Fix For: 0.20.4, 0.20.5, 0.21.0
>
>         Attachments: 2439_0.20_dont_block_meta.txt
>
