Posted to dev@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2010/01/01 01:28:29 UTC

[jira] Created: (HBASE-2087) The wait on compaction because "Too many store files" holds up all flushing

The wait on compaction because "Too many store files" holds up all flushing
---------------------------------------------------------------------------

                 Key: HBASE-2087
                 URL: https://issues.apache.org/jira/browse/HBASE-2087
             Project: Hadoop HBase
          Issue Type: Bug
            Reporter: stack


The method MemStoreFlusher#checkStoreFileCount is called from flushRegion, and flushRegion is called by the MemStoreFlusher#run thread.  If checkStoreFileCount finds too many store files, it sticks around waiting for a compaction to happen.  While it's hanging, MemStoreFlusher#run is held up and no other region can flush.  Meanwhile, WALs keep rolling and memory keeps accumulating writes.
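
To make the head-of-line blocking described above concrete, here is a minimal Java model. It is not HBase code: the class name, the per-flush cost, and the 90-second wait are all assumptions chosen for illustration, and time is simulated rather than slept. The only thing taken from the report is the shape of the problem: one thread works through the flush queue, and a wait on behalf of one region delays every region queued behind it.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a single flusher thread; not HBase code.
class BlockingFlusherModel {
    static final long FLUSH_MS = 10;               // assumed cost of one flush
    static final long COMPACTION_WAIT_MS = 90_000; // assumed wait for a compaction

    /** Returns, per region, the simulated time (ms) at which its flush finished. */
    static Map<String, Long> run(List<String> flushQueue, Set<String> overLimit) {
        Map<String, Long> finishedAt = new LinkedHashMap<>();
        long clock = 0;
        for (String region : flushQueue) {
            if (overLimit.contains(region)) {
                // The single thread waits here, so everything behind it waits too.
                clock += COMPACTION_WAIT_MS;
            }
            clock += FLUSH_MS;
            finishedAt.put(region, clock);
        }
        return finishedAt;
    }
}
```

With a queue of r1, r2, r3 and only r2 over the limit, the model has r1 finishing at 10 ms, while r3, which has no problem of its own, finishes only after r2's wait has been served.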

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HBASE-2087) The wait on compaction because "Too many store files" holds up all flushing

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796046#action_12796046 ] 

stack commented on HBASE-2087:
------------------------------

@J-D: "...flushing incomplete memstores is highly inefficient.."  ... yeah, but if the edit is old, it's probably worth the flush if you take a systems view.  And this issue is about something else anyway: never holding up flushes.  Should we open a blanket issue in which we discuss undoing "compensating" changes now that hdfs has a working sync, i.e. undo all the weird stuff we did to try to minimize losing edits when there was no working sync?

[jira] Updated: (HBASE-2087) The wait on compaction because "Too many store files" holds up all flushing

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-2087:
-------------------------

    Fix Version/s: 0.20.4

Moving into 0.20.4.

[jira] Resolved: (HBASE-2087) The wait on compaction because "Too many store files" holds up all flushing

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans resolved HBASE-2087.
---------------------------------------

       Resolution: Fixed
    Fix Version/s: 0.21.0
         Assignee: Jean-Daniel Cryans
     Hadoop Flags: [Reviewed]

Committed to branch and trunk with a 1000ms sleep. Thanks for the review, Stack!

> The wait on compaction because "Too many store files" holds up all flushing
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-2087
>                 URL: https://issues.apache.org/jira/browse/HBASE-2087
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.20.4, 0.21.0
>
>         Attachments: HBASE-2087.patch
>
>
> The method MemStoreFlusher#checkStoreFileCount is called from flushRegion, and flushRegion is called by the MemStoreFlusher#run thread.  If checkStoreFileCount finds too many store files, it sticks around waiting for a compaction to happen.  While it's hanging, MemStoreFlusher#run is held up and no other region can flush.  Meanwhile, WALs keep rolling and memory keeps accumulating writes.

[jira] Commented: (HBASE-2087) The wait on compaction because "Too many store files" holds up all flushing

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796057#action_12796057 ] 

Andrew Purtell commented on HBASE-2087:
---------------------------------------

> Should we open a blanket issue in which we discuss undoing "compensating" changes now hdfs has a working sync

+1

Like we did with the compaction limiting thread and region server "safe mode" after the transition to 0.20.

[jira] Commented: (HBASE-2087) The wait on compaction because "Too many store files" holds up all flushing

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796271#action_12796271 ] 

stack commented on HBASE-2087:
------------------------------

The problem this issue covers is the case where a regionserver has, say, 1k regions and one of them happens to be over the store file upper limit.  As things stand, all flushing on the regionserver is held up because that one region is over the limit.  With no flushing, we will block writes, and so on.

[jira] Commented: (HBASE-2087) The wait on compaction because "Too many store files" holds up all flushing

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796078#action_12796078 ] 

Jean-Daniel Cryans commented on HBASE-2087:
-------------------------------------------

> And this issue is about something else anyway, never holding up flushes

As I said in my first comment, it's either too many WALs or too many store files. If we let all flushes go, then we are overrun by store files. If we force-flush memstores to be able to roll WALs, then we easily create too many store files. We have seen stores that needed to compact 100 files, and this is why we have a limit.

So, I question the feasibility of this jira.

In the particular case of WALs waiting on flushes waiting on too many store files, my point is that setting a very low number of WALs is what makes us hit the limit so easily. Setting it to a higher number means less chance of hitting this jira's problem, hence making it invalid?

[jira] Commented: (HBASE-2087) The wait on compaction because "Too many store files" holds up all flushing

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796283#action_12796283 ] 

Jean-Daniel Cryans commented on HBASE-2087:
-------------------------------------------

Oh right, I didn't see it like that. Yes, we don't want to hold flushes for every region, just for those concerned.

[jira] Commented: (HBASE-2087) The wait on compaction because "Too many store files" holds up all flushing

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835521#action_12835521 ] 

stack commented on HBASE-2087:
------------------------------

I was going to explore blocking only the problematic store, by removing its flush request from the flush queue and re-adding it later, after the timer elapses (or after compaction completes).
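
One way to picture the idea in the comment above is a delay queue: a request for a region that is over the store file limit goes back on the queue with a delay, and the flusher moves on to the next region instead of waiting. The Java sketch below is only an illustration under stated assumptions and is not the committed patch: RequeueingFlusherSketch, FlushRequest and drainReady are hypothetical names, the 1000 ms delay is an arbitrary choice, and the clock is injected so the behaviour can be checked without sleeping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; not the committed HBASE-2087 patch.
class RequeueingFlusherSketch {
    // Assumed back-off before a deferred region is looked at again.
    static final long REQUEUE_DELAY_MS = 1000;

    /** A flush request that becomes eligible once the clock reaches readyAtMs. */
    static final class FlushRequest implements Delayed {
        final String region;
        final long readyAtMs;
        final AtomicLong clockMs; // injected clock, an assumption made for testability

        FlushRequest(String region, long readyAtMs, AtomicLong clockMs) {
            this.region = region;
            this.readyAtMs = readyAtMs;
            this.clockMs = clockMs;
        }

        @Override
        public long getDelay(TimeUnit unit) {
            return unit.convert(readyAtMs - clockMs.get(), TimeUnit.MILLISECONDS);
        }

        @Override
        public int compareTo(Delayed other) {
            return Long.compare(getDelay(TimeUnit.MILLISECONDS),
                                other.getDelay(TimeUnit.MILLISECONDS));
        }
    }

    /** Handles every request that is ready now; returns the regions flushed. */
    static List<String> drainReady(DelayQueue<FlushRequest> queue,
                                   Set<String> overLimit,
                                   AtomicLong clockMs) {
        List<String> flushed = new ArrayList<>();
        FlushRequest req;
        // poll() returns null once no remaining request has an expired delay.
        while ((req = queue.poll()) != null) {
            if (overLimit.contains(req.region)) {
                // Defer only this region; the other regions are not held up.
                queue.add(new FlushRequest(req.region,
                        clockMs.get() + REQUEUE_DELAY_MS, clockMs));
            } else {
                flushed.add(req.region); // stand-in for the real flush
            }
        }
        return flushed;
    }
}
```

In this model, with r2 over the limit, a first drainReady call flushes r1 and r3 and leaves r2 queued; once the clock has advanced by the delay and r2 is back under the limit, a second call flushes it. A real flusher thread would use a wall clock and block on the queue's take() rather than polling.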

[jira] Commented: (HBASE-2087) The wait on compaction because "Too many store files" holds up all flushing

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795715#action_12795715 ] 

Jean-Daniel Cryans commented on HBASE-2087:
-------------------------------------------

So either we have too many WALs or too many store files, right? Like I said in HBASE-2053, our WAL is set very small so that the master splits fast and we don't lose data. In 0.21 we won't lose data, so would speeding up the split time and then setting a higher/bigger WAL solve this problem?

[jira] Commented: (HBASE-2087) The wait on compaction because "Too many store files" holds up all flushing

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795716#action_12795716 ] 

Jean-Daniel Cryans commented on HBASE-2087:
-------------------------------------------

Another thing to keep in mind: flushing incomplete memstores is highly inefficient. Let's say you want to drop the number of WALs by flushing 10 regions. Those are probably not full, maybe 2MB or 10MB big, but they still take time to flush and clog HDFS with even more new files. Those files then have to be compacted; it's even worse if we hit the "Too many store files" problem, and it's likely that one causes the other.
