Posted to issues@hbase.apache.org by "John Heitmann (JIRA)" <ji...@apache.org> on 2011/07/11 21:05:59 UTC

[jira] [Created] (HBASE-4084) Auto-Split runs only if there are many store files per region

Auto-Split runs only if there are many store files per region
-------------------------------------------------------------

                 Key: HBASE-4084
                 URL: https://issues.apache.org/jira/browse/HBASE-4084
             Project: HBase
          Issue Type: Bug
          Components: regionserver
    Affects Versions: 0.94.0
            Reporter: John Heitmann


Currently, MemStoreFlusher.flushRegion() is the driver of auto-splitting. It only decides to auto-split a region if there are too many store files per region. Since the number of store files per region is not guaranteed to climb above the "too many" count before compaction reduces it, there is no guarantee that auto-split will ever happen. In my test setup, compaction always seems to win the race, and I haven't seen an auto-split happen even once.
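The race can be illustrated with a toy model (not HBase code). It assumes that each flush adds one store file, that a compaction triggered at COMPACTION_THRESHOLD files finishes before the next flush and leaves a single file, and that a split is only considered once the count exceeds BLOCKING_STORE_FILES. The class, the field names and the values 3 and 7 are illustrative assumptions, loosely modeled on the hbase.hstore.compactionThreshold and hbase.hstore.blockingStoreFiles settings.

```java
/** Toy model (not HBase code) of a single region's store file count under flush and compaction. */
class SplitRaceModel {
    // Illustrative thresholds, loosely modeled on hbase.hstore.compactionThreshold
    // and hbase.hstore.blockingStoreFiles.
    static final int COMPACTION_THRESHOLD = 3;
    static final int BLOCKING_STORE_FILES = 7;

    int storeFiles = 0;
    int maxStoreFilesSeen = 0;
    int splitRequests = 0;

    /** One flush, followed by a compaction if the threshold is reached. */
    void flush() {
        // As in flushRegion(), a split is only requested on the "too many store files" path.
        if (storeFiles > BLOCKING_STORE_FILES) {
            splitRequests++;
        }
        storeFiles++; // the flush writes one new store file
        maxStoreFilesSeen = Math.max(maxStoreFilesSeen, storeFiles);
        // Assumption: compaction finishes before the next flush ("compaction wins the race").
        if (storeFiles >= COMPACTION_THRESHOLD) {
            storeFiles = 1;
        }
    }

    public static void main(String[] args) {
        SplitRaceModel m = new SplitRaceModel();
        for (int i = 0; i < 1000; i++) {
            m.flush();
        }
        // Prints: max store files = 3, split requests = 0
        System.out.println("max store files = " + m.maxStoreFilesSeen
                + ", split requests = " + m.splitRequests);
    }
}
```

Under these assumptions the count never climbs past 3, so the split branch is unreachable no matter how many flushes happen; only if compaction falls behind can the count pass the blocking limit and trigger a split request.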

The intention appears to be to make splitting mutually exclusive with compaction, and to hold off flushing for regions badly in need of compaction, but the result is that auto-splitting ended up nested in an overly restrictive spot.

I'm not sure what the right fix is. A single method that is essentially requestSplitOrCompact() would probably help readability, and could be the complete fix if it replaced the other calls to requestCompaction().
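A minimal sketch of the requestSplitOrCompact() idea, assuming it simply wraps the split-then-compact fallback that flushRegion() already performs inline. SplitCompactQueue, SplitOrCompact and RecordingQueue are hypothetical names standing in for CompactSplitThread; they are not the HBase API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the split/compaction request queue (not the HBase API). */
interface SplitCompactQueue {
    /** Returns true if a split was queued for the region. */
    boolean requestSplit(String region);

    void requestCompaction(String region, String why);
}

/** Hypothetical single entry point for "split if possible, otherwise compact". */
final class SplitOrCompact {
    private SplitOrCompact() {}

    /**
     * Tries to queue a split first and falls back to a compaction, the same
     * pattern the flushRegion() skeleton uses inline.
     *
     * @return true if a split was queued, false if a compaction was queued instead
     */
    static boolean requestSplitOrCompact(SplitCompactQueue queue, String region, String why) {
        if (queue.requestSplit(region)) {
            return true;
        }
        queue.requestCompaction(region, why);
        return false;
    }

    public static void main(String[] args) {
        RecordingQueue q = new RecordingQueue(false);
        requestSplitOrCompact(q, "region-a", "flush");
        System.out.println(q.log); // [compact region-a (flush)]
    }
}

/** Test double that records every request it receives. */
class RecordingQueue implements SplitCompactQueue {
    final boolean splitAllowed;
    final List<String> log = new ArrayList<>();

    RecordingQueue(boolean splitAllowed) {
        this.splitAllowed = splitAllowed;
    }

    @Override
    public boolean requestSplit(String region) {
        if (splitAllowed) {
            log.add("split " + region);
        }
        return splitAllowed;
    }

    @Override
    public void requestCompaction(String region, String why) {
        log.add("compact " + region + " (" + why + ")");
    }
}
```

Routing the other requestCompaction() callers through one method like this would mean that every place that considers compacting a region also considers splitting it, which is the point of the proposal.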

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4084) Auto-Split runs only if there are many store files per region

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063476#comment-13063476 ] 

Jonathan Gray commented on HBASE-4084:
--------------------------------------

I thought splits were triggered following a compaction, not a flush?


[jira] [Commented] (HBASE-4084) Auto-Split runs only if there are many store files per region

Posted by "John Heitmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063486#comment-13063486 ] 

John Heitmann commented on HBASE-4084:
--------------------------------------

Here is a shortened skeleton of the flush path that leads to a split (from MemStoreFlusher):


  private boolean flushRegion(final FlushRegionEntry fqe) {
    final HRegion region = fqe.region; // the region this queue entry asks us to flush
    if (!isMetaRegion(region) && isTooManyStoreFiles(region)) {
      if (... waited too long for compaction to clean things up? ...) {
        log("Waited too long for compaction to clean things up");
      } else {
        if (!this.server.compactSplitThread.requestSplit(region)) {
          this.server.compactSplitThread.requestCompaction(region, getName());
        }
        // Put back on the queue.  Have it come back out of the queue
        // after a delay of this.blockingWaitTime / 100 ms.
        this.flushQueue.add(fqe.requeue(this.blockingWaitTime / 100));
        // Tell a lie, it's not flushed but it's ok
        return true;
      }
    }
    return flushRegion(region, false);
  }


I don't see split being requested anywhere else apart from manual splitting, but I could easily be missing something. I've been tracing this by finding the callers of CompactSplitThread.requestSplit().


[jira] [Commented] (HBASE-4084) Auto-Split runs only if there are many store files per region

Posted by "John Heitmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063490#comment-13063490 ] 

John Heitmann commented on HBASE-4084:
--------------------------------------

Let's try that again with formatting:

{code}
private boolean flushRegion(final FlushRegionEntry fqe) {
  final HRegion region = fqe.region; // the region this queue entry asks us to flush
  if (!isMetaRegion(region) && isTooManyStoreFiles(region)) {
    if (... waited too long for compaction to clean things up? ...) {
      log("Waited too long for compaction to clean things up");
    } else {
      if (!this.server.compactSplitThread.requestSplit(region)) {
        this.server.compactSplitThread.requestCompaction(region, getName());
      }
      // Put back on the queue. Have it come back out of the queue
      // after a delay of this.blockingWaitTime / 100 ms.
      this.flushQueue.add(fqe.requeue(this.blockingWaitTime / 100));
      // Tell a lie, it's not flushed but it's ok
      return true;
    }
  }
  return flushRegion(region, false);
}
{code}
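The branches of this skeleton can be restated as a small pure function, which makes explicit that a split is only ever requested on the deferred path. FlushDecision, FlushGate and their parameters are hypothetical names, and "too many store files" is assumed to mean a count strictly above the blocking limit; the requeue delay follows the skeleton's blockingWaitTime / 100 arithmetic (for example, 90000 ms gives 900 ms).

```java
/** Hypothetical outcomes of the flushRegion() skeleton (not HBase code). */
enum FlushDecision {
    FLUSH,                 // normal path: flush now, no split considered
    FLUSH_AFTER_LONG_WAIT, // too many store files, but waited long enough: flush anyway
    DEFER                  // request a split (else a compaction), requeue, report "flushed"
}

final class FlushGate {
    private FlushGate() {}

    static FlushDecision decide(boolean isMeta, int storeFiles,
                                int blockingStoreFiles, boolean waitedTooLong) {
        // Assumption: "too many" means strictly more than the blocking limit.
        boolean tooManyStoreFiles = storeFiles > blockingStoreFiles;
        if (!isMeta && tooManyStoreFiles) {
            return waitedTooLong ? FlushDecision.FLUSH_AFTER_LONG_WAIT : FlushDecision.DEFER;
        }
        return FlushDecision.FLUSH;
    }

    /** Delay before a deferred entry comes back off the flush queue. */
    static long requeueDelayMs(long blockingWaitTimeMs) {
        return blockingWaitTimeMs / 100;
    }

    public static void main(String[] args) {
        System.out.println(decide(false, 8, 7, false)); // DEFER
        System.out.println(decide(false, 7, 7, false)); // FLUSH
        System.out.println(requeueDelayMs(90000));      // 900
    }
}
```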


[jira] [Commented] (HBASE-4084) Auto-Split runs only if there are many store files per region

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063492#comment-13063492 ] 

Ted Yu commented on HBASE-4084:
-------------------------------

I think HBASE-4081 is related to this ticket.
In HBASE-4081 there was a suggestion to remove the call to s.checkSplit().



[jira] [Resolved] (HBASE-4084) Auto-Split runs only if there are many store files per region

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl resolved HBASE-4084.
----------------------------------

    Resolution: Fixed

Looked at the current code. A split request is now created in both the Flush and the Compaction cases.
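To illustrate why requesting a split on both paths helps, here is a toy model (not HBase code) in which a hypothetical size-based split check runs after every flush and after every compaction. Each flush adds FLUSH_SIZE units of data and one store file, a compaction at COMPACTION_THRESHOLD files merges them without changing the total size, and a split is requested once the region exceeds MAX_REGION_SIZE; all names and values are assumptions for illustration.

```java
/** Toy model (not HBase code): a size-based split check on both the flush and compaction paths. */
class SplitCheckedEverywhereModel {
    static final int COMPACTION_THRESHOLD = 3; // assumed, illustrative
    static final long FLUSH_SIZE = 64;         // assumed data added per flush (arbitrary units)
    static final long MAX_REGION_SIZE = 256;   // assumed size-based split threshold

    int storeFiles = 0;
    long regionSize = 0;
    int flushes = 0;
    boolean splitRequested = false;

    /** Hypothetical split check based on region size, not on store file count. */
    private void maybeRequestSplit() {
        if (regionSize > MAX_REGION_SIZE) {
            splitRequested = true;
        }
    }

    void flush() {
        flushes++;
        storeFiles++;
        regionSize += FLUSH_SIZE;
        maybeRequestSplit();                 // flush path
        if (storeFiles >= COMPACTION_THRESHOLD) {
            storeFiles = 1;                  // compaction merges files; total size unchanged here
            maybeRequestSplit();             // compaction path
        }
    }

    public static void main(String[] args) {
        SplitCheckedEverywhereModel m = new SplitCheckedEverywhereModel();
        while (!m.splitRequested && m.flushes < 1000) {
            m.flush();
        }
        System.out.println("split requested after " + m.flushes + " flushes"); // 5
    }
}
```

With these numbers the region reaches 320 units on the fifth flush and a split is requested, regardless of how quickly compaction keeps the store file count down.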
                