Posted to dev@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2009/06/09 18:17:07 UTC

[jira] Created: (HBASE-1506) Make splits faster by having the regionserver, on split, immediately start serving bottom half of the split

Make splits faster by having the regionserver, on split, immediately start serving bottom half of the split
-----------------------------------------------------------------------------------------------------------

                 Key: HBASE-1506
                 URL: https://issues.apache.org/jira/browse/HBASE-1506
             Project: Hadoop HBase
          Issue Type: Improvement
            Reporter: stack


Regionservers run splits.  They close the region being split, divide it, and then tell the master about the two new daughter regions.  The master then assigns the daughters, which need to come up in new locations.  Both daughters are offline during this time.

Instead, the regionserver could run the split as it does now, but then immediately deploy the lower half on the current regionserver.  It would then inform the master that it had split and that it was serving the lower half.  The master would then take care of assigning only the upper half.

The benefit is that clients accessing the lower half of the split would not need to go through recalibration (looking up the region's new location); they would just keep working.  Only keys that land in the top half of the split would see any disruption.
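To make the availability argument concrete, here is a minimal Python sketch of the proposal.  It is a toy model, not HBase code: `Region`, `split`, `offline_after_split` and `disrupted` are hypothetical names, keys are plain strings compared lexicographically (HBase itself compares raw bytes), and an empty start key plus `end=None` stand in for a region spanning the whole table.  It only shows which keys are unreachable while daughters wait on the master under each scheme; it does not model the actual close, divide and assignment steps.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Region:
    """A half-open key range [start, end); end=None means 'through the last key'."""
    start: str
    end: Optional[str]

    def contains(self, key: str) -> bool:
        return key >= self.start and (self.end is None or key < self.end)


def split(parent: Region, split_key: str) -> Tuple[Region, Region]:
    """Divide the parent at split_key into (lower, upper) daughters."""
    return Region(parent.start, split_key), Region(split_key, parent.end)


def offline_after_split(parent: Region, split_key: str,
                        deploy_lower_locally: bool) -> List[Region]:
    """Daughters that stay offline until the master assigns them.

    Old scheme: both daughters wait for the master.
    Proposed scheme: the splitting regionserver opens the lower daughter
    itself, so only the upper daughter waits.
    """
    lower, upper = split(parent, split_key)
    return [upper] if deploy_lower_locally else [lower, upper]


def disrupted(key: str, offline: List[Region]) -> bool:
    """True if a client reading `key` would hit an offline daughter."""
    return any(r.contains(key) for r in offline)


# Usage: split a whole-table region at "m".
parent = Region("", None)
for key in ("apple", "zebra"):
    before = disrupted(key, offline_after_split(parent, "m", deploy_lower_locally=False))
    after = disrupted(key, offline_after_split(parent, "m", deploy_lower_locally=True))
    print(key, before, after)
# apple True False
# zebra True True
```

With a split key of "m", a client reading "apple" (bottom half) is disrupted under the old scheme but not under the proposed one, while "zebra" (top half) is disrupted either way.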

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-1506) [performance] Make splits faster

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1506:
-------------------------

    Attachment: split.patch

This patch implements two things:

1. If the regionserver has something to report to the master (a close, a split, etc.), it does not wait to send it.  A split is 4 messages and an open is usually two; these are still sent as a batch, but this patch means less batching in exchange for faster reaction to cluster events.
2. The lower half of a split is assigned immediately to the local regionserver.  Only the top half is given to the master to assign.

Splits should run a little faster.  Item 1 above takes away a small piece of the delay.  Item 2 halves the master/regionserver interaction needed to get the daughter regions back online.
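Item 1 is essentially a change in when queued reports leave the regionserver.  The following Python sketch, assuming a hypothetical `Outbox` class and made-up message names (the real regionserver/master message types are not shown in this thread), contrasts holding reports until the next heartbeat with flushing them as soon as an event produces them.  In both modes the messages from one event still go out together as a single batch.

```python
from dataclasses import dataclass, field


@dataclass
class Outbox:
    """Hypothetical regionserver-to-master report queue (illustration only)."""
    flush_immediately: bool  # True models item 1 of the patch; False the old behavior
    pending: list = field(default_factory=list)
    sent: list = field(default_factory=list)  # (send_time, batch) pairs, oldest first

    def report(self, now: float, *msgs: str) -> None:
        # All messages produced by one event (e.g. a split) are queued together.
        self.pending.extend(msgs)
        if self.flush_immediately:
            self._flush(now)

    def heartbeat(self, now: float) -> None:
        # The periodic heartbeat carries whatever is still pending.
        self._flush(now)

    def _flush(self, now: float) -> None:
        if self.pending:
            self.sent.append((now, list(self.pending)))
            self.pending.clear()


# Usage: a split produces four messages at t=0.5; the next heartbeat is at t=3.0.
split_msgs = ("split-1", "split-2", "split-3", "split-4")  # placeholder names
old = Outbox(flush_immediately=False)
old.report(0.5, *split_msgs)
old.heartbeat(3.0)
new = Outbox(flush_immediately=True)
new.report(0.5, *split_msgs)
new.heartbeat(3.0)
print(old.sent[0][0], new.sent[0][0])  # 3.0 0.5
```

The timings are arbitrary: the old behavior delivers the batch at the heartbeat (3.0), the new one as soon as the split is reported (0.5).  The trade-off is more, smaller batches when events arrive in quick succession.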

Will be back with a few numbers.

> [performance] Make splits faster
> --------------------------------
>
>                 Key: HBASE-1506
>                 URL: https://issues.apache.org/jira/browse/HBASE-1506
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: stack
>            Priority: Critical
>             Fix For: 0.20.1, 0.21.0
>
>         Attachments: split.patch
>


[jira] Updated: (HBASE-1506) [performance] Make splits faster

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1506:
-------------------------

    Fix Version/s: 0.20.1

Bring into 0.20.1

> [performance] Make splits faster
> --------------------------------
>
>                 Key: HBASE-1506
>                 URL: https://issues.apache.org/jira/browse/HBASE-1506
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: stack
>            Priority: Critical
>             Fix For: 0.20.1, 0.21.0
>


[jira] Updated: (HBASE-1506) [performance] Make splits faster

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1506:
-------------------------

    Status: Patch Available  (was: Open)

> [performance] Make splits faster
> --------------------------------
>
>                 Key: HBASE-1506
>                 URL: https://issues.apache.org/jira/browse/HBASE-1506
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: stack
>            Priority: Critical
>             Fix For: 0.20.1, 0.21.0
>
>         Attachments: split.patch
>


[jira] Commented: (HBASE-1506) [performance] Make splits faster

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762435#action_12762435 ] 

stack commented on HBASE-1506:
------------------------------

I tested this on a cluster.  Seems to work.  Going to commit.

> [performance] Make splits faster
> --------------------------------
>
>                 Key: HBASE-1506
>                 URL: https://issues.apache.org/jira/browse/HBASE-1506
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: stack
>            Priority: Critical
>             Fix For: 0.20.1, 0.21.0
>
>         Attachments: split.patch
>


[jira] Updated: (HBASE-1506) [performance] Make splits faster

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1506:
-------------------------

    Priority: Critical  (was: Major)
     Summary: [performance] Make splits faster  (was: Make splits faster)

> [performance] Make splits faster
> --------------------------------
>
>                 Key: HBASE-1506
>                 URL: https://issues.apache.org/jira/browse/HBASE-1506
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: stack
>            Priority: Critical
>             Fix For: 0.21.0
>


[jira] Updated: (HBASE-1506) Make splits faster

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1506:
-------------------------

    Summary: Make splits faster  (was: Make splits faster by having the regionserver, on split, immediately start serving bottom half of the split)

I changed the title to just "Make splits faster".

Another idea: if a regionserver has any messages for the master, it should send them immediately rather than wait for its heartbeat interval to expire.

> Make splits faster
> ------------------
>
>                 Key: HBASE-1506
>                 URL: https://issues.apache.org/jira/browse/HBASE-1506
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: stack
>             Fix For: 0.21.0
>


[jira] Updated: (HBASE-1506) [performance] Make splits faster

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1506:
-------------------------

    Resolution: Fixed
      Assignee: stack
        Status: Resolved  (was: Patch Available)

Committed to branch and trunk.

Will open a new issue to make splits even faster.  This is all we can do with the current architecture.

> [performance] Make splits faster
> --------------------------------
>
>                 Key: HBASE-1506
>                 URL: https://issues.apache.org/jira/browse/HBASE-1506
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: stack
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.20.1, 0.21.0
>
>         Attachments: split.patch
>