Posted to issues@hbase.apache.org by "Charles Connell (Jira)" <ji...@apache.org> on 2022/11/18 18:30:00 UTC

[jira] [Updated] (HBASE-27496) Limit size of plans produced by SimpleRegionNormalizer

     [ https://issues.apache.org/jira/browse/HBASE-27496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Charles Connell updated HBASE-27496:
------------------------------------
    Description: 
My company (Hubspot) is starting to use {{SimpleRegionNormalizer}}. We turn the normalizer switch on for 30 minutes each day, when our database traffic is at a low point. We're using the {{hbase.normalizer.throughput.max_bytes_per_sec}} setting to create a rate limit. I've found that while the {{SimpleRegionNormalizer}} only produces new plans for 30 minutes each day, the plans often take many hours to execute. This leads to region splits, merges, and moves occurring in our HBase clusters during hours when we'd prefer they didn't.

I propose two new settings:
 * {{hbase.normalizer.merge.plans_size_limit.mb}}
 * {{hbase.normalizer.split.plans_size_limit.mb}}

These settings will allow HBase administrators to limit the number of plans produced by a run of {{SimpleRegionNormalizer}}, by forcing it to stop producing new plans once the cumulative region size limits are exceeded. This gives you a way to limit approximately how long it takes to execute the plans. Because the time it takes to execute plans is primarily determined by a per-byte rate limit, I propose that the new settings also be expressed in terms of size. This will make it feasible to reason about how your rate limit and your size limits interact.
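
To illustrate how the proposed size limits and the existing rate limit might interact, here is a minimal sketch in Python. It is not HBase code: the {{Plan}} class and both functions are hypothetical, the rate limit is taken as a plain number of bytes per second, and the sketch assumes that plan generation stops just before the plan that would push the cumulative size over the limit and that execution time is dominated by the rate limit.

```python
from dataclasses import dataclass
from typing import Iterable, List

MB = 1024 * 1024


@dataclass(frozen=True)
class Plan:
    """Hypothetical stand-in for a normalization plan (not an HBase class)."""
    kind: str      # "split" or "merge"
    size_mb: int   # combined size of the regions the plan touches, in MB


def limit_plans(plans: Iterable[Plan], size_limit_mb: int) -> List[Plan]:
    """Keep plans in order until the next one would exceed the size limit.

    Assumption: the plan that would cross the limit is dropped, along with
    every plan after it.
    """
    kept: List[Plan] = []
    total_mb = 0
    for plan in plans:
        if total_mb + plan.size_mb > size_limit_mb:
            break
        kept.append(plan)
        total_mb += plan.size_mb
    return kept


def estimate_seconds(plans: Iterable[Plan], max_bytes_per_sec: int) -> float:
    """Rough execution time, assuming the rate limit is the only bottleneck."""
    total_bytes = sum(plan.size_mb for plan in plans) * MB
    return total_bytes / max_bytes_per_sec


# Example: four candidate merge plans, a 10,000 MB limit, a 10 MB/s rate limit.
candidates = [Plan("merge", 4000), Plan("merge", 3000),
              Plan("merge", 2000), Plan("merge", 5000)]
kept = limit_plans(candidates, size_limit_mb=10_000)
seconds = estimate_seconds(kept, max_bytes_per_sec=10 * MB)
print(len(kept), seconds)
```

Under these assumptions, the size limit divided by the rate limit gives an upper bound on how long one run's plans of that type take to execute. Splits and merges would each be capped separately by their own setting.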

  was:
My company (Hubspot) is starting to use {{SimpleRegionNormalizer}}. We turn the normalizer switch on for 30 minutes each day, when our database traffic is at a low point. We're using the {{hbase.normalizer.throughput.max_bytes_per_sec}} setting to create a rate limit. I've found that while the {{SimpleRegionNormalizer}} only produces new plans for 30 minutes each day, the plans often take many hours to execute. This leads to region splits, merges, and moves occurring in our HBase clusters during hours when we'd prefer they didn't.

I propose two new settings:
 * {{hbase.normalizer.merge.plans_size_limit.mb}}
 * {{hbase.normalizer.split.plans_size_limit.mb}}

This will allow HBase administrators to limit the number of plans produced by a run of {{SimpleRegionNormalizer}}. This will give you a way to limit approximately how long it takes to execute the plans. Because the current limit to execute plans is primarily determined by a per-byte rate limit, I propose that the new settings also work on a similar basis. This will make it feasible to reason about how your rate limit and your size limits interact.


> Limit size of plans produced by SimpleRegionNormalizer
> ------------------------------------------------------
>
>                 Key: HBASE-27496
>                 URL: https://issues.apache.org/jira/browse/HBASE-27496
>             Project: HBase
>          Issue Type: Improvement
>          Components: Normalizer
>            Reporter: Charles Connell
>            Priority: Minor
>
> My company (Hubspot) is starting to use {{SimpleRegionNormalizer}}. We turn the normalizer switch on for 30 minutes each day, when our database traffic is at a low point. We're using the {{hbase.normalizer.throughput.max_bytes_per_sec}} setting to create a rate limit. I've found that while the {{SimpleRegionNormalizer}} only produces new plans for 30 minutes each day, the plans often take many hours to execute. This leads to region splits, merges, and moves occurring in our HBase clusters during hours when we'd prefer they didn't.
> I propose two new settings:
>  * {{hbase.normalizer.merge.plans_size_limit.mb}}
>  * {{hbase.normalizer.split.plans_size_limit.mb}}
> These settings will allow HBase administrators to limit the number of plans produced by a run of {{SimpleRegionNormalizer}}, by forcing it to stop producing new plans once the cumulative region size limits are exceeded. This gives you a way to limit approximately how long it takes to execute the plans. Because the time it takes to execute plans is primarily determined by a per-byte rate limit, I propose that the new settings also be expressed in terms of size. This will make it feasible to reason about how your rate limit and your size limits interact.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)