Posted to user@hbase.apache.org by Austin Heyne <ah...@ccri.com> on 2018/08/30 19:40:59 UTC

HMerge Status

We're currently sitting at a very high number of regions due to an
initially poor value for hbase.regionserver.regionSplitLimit and would
like to rein in our region count. Additionally, we have a
spatio-temporal key structure and our region pre-splitting was done
evenly, without regard to the spatial distribution of our data, so we
have a lot of small and empty regions we'd like to clean up. I've found
the HMerge class [1], and it seems it would do something reasonable for
our use case. However, it's marked as Private and doesn't seem to be
used anywhere, so I thought I'd ask if anyone knows the status of this
class and how safe it is.

Thanks,
Austin

[1] https://github.com/apache/hbase/blob/branch-1.4/hbase-server/src/main/java/org/apache/hadoop/hbase/util/HMerge.java


Re: HMerge Status

Posted by Josh Elser <el...@apache.org>.
Hey JMS,

No, that's not my understanding. I'm not sure how the Normalizer would 
change the size of the Regions but keep the number of Regions the same :)

IIRC, the Normalizer works by looking at adjacent Regions, merging them
together when their size is under the given threshold. The caveat is
that it does this relatively slowly to avoid disrupting users who are
actively working on the system.

Having an active mode (do merges fast) and passive mode (do merges 
slowly) sounds like a nice addition now that I think about it.
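
To make the adjacent-merge idea concrete, here is a minimal sketch in plain
Java. It is a simplified model, not the Normalizer's actual code: the class
name AdjacentMergePlanner, the size unit (MB), and the rule "merge two
neighbours when their combined size is under the threshold" are all
assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: a simplified model of picking adjacent regions to merge.
// It does not call HBase; region sizes are passed in, ordered by start key.
final class AdjacentMergePlanner {

    // Returns non-overlapping index pairs {i, i + 1} whose combined size
    // is below thresholdMb (assumed rule, see the note above).
    static List<int[]> plan(long[] sizesMb, long thresholdMb) {
        List<int[]> pairs = new ArrayList<>();
        int i = 0;
        while (i + 1 < sizesMb.length) {
            if (sizesMb[i] + sizesMb[i + 1] < thresholdMb) {
                pairs.add(new int[] {i, i + 1});
                i += 2; // both regions are consumed by this merge
            } else {
                i += 1; // region i stays as it is; try the next neighbour
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Made-up sizes for six regions in key order.
        long[] sizesMb = {0, 5, 900, 10, 0, 3};
        for (int[] pair : plan(sizesMb, 100)) {
            System.out.println("merge regions " + pair[0] + " and " + pair[1]);
        }
    }
}
```

A single pass like this at most halves the region count, so reaching a much
smaller count would take repeated passes, which fits the "it takes time"
behaviour described in this thread.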

On 8/31/18 9:22 AM, Jean-Marc Spaggiari wrote:
> If I'm not mistaken, the Normalizer will keep the same number of regions
> but even out their sizes, right? So if the goal is to reduce the number of
> regions, the Normalizer might not help?
> 
> JMS

Re: HMerge Status

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
If I'm not mistaken, the Normalizer will keep the same number of regions
but even out their sizes, right? So if the goal is to reduce the number of
regions, the Normalizer might not help?

JMS


Re: HMerge Status

Posted by Josh Elser <el...@apache.org>.
There's the Region Normalizer, which I'd presume would be in an HBase 1.4
release:

https://issues.apache.org/jira/browse/HBASE-13103


Re: HMerge Status

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Well, I wrote a script for this some time ago: wget the page.jsp, extract the
encoded region names, and call the shell to merge them two by two. Job done.
It takes 10 minutes to write and doesn't require building or deploying
anything.

The other option is like you said: use the Java API to query the Admin, get
all of the table's regions, and then use the admin.merge method...

JMS
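
The "two by two" script idea can be sketched as a small stand-alone Java
program that turns a list of encoded region names into HBase shell
merge_region commands. This is an illustration only, not the script mentioned
above: the class name MergeCommandGenerator is hypothetical, and it assumes
the names are supplied in start-key order so that each pair is adjacent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: pairs encoded region names two by two and prints
// one HBase shell merge_region command per pair.
final class MergeCommandGenerator {

    // Assumes encodedNames is ordered by region start key, so that
    // elements i and i + 1 are neighbouring regions.
    static List<String> commands(List<String> encodedNames) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < encodedNames.size(); i += 2) {
            out.add(String.format("merge_region '%s', '%s'",
                    encodedNames.get(i), encodedNames.get(i + 1)));
        }
        return out; // a trailing unpaired region is left untouched
    }

    public static void main(String[] args) {
        // Encoded region names are passed as command-line arguments.
        for (String command : commands(Arrays.asList(args))) {
            System.out.println(command);
        }
    }
}
```

The printed lines could be reviewed and then fed to the HBase shell. The list
of regions has to be fetched again before running a second pass, because the
merged regions get new encoded names.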


Re: HMerge Status

Posted by Andrew MacKay <An...@superna.net>.
We had this issue.

We adjusted the tables' split policy to allow for larger regions in terms of
file size, then altered each table with the optimizer flag, enabled the
optimizer, and waited. Eventually it merges small regions into larger ones.
Fully automated.

It takes time.



Re: HMerge Status

Posted by Austin Heyne <ah...@ccri.com>.
I'm using HBase 1.4.4 (AWS/EMR) and I'm looking for an automated
solution because I believe there are going to be a few hundred, if not a
thousand, merges. It's also challenging to find candidate pairs.

-Austin



-- 
Austin L. Heyne


Re: HMerge Status

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Austin,

Which version are you using? Why not just use the shell merge command?

JMS
