Posted to user@hbase.apache.org by Austin Heyne <ah...@ccri.com> on 2018/08/30 19:40:59 UTC
HMerge Status
We're currently sitting at a very high number of regions due to an
initially poor value for hbase.regionserver.regionSplitLimit and would
like to rein in our region count. Additionally, we have a
spatio-temporal key structure and our region pre-splitting was done
evenly, without regard to the spatial distribution of our data, and thus
we have a lot of small and empty regions we'd like to clean up. I've found
the HMerge class [1], and it seems it would do something reasonable for
our use case. However, it's marked as Private and doesn't seem to be
used anywhere, so I thought I'd ask if anyone knows the status of this
class and how safe it is.
Thanks,
Austin
[1]
https://github.com/apache/hbase/blob/branch-1.4/hbase-server/src/main/java/org/apache/hadoop/hbase/util/HMerge.java
Re: HMerge Status
Posted by Josh Elser <el...@apache.org>.
Hey JMS,
No, that's not my understanding. I'm not sure how the Normalizer would
change the size of the Regions but keep the number of Regions the same :)
IIRC, the Normalizer works by looking at adjacent Regions, merging them
together when their size is under the given threshold. The caveat is
that it does this relatively slowly to avoid causing duress on users
actively doing things on the system.
Having an active mode (do merges fast) and passive mode (do merges
slowly) sounds like a nice addition now that I think about it.
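To make the "merge adjacent Regions under a threshold" idea concrete, here is a minimal Python sketch of how candidate pairs could be picked. It is not the Normalizer's actual algorithm, only an illustration of the description above. The region names, the sizes in MB, and the `max_merged_mb` parameter are all hypothetical, and the input is assumed to be sorted by start key so that list neighbours are key-space neighbours.

```python
from typing import List, Tuple

# (encoded region name, size in MB); both values are made up for illustration.
Region = Tuple[str, float]


def plan_adjacent_merges(regions: List[Region],
                         max_merged_mb: float) -> List[Tuple[str, str]]:
    """Pair neighbouring regions whose combined size stays under a threshold.

    `regions` must already be sorted by start key. Each region is used in at
    most one pair per pass, so the plan can be executed without conflicts.
    """
    pairs = []
    i = 0
    while i < len(regions) - 1:
        (name_a, size_a), (name_b, size_b) = regions[i], regions[i + 1]
        if size_a + size_b <= max_merged_mb:
            pairs.append((name_a, name_b))
            i += 2  # both regions are consumed by this merge
        else:
            i += 1  # slide forward and try the next neighbour
    return pairs


if __name__ == "__main__":
    sample = [("a", 0), ("b", 10), ("c", 900), ("d", 5), ("e", 0)]
    for left, right in plan_adjacent_merges(sample, max_merged_mb=100):
        print(left, right)
```

Running several passes of such a plan, with a pause between them, would be one way to approximate the "passive" mode, while running them back to back would be closer to the "active" mode.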
On 8/31/18 9:22 AM, Jean-Marc Spaggiari wrote:
> If I'm not mistaken, the Normalizer will keep the same number of regions
> but make their sizes uniform, right? So if the goal is to reduce the number
> of regions, the Normalizer might not help?
>
> JMS
Re: HMerge Status
Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
If I'm not mistaken, the Normalizer will keep the same number of regions
but make their sizes uniform, right? So if the goal is to reduce the number
of regions, the Normalizer might not help?
JMS
On Fri, Aug 31, 2018 at 09:16, Josh Elser <el...@apache.org> wrote:
> There's the Region Normalizer which I'd presume would be in an HBase 1.4
> release
>
> https://issues.apache.org/jira/browse/HBASE-13103
Re: HMerge Status
Posted by Josh Elser <el...@apache.org>.
There's the Region Normalizer which I'd presume would be in an HBase 1.4
release
https://issues.apache.org/jira/browse/HBASE-13103
On 8/30/18 3:50 PM, Austin Heyne wrote:
> I'm using HBase 1.4.4 (AWS/EMR) and I'm looking for an automated
> solution because I believe there are going to be a few hundred if not
> thousand merges. It's also challenging to find candidate pairs.
>
> -Austin
Re: HMerge Status
Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Well, I wrote a script for this some time ago: wget the page.jsp, extract
the encoded region names, and call the shell to merge them two by two. Job
done. It takes 10 minutes to write and doesn't require building or
deploying anything.
The other option is like you said: use the Java API to query the Admin for
all of the table's regions and then use the admin.merge method...
JMS
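As a rough illustration of the script approach, the Python sketch below takes a list of encoded region names and renders one HBase shell `merge_region` line per pair. Scraping the names from the web UI is left out, and the names in the example are placeholders. The list is assumed to be in start-key order so that each pair is adjacent, which the merge is expected to require unless it is forced.

```python
from typing import Iterable, List, Tuple


def pair_up(encoded_names: List[str]) -> List[Tuple[str, str]]:
    """Group names two by two in the order given.

    An odd trailing name is left unpaired and simply dropped from the plan.
    """
    return list(zip(encoded_names[0::2], encoded_names[1::2]))


def merge_commands(pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Render one HBase shell `merge_region` line per pair of encoded names."""
    return [f"merge_region '{a}', '{b}'" for a, b in pairs]


if __name__ == "__main__":
    # Placeholder encoded region names, assumed sorted by start key.
    names = ["region1", "region2", "region3", "region4", "region5"]
    print("\n".join(merge_commands(pair_up(names))))
```

The printed lines could be saved to a file and run through `hbase shell`. Each pass roughly halves the region count, so it would need to be repeated, with a fresh region list each time, until the count is acceptable.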
On Thu, Aug 30, 2018 at 15:50, Austin Heyne <ah...@ccri.com> wrote:
> I'm using HBase 1.4.4 (AWS/EMR) and I'm looking for an automated
> solution because I believe there are going to be a few hundred if not
> thousand merges. It's also challenging to find candidate pairs.
>
> -Austin
Re: HMerge Status
Posted by Andrew MacKay <An...@superna.net>.
We had this issue.
We adjusted the tables' split policy to allow for larger regions in terms
of file size, then altered the tables with the optimizer flag, enabled the
optimizer, and waited. Eventually it merges small regions into larger ones.
Fully automated.
It takes time.
On Thu, Aug 30, 2018 at 3:50 PM Austin Heyne <ah...@ccri.com> wrote:
> I'm using HBase 1.4.4 (AWS/EMR) and I'm looking for an automated
> solution because I believe there are going to be a few hundred if not
> thousand merges. It's also challenging to find candidate pairs.
>
> -Austin
Re: HMerge Status
Posted by Austin Heyne <ah...@ccri.com>.
I'm using HBase 1.4.4 (AWS/EMR) and I'm looking for an automated
solution because I believe there are going to be a few hundred if not
thousand merges. It's also challenging to find candidate pairs.
-Austin
On 08/30/2018 03:45 PM, Jean-Marc Spaggiari wrote:
> Hi Austin,
>
> Which version are you using? Why not just use the shell merge command?
>
> JMS
--
Austin L. Heyne
Re: HMerge Status
Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Austin,
Which version are you using? Why not just use the shell merge command?
JMS