Posted to user@hbase.apache.org by Ionut Ignatescu <io...@gmail.com> on 2012/07/17 18:08:11 UTC

How to merge regions in HBase?

My use case: I have several tables whose row keys start with a timestamp.
These tables also have data retention set to 30 days.
Table size is around 1 TB (3 TB replicated), and data is inserted regularly
(about 200 MB every 5 minutes).
The file size is set to 1 GB. These tables have been in use for almost half
a year, and one table now has around 6k partitions (regions), 40% of which
are empty.
The problem: the number of regions per region server is now quite high.
Questions:
Which approach is better?
- merging adjacent empty partitions into a bigger one?
- merging empty partitions into non-empty partitions?
Also, I'm wondering why region merging is not part of major compaction, and
why it's necessary to stop the entire fleet to solve this problem.

 

Regards, 

Ionut I.
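
As a rough illustration of the first option in the question, the sketch
below finds runs of adjacent empty regions that could be merged together.
It is only a planning aid under stated assumptions: the region names and
sizes are made-up inputs (in practice they would have to come from the
cluster's own region listing), `adjacent_empty_runs` is a hypothetical
helper, and nothing here talks to HBase or performs a merge.

```python
from itertools import groupby


def adjacent_empty_runs(regions):
    """Return runs of two or more consecutive empty regions.

    `regions` is a list of (name, size_mb) tuples, assumed to be sorted
    by region start key so that list neighbours are key-range neighbours.
    """
    runs = []
    for is_empty, group in groupby(regions, key=lambda r: r[1] == 0):
        names = [name for name, _size in group]
        # A single empty region has no empty neighbour to merge with.
        if is_empty and len(names) >= 2:
            runs.append(names)
    return runs


# Hypothetical input: names and sizes are illustrative only.
example = [("r1", 0), ("r2", 0), ("r3", 512),
           ("r4", 0), ("r5", 0), ("r6", 0), ("r7", 600)]
for run in adjacent_empty_runs(example):
    print("merge candidates:", run)
```

Grouping whole runs, rather than fixed pairs, leaves it to the operator to
decide how many empty regions to fold into one.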




Re: How to merge regions in HBase?

Posted by Bryan Beaudreault <bb...@hubspot.com>.
Shouldn't it be possible for him to have empty regions if he has a TTL on his data? 
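
The effect of a TTL on time-ordered regions can be shown with a toy model.
This is a sketch, not HBase code: `Region` and `empty_after_ttl` are
hypothetical names, time is measured in whole days for simplicity, and the
model assumes expired cells have already been dropped by compaction.

```python
from dataclasses import dataclass


@dataclass
class Region:
    start_day: int  # first day covered by the region's key range (inclusive)
    end_day: int    # end of the region's key range (exclusive)


def empty_after_ttl(regions, today, ttl_days=30):
    """Return the regions whose whole key range is older than the TTL."""
    cutoff = today - ttl_days
    # Every row in [start_day, end_day) is older than the cutoff
    # exactly when end_day <= cutoff.
    return [r for r in regions if r.end_day <= cutoff]


# Toy table: one region per 10 days of keys over 180 days (made-up numbers).
regions = [Region(d, d + 10) for d in range(0, 180, 10)]
expired = empty_after_ttl(regions, today=180)
print(len(expired), "of", len(regions), "regions hold only expired data")
```

Because the key starts with a timestamp, old regions never receive new
writes, so once their data expires they stay empty.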

-- 
Bryan Beaudreault


On Wednesday, July 18, 2012 at 9:58 AM, Kevin O'dell wrote:

> Also, depending on the version of HBase you are running, you may have
> to bring down the whole cluster to merge, not just the table:
> 
> https://issues.apache.org/jira/browse/HBASE-1621



Re: How to merge regions in HBase?

Posted by Kevin O'dell <ke...@cloudera.com>.
Also, depending on the version of HBase you are running, you may have
to bring down the whole cluster to merge, not just the table:

https://issues.apache.org/jira/browse/HBASE-1621

On Tue, Jul 17, 2012 at 7:26 PM, Amandeep Khurana <am...@gmail.com> wrote:

> You shouldn't have empty regions. With a timestamp-leading key, every
> region ends up half full except the last one, which receives the writes
> for the current time range. As soon as that one fills up, it splits, and
> you are again writing to the last region. How did you end up
> with empty regions? Did you pre-split?



-- 
Kevin O'Dell
Customer Operations Engineer, Cloudera

Re: How to merge regions in HBase?

Posted by Amandeep Khurana <am...@gmail.com>.
You shouldn't have empty regions. With a timestamp-leading key, every
region ends up half full except the last one, which receives the writes
for the current time range. As soon as that one fills up, it splits, and
you are again writing to the last region. How did you end up
with empty regions? Did you pre-split?
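
The half-filled pattern described above can be reproduced with a small
simulation. This is a simplified sketch, not how HBase decides splits
internally: it assumes a split divides a region's data evenly, borrows the
200 MB batch and 1 GB limit from the original post, and ignores TTL and
compression. `simulate_sequential_writes` is a hypothetical name.

```python
def simulate_sequential_writes(total_mb, batch_mb=200, max_region_mb=1024):
    """Model monotonically increasing keys: every write hits the last region.

    Returns the list of region sizes (in MB) in key order.
    """
    regions = [0.0]
    written = 0
    while written < total_mb:
        regions[-1] += batch_mb
        written += batch_mb
        if regions[-1] >= max_region_mb:
            # Split evenly. The lower half never sees another write,
            # because all new keys sort after it.
            half = regions[-1] / 2
            regions[-1] = half
            regions.append(half)
    return regions


sizes = simulate_sequential_writes(6000)
print("regions:", len(sizes))
print("largest closed region (MB):", max(sizes[:-1]))
```

In this model no closed region ever grows past roughly half the size limit,
which is the behaviour the message describes.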

On Jul 17, 2012, at 7:15 PM, Michael Segel <mi...@hotmail.com> wrote:

> Find a different row key?
>
> The problem with merging regions is that once you merge the regions, any net new regions will still have the same problem. So you'll have to merge again, and again and again.
> You're always filling to the left of the last key.
>
> In order to merge, you have to take the table offline. At least that's my understanding, so it's not a good thing.

Re: How to merge regions in HBase?

Posted by Michael Segel <mi...@hotmail.com>.
Find a different row key? 

The problem with merging regions is that once you merge the regions, any net new regions will still have the same problem. So you'll have to merge again, and again and again.
You're always filling to the left of the last key.

In order to merge, you have to take the table offline. At least that's my understanding, so it's not a good thing.
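
One possible reading of "a different row key", offered here as an
assumption rather than as what was meant above, is to lead the key with a
non-time field and put the timestamp second. The sketch below shows the
idea only. `source_id` is a hypothetical field (the original post does not
say what else the rows contain), and `entity_first_row_key` is an
illustrative helper, not an HBase API.

```python
def entity_first_row_key(source_id: str, timestamp_ms: int) -> bytes:
    """Build a row key that sorts by source first, then by time.

    The timestamp is zero-padded to a fixed width so that byte-wise
    (lexicographic) ordering matches numeric ordering.
    """
    return f"{source_id}|{timestamp_ms:013d}".encode("utf-8")


# Hypothetical sources writing at the same instants.
keys = sorted(
    entity_first_row_key(src, ts)
    for ts in (1342548491000, 1342548791000)
    for src in ("sensor-a", "sensor-b", "sensor-c")
)
for key in keys:
    print(key.decode("utf-8"))
```

With such a key, new writes land in every source's key range instead of
only at the end of the table, and expired data ages out of all regions
rather than emptying the oldest ones. The trade-off is that a scan over a
pure time range then has to visit every source prefix.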

