Posted to user@phoenix.apache.org by rafa <ra...@gmail.com> on 2017/03/06 16:49:40 UTC

Re: Retiring empty regions

Hi Nick,

We are facing the same issue: a growing number of empty regions caused by
TTL combined with a timestamp in the row key.

Did you end up publishing those scripts? Are they available for public use?

Thank you very much in advance for your work and help,
Best Regards,
rafa



On Thu, Apr 21, 2016 at 1:48 AM, Andrew Purtell <ap...@apache.org> wrote:

> >  the shell and find the empty ones, another to merge a given region
> into a neighbor. We've run them without incident, looks like it all works
> fine. One thing we did notice is that the AM leaves the old "retired"
> regions around in its counts -- the master status page shows a large number
> of "Other Regions". This was alarming at first,
>
> Good to know. I had seen this recently and had a mental note to circle
> around and confirm it's just a temporary artifact.
>
> On Wed, Apr 20, 2016 at 3:16 PM, Nick Dimiduk <nd...@gmail.com> wrote:
>
>> Circling back here and adding user@phoenix.
>>
>> I put together one script to dump region info from the shell and find the
>> empty ones, another to merge a given region into a neighbor. We've run them
>> without incident, looks like it all works fine. One thing we did notice is
>> that the AM leaves the old "retired" regions around in its counts -- the
>> master status page shows a large number of "Other Regions". This was
>> alarming at first, but we verified it's just an artifact in the AM and in
>> fact these regions are not on HDFS or in meta. Bouncing master resolved it.
>>
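The scripts Nick describes are not reproduced in this thread. As a rough illustration of the second step only, here is a minimal Python sketch that takes a region listing and pairs each empty region with an adjacent neighbor. The `Region` record, its `size_mb` field, and `plan_merges` are hypothetical; a real tool would fill them in from an actual region dump. Each resulting pair of encoded region names could then be passed, one merge at a time, to the HBase shell's `merge_region` command, which takes two encoded region names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Region:
    # Hypothetical record; in practice these fields would come from a region dump.
    encoded_name: str
    start_key: bytes  # inclusive; b"" means the start of the table
    end_key: bytes    # exclusive; b"" means the end of the table
    size_mb: int      # total store file size; 0 is treated as "empty"


def _precedes(left: Region, right: Region) -> bool:
    # True when `right` starts exactly where `left` ends (no hole in between).
    return left.end_key != b"" and left.end_key == right.start_key


def plan_merges(regions: List[Region]) -> List[Tuple[str, str]]:
    """Pair each empty region with an adjacent neighbor.

    A region takes part in at most one merge per pass, so the plan never
    asks for two overlapping merges at once.
    """
    ordered = sorted(regions, key=lambda r: r.start_key)
    used = set()
    plan = []
    for i, region in enumerate(ordered):
        if region.size_mb != 0 or region.encoded_name in used:
            continue
        # Prefer the following neighbor, fall back to the preceding one.
        for j, is_next in ((i + 1, True), (i - 1, False)):
            if not 0 <= j < len(ordered):
                continue
            neighbor = ordered[j]
            if neighbor.encoded_name in used:
                continue
            adjacent = (_precedes(region, neighbor) if is_next
                        else _precedes(neighbor, region))
            if adjacent:
                plan.append((region.encoded_name, neighbor.encoded_name))
                used.update((region.encoded_name, neighbor.encoded_name))
                break
    return plan
```

Two empty neighbors merge into a single empty region that the next pass picks up, so the assumed workflow is to re-dump and re-plan until the plan comes back empty.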
>> No one has volunteered any alternative schema designs, so as best we
>> know, this will happen to anyone who has a timestamp in their rowkey (i.e.,
>> anyone using Phoenix's "Row timestamp" feature [0]) and is also using the
>> TTL feature. Are folks interested in adding these scripts to our
>> distribution and our book?
>>
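To see why a timestamp-leading row key plus TTL strands regions, consider this toy Python model. It assumes the row key is simply the write timestamp (so key order equals time order), that region boundaries are fixed integer ranges, and that a row stays live while `now - ts < ttl`; all three are simplifications for illustration, not a description of how HBase or Phoenix evaluate expiry.

```python
from typing import List, Sequence, Tuple


def empty_regions_after_ttl(regions: Sequence[Tuple[int, int]],
                            row_timestamps: Sequence[int],
                            now: int, ttl: int) -> List[Tuple[int, int]]:
    """Return the [start, end) key ranges that hold no live rows.

    Simplification: the row key *is* the write timestamp.
    """
    live = [ts for ts in row_timestamps if now - ts < ttl]  # rows not yet expired
    return [(start, end) for start, end in regions
            if not any(start <= ts < end for ts in live)]


regions = [(0, 100), (100, 200), (200, 300), (300, 10**9)]
print(empty_regions_after_ttl(regions, range(0, 400), now=400, ttl=150))
print(empty_regions_after_ttl(regions, range(0, 600), now=600, ttl=150))
```

Because every new write lands at the high end of the key space, the low regions never refill once their rows expire, and the set of empty regions only grows as time advances.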
>> -n
>>
>> [0]: https://phoenix.apache.org/rowtimestamp.html
>>
>> On Mon, Apr 4, 2016 at 8:34 AM, Nick Dimiduk <nd...@gmail.com> wrote:
>>
>>> > Crazy idea, but you might be able to take a stripped-down version of the
>>> > region normalizer code and make a Tool to run? Requesting split or merge
>>> > is done through the client API, and the only weighing information you
>>> > need is whether a region is empty or not, which you could find out too?
>>>
>>> Yeah, that's the direction I'm headed.
>>>
>>> > A bit off topic, but I think unfortunately the region normalizer now
>>> > ignores empty regions to avoid undoing pre-split on the table.
>>>
>>> Unfortunate indeed. Maybe we should be keeping around the initial splits
>>> list as a metadata attribute on the table?
>>>
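Nick's idea of keeping the initial split list around could let a normalizer-style tool merge away empty regions without undoing the pre-split. Below is a hypothetical Python filter that assumes the list is available as a set of split keys (how it would be stored on the table is left open) and that regions arrive as `(start_key, end_key, is_empty)` tuples. A boundary may be removed by a merge only if at least one side is empty and the boundary is not one of the original split points.

```python
from typing import List, Sequence, Set, Tuple

# (start_key, end_key, is_empty); b"" marks the open start or end of the table.
RegionInfo = Tuple[bytes, bytes, bool]


def removable_boundaries(regions: Sequence[RegionInfo],
                         initial_splits: Set[bytes]) -> List[bytes]:
    """Region boundaries that a merge may remove without undoing the pre-split."""
    ordered = sorted(regions, key=lambda r: r[0])
    removable = []
    for (_, left_end, left_empty), (right_start, _, right_empty) in zip(ordered, ordered[1:]):
        if left_end == b"" or left_end != right_start:
            continue  # not adjacent; leave holes alone
        if (left_empty or right_empty) and left_end not in initial_splits:
            removable.append(left_end)
    return removable


regions = [(b"", b"2", True), (b"2", b"2a", True), (b"2a", b"4", False), (b"4", b"", True)]
print(removable_boundaries(regions, {b"2", b"4"}))
```

Only boundaries created by later splits are eligible here; the original pre-split points survive even when the regions on both sides are empty, which is the trade-off Mikhail describes the normalizer making today.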
>>> > With the right row-key design you will never have empty regions due to
>>> > TTL.
>>>
>>> I'd love to hear your thoughts on this design, Vlad. Maybe you'd like to
>>> write up a post for the blog? Meanwhile, I'm sure a couple of us here on
>>> the list would appreciate your Cliff's Notes version. I can take this
>>> into account for my v2 schema design.
>>>
>>> > So Nick, merge on 1.1 is not recommended??? Was working very well on
>>> > previous versions. Does ProcV2 really impact it that badly??
>>>
>>> How to answer here carefully... I have no reason to believe merge is not
>>> working on 1.1. I've been on the wrong end of enough "regions stuck in
>>> transition" support tickets that I'm not keen to put undue stress on my
>>> master. ProcV2 insures against many scenarios that cause master trauma,
>>> hence my interest in the implementation details and my preference for
>>> cluster administration tasks that use it as their source of authority.
>>>
>>> Thanks for the thoughts folks.
>>> -n
>>>
>>> On Fri, Apr 1, 2016 at 10:52 AM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:
>>>
>>>> ;) That was not the question ;)
>>>>
>>>> So Nick, merge on 1.1 is not recommended??? Was working very well on
>>>> previous versions. Does ProcV2 really impact it that badly??
>>>>
>>>> JMS
>>>>
>>>> 2016-04-01 13:49 GMT-04:00 Vladimir Rodionov <vl...@gmail.com>:
>>>>
>>>> > >> This is something which makes it far less useful for time-series
>>>> > >> databases with short TTL on the tables.
>>>> >
>>>> > With the right row-key design you will never have empty regions due to
>>>> > TTL.
>>>> >
>>>> > -Vlad
>>>> >
>>>> > On Thu, Mar 31, 2016 at 10:31 PM, Mikhail Antonov <olorinbant@gmail.com>
>>>> > wrote:
>>>> >
>>>> > > Crazy idea, but you might be able to take a stripped-down version of
>>>> > > the region normalizer code and make a Tool to run? Requesting split or
>>>> > > merge is done through the client API, and the only weighing
>>>> > > information you need is whether a region is empty or not, which you
>>>> > > could find out too?
>>>> > >
>>>> > >
>>>> > > "Short of upgrading to 1.2 for the region normalizer,"
>>>> > >
>>>> > > A bit off topic, but I think unfortunately the region normalizer now
>>>> > > ignores empty regions to avoid undoing pre-split on the table. This is
>>>> > > something which makes it far less useful for time-series databases
>>>> > > with short TTL on the tables. We'll need to address that.
>>>> > >
>>>> > > -Mikhail
>>>> > >
>>>> > > On Thu, Mar 31, 2016 at 9:56 PM, Nick Dimiduk <nd...@gmail.com> wrote:
>>>> > >
>>>> > > > Hi folks,
>>>> > > >
>>>> > > > I have a table with TTL enabled. It's been receiving data for a
>>>> > > > while beyond the TTL and I now have a number of empty regions. I'd
>>>> > > > like to drop those empty regions to free up heap space on the region
>>>> > > > servers and reduce master load. I'm running a 1.1 derivative.
>>>> > > >
>>>> > > > The only threads I found on this topic are from around the 0.92
>>>> > > > timeframe.
>>>> > > >
>>>> > > > Short of upgrading to 1.2 for the region normalizer, what's the
>>>> > > > recommended method of cleaning up this cruft? Should I be merging
>>>> > > > empty regions into their neighbors? Looks like region merge hasn't
>>>> > > > been migrated to ProcV2 yet, so would it be wise to reduce online
>>>> > > > table activity, or at least aim for a "quiet period"? Is there a
>>>> > > > documented process for off-lining and deleting a region by name? I
>>>> > > > don't see anything in the book about it.
>>>> > > >
>>>> > > > I experimented with online merge on pseudodist; it looks like it's
>>>> > > > working fine for the most basic case. I'll probably pursue this
>>>> > > > unless someone has some other ideas.
>>>> > > >
>>>> > > > Thanks,
>>>> > > > Nick
>>>> > > >
>>>> > >
>>>> > >
>>>> > >
>>>> > > --
>>>> > > Thanks,
>>>> > > Michael Antonov
>>>> > >
>>>> >
>>>>
>>>
>>>
>>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>

Re: Retiring empty regions

Posted by rafa <ra...@gmail.com>.
Hi,

For everyone's information, Nick has published the script for retiring empty
regions in:

https://issues.apache.org/jira/browse/HBASE-15712

Nick, thank you very much for your help and great work!!!

Best Regards,
Rafa.


