Posted to user@hbase.apache.org by Nick Dimiduk <nd...@gmail.com> on 2016/04/01 06:56:26 UTC

Retiring empty regions

Hi folks,

I have a table with TTL enabled. It's been receiving data for a while
beyond the TTL and I now have a number of empty regions. I'd like to drop
those empty regions to free up heap space on the region servers and reduce
master load. I'm running a 1.1 derivative.

The only threads I found on this topic are from circa the 0.92 timeframe.

Short of upgrading to 1.2 for the region normalizer, what's the recommended
method of cleaning up this cruft? Should I be merging empty regions into
their neighbors? It looks like region merge hasn't been migrated to ProcV2
yet, so would it be wise to reduce online table activity, or at least aim
for a "quiet period"? Is there a documented process for off-lining and
deleting a region by name? I don't see anything in the book about it.

I experimented with online merge in pseudo-distributed mode, and it looks
like it works fine for the most basic case. I'll probably pursue this unless
someone has some other ideas.

Thanks,
Nick
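
[Editor's sketch] The "merge empty regions into their neighbors" idea raised
above can be illustrated with a small planning step. This is a minimal sketch
under stated assumptions, not the scripts later attached to HBASE-15712: the
Region class, its field names, and the use of a store-file size of 0 as the
test for "empty" are all hypothetical, and the region list is assumed to have
been collected from the cluster by some other means. It only decides which
adjacent pairs to merge; it does not call HBase.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Region:
    name: str       # hypothetical encoded region name
    start_key: str  # "" marks the first region of the table
    end_key: str    # "" marks the last region of the table
    size_mb: int    # assumed store-file size; 0 is treated as "empty"


def plan_merges(regions: List[Region]) -> List[Tuple[str, str]]:
    """Pair each empty region with an adjacent region, scanning left to right.

    A region takes part in at most one merge per pass, because the merged
    region gets a new name. Re-run the planner after the merges complete.
    """
    ordered = sorted(regions, key=lambda r: r.start_key)
    plan: List[Tuple[str, str]] = []
    i = 0
    while i < len(ordered) - 1:
        left, right = ordered[i], ordered[i + 1]
        if left.size_mb == 0 or right.size_mb == 0:
            plan.append((left.name, right.name))
            i += 2  # skip both members of the pair
        else:
            i += 1
    return plan
```

Each pair in the plan is adjacent in key order, which is what an online merge
expects by default. Running one pass at a time keeps the number of concurrent
merges small, in line with the "quiet period" concern above.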

Re: Retiring empty regions

Posted by Nick Dimiduk <nd...@gmail.com>.

I'm looking forward to your talk, Vlad.

In the meantime, I filed HBASE-15712. We'll get our implementation posted
up there. We have the scripts deployed on one of the masters, running daily
via cron.

@Mikhail, to get this feature into the normalizer, how about this: let's
add a min number of regions property to user tables. This can be set when
someone creates a table with split points, or maintained manually. The
normalizer can use that as a constraint to guide its convergence.
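
[Editor's sketch] The minimum-region-count constraint proposed above can be
expressed as a cap on a merge plan. This is only an illustration of the idea:
the `min_regions` value stands in for the proposed table property, which is a
suggestion in this thread and not an existing setting, and `cap_merges` is a
hypothetical helper name.

```python
from typing import List, Tuple


def cap_merges(plan: List[Tuple[str, str]],
               region_count: int,
               min_regions: int) -> List[Tuple[str, str]]:
    """Trim a merge plan so the table never drops below min_regions.

    Every merge replaces two regions with one, so each planned pair lowers
    the region count by exactly one.
    """
    budget = max(0, region_count - min_regions)
    return plan[:budget]
```

With a table of 5 regions and a minimum of 4, only the first planned merge
would be allowed to proceed. A table created with split points could set the
minimum to its initial region count so that the pre-split is never undone.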

On Wed, Apr 20, 2016 at 5:18 PM, Vladimir Rodionov <vl...@gmail.com>
wrote:

> >I'd love to hear your thoughts on this design, Vlad. Maybe you'd like to
> >write up a post for the blog? Meanwhile, I'm sure a couple of us here on
> >the list would appreciate your Cliff's Notes version. I can take this
> >into account for my v2 schema design.
>
> Nick, there will be a presentation on time-series HBase (hbasecon.com).
> Come join us :)
>
>
> On Mon, Apr 4, 2016 at 8:34 AM, Nick Dimiduk <nd...@gmail.com> wrote:
>
> > > Crazy idea, but you might be able to take stripped down version of
> region
> > > normalizer code and make a Tool to run? Requesting split or merge is
> done
> > > through the client API, and the only weighing information you need is
> > > whether region empty or not, that you could find out too?
> >
> > Yeah, that's the direction I'm headed.
> >
> > > A bit off topic, but I think unfortunately region normalizer now
> ignores
> > > empty regions to avoid undoing pre-split on the table.
> >
> > Unfortunate indeed. Maybe we should be keeping around the initial splits
> > list as a metadata attribute on the table?
> >
> > > With a right row-key design you will never have empty regions due to
> TTL.
> >
> > I'd love to hear your thoughts on this design, Vlad. Maybe you'd like to
> > write up a post for the blog? Meanwhile, I'm sure of a couple of us on
> here
> > on the list would appreciate your Cliff's Notes version. I can take this
> > into account for my v2 schema design.
> >
> > > So Nick, merge on 1.1 is not recommended??? Was working very well on
> > > previous versions. Is ProcV2 really impact it that bad??
> >
> > How to answer here carefully... I have no reason to believe merge is not
> > working on 1.1. I've been on the wrong end of enough "regions stuck in
> > transition" support tickets that I'm not keen to put undue stress on my
> > master. ProcV2 insures against many scenarios that cause master trauma,
> > hence my interest in the implementation details and my preference for
> > cluster administration tasks that use it as their source of authority.
> >
> > Thanks for the thoughts folks.
> > -n
> >
> > On Fri, Apr 1, 2016 at 10:52 AM, Jean-Marc Spaggiari <
> > jean-marc@spaggiari.org> wrote:
> >
> > > ;) That was not the question ;)
> > >
> > > So Nick, merge on 1.1 is not recommended??? Was working very well on
> > > previous versions. Is ProcV2 really impact it that bad??
> > >
> > > JMS
> > >
> > > 2016-04-01 13:49 GMT-04:00 Vladimir Rodionov <vl...@gmail.com>:
> > >
> > > > >> This is something
> > > > >> which makes it far less useful for time-series databases with
> short
> > > TTL
> > > > on
> > > > >> the tables.
> > > >
> > > > With a right row-key design you will never have empty regions due to
> > TTL.
> > > >
> > > > -Vlad
> > > >
> > > > On Thu, Mar 31, 2016 at 10:31 PM, Mikhail Antonov <
> > olorinbant@gmail.com>
> > > > wrote:
> > > >
> > > > > Crazy idea, but you might be able to take stripped down version of
> > > region
> > > > > normalizer code and make a Tool to run? Requesting split or merge
> is
> > > done
> > > > > through the client API, and the only weighing information you need
> is
> > > > > whether region empty or not, that you could find out too?
> > > > >
> > > > >
> > > > > "Short of upgrading to 1.2 for the region normalizer,"
> > > > >
> > > > > A bit off topic, but I think unfortunately region normalizer now
> > > ignores
> > > > > empty regions to avoid undoing pre-split on the table. This is
> > > something
> > > > > which makes it far less useful for time-series databases with short
> > TTL
> > > > on
> > > > > the tables. We'll need to address that.
> > > > >
> > > > > -Mikhail
> > > > >
> > > > > On Thu, Mar 31, 2016 at 9:56 PM, Nick Dimiduk <nd...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Hi folks,
> > > > > >
> > > > > > I have a table with TTL enabled. It's been receiving data for a
> > while
> > > > > > beyond the TTL and I now have a number of empty regions. I'd like
> > to
> > > > drop
> > > > > > those empty regions to free up heap space on the region servers
> and
> > > > > reduce
> > > > > > master load. I'm running a 1.1 derivative.
> > > > > >
> > > > > > The only threads I found on this topic are from circa 0.92
> > timeframe.
> > > > > >
> > > > > > Short of upgrading to 1.2 for the region normalizer, what's the
> > > > > recommended
> > > > > > method of cleaning up this cruft? Should I be merging empty
> regions
> > > > into
> > > > > > their neighbor's? Looks like region merge hasn't been migrated to
> > > > ProcV2
> > > > > > yet so would be wise to reduce online table activity, or at least
> > aim
> > > > > for a
> > > > > > "quiet period"? Is there a documented process for off-lining and
> > > > > deleting a
> > > > > > region by name? I don't see anything in the book about it.
> > > > > >
> > > > > > I experimented with online merge on pseudodist, looks like it's
> > > working
> > > > > > fine for the most basic case. I'll probably pursue this unless
> > > someone
> > > > > has
> > > > > > some other ideas.
> > > > > >
> > > > > > Thanks,
> > > > > > Nick
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Thanks,
> > > > > Michael Antonov
> > > > >
> > > >
> > >
> >
>

Re: Retiring empty regions

Posted by Vladimir Rodionov <vl...@gmail.com>.

>I'd love to hear your thoughts on this design, Vlad. Maybe you'd like to
>write up a post for the blog? Meanwhile, I'm sure a couple of us here on
>the list would appreciate your Cliff's Notes version. I can take this
>into account for my v2 schema design.

Nick, there will be a presentation on time-series HBase (hbasecon.com).
Come join us :)
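
[Editor's sketch] The row-key design Vlad has in mind is not spelled out in
this thread. Purely as an assumption for illustration, the simulation below
contrasts a time-leading row key with a bucket-prefixed one, a common
technique that is not confirmed here as Vlad's design. The key formats, the
byte-sum bucketing function, and the split points are all hypothetical. It
counts unexpired rows per region to show why TTL empties the older regions
in one layout and not in the other.

```python
import bisect
from typing import Iterable, List, Tuple


def region_index(row_key: str, split_points: List[str]) -> int:
    """Index of the region holding row_key, given sorted split points."""
    return bisect.bisect_right(split_points, row_key)


def live_counts(rows: Iterable[Tuple[str, int]], split_points: List[str],
                now: int, ttl: int) -> List[int]:
    """Count rows younger than ttl in each region."""
    counts = [0] * (len(split_points) + 1)
    for key, write_ts in rows:
        if now - write_ts < ttl:
            counts[region_index(key, split_points)] += 1
    return counts


def time_leading_key(ts: int, device: str) -> str:
    return f"{ts:04d}-{device}"


def bucketed_key(ts: int, device: str, buckets: int = 4) -> str:
    # Deterministic toy hash: sum of the device name's bytes.
    bucket = sum(device.encode()) % buckets
    return f"{bucket}-{ts:04d}-{device}"


devices = [f"dev{i}" for i in range(8)]
writes = [(ts, d) for ts in range(100) for d in devices]

by_time = live_counts(((time_leading_key(ts, d), ts) for ts, d in writes),
                      ["0025", "0050", "0075"], now=100, ttl=25)
by_bucket = live_counts(((bucketed_key(ts, d), ts) for ts, d in writes),
                        ["1", "2", "3"], now=100, ttl=25)
print(by_time)    # [0, 0, 0, 192]
print(by_bucket)  # [48, 48, 48, 48]
```

In the time-leading layout every live row sits in the newest region, so the
older regions are left empty once their data expires. In the bucketed layout
each region keeps receiving new writes. The trade-off is that a time-range
scan has to fan out across all buckets.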


On Mon, Apr 4, 2016 at 8:34 AM, Nick Dimiduk <nd...@gmail.com> wrote:

> > Crazy idea, but you might be able to take stripped down version of region
> > normalizer code and make a Tool to run? Requesting split or merge is done
> > through the client API, and the only weighing information you need is
> > whether region empty or not, that you could find out too?
>
> Yeah, that's the direction I'm headed.
>
> > A bit off topic, but I think unfortunately region normalizer now ignores
> > empty regions to avoid undoing pre-split on the table.
>
> Unfortunate indeed. Maybe we should be keeping around the initial splits
> list as a metadata attribute on the table?
>
> > With a right row-key design you will never have empty regions due to TTL.
>
> I'd love to hear your thoughts on this design, Vlad. Maybe you'd like to
> write up a post for the blog? Meanwhile, I'm sure of a couple of us on here
> on the list would appreciate your Cliff's Notes version. I can take this
> into account for my v2 schema design.
>
> > So Nick, merge on 1.1 is not recommended??? Was working very well on
> > previous versions. Is ProcV2 really impact it that bad??
>
> How to answer here carefully... I have no reason to believe merge is not
> working on 1.1. I've been on the wrong end of enough "regions stuck in
> transition" support tickets that I'm not keen to put undue stress on my
> master. ProcV2 insures against many scenarios that cause master trauma,
> hence my interest in the implementation details and my preference for
> cluster administration tasks that use it as their source of authority.
>
> Thanks for the thoughts folks.
> -n
>
> On Fri, Apr 1, 2016 at 10:52 AM, Jean-Marc Spaggiari <
> jean-marc@spaggiari.org> wrote:
>
> > ;) That was not the question ;)
> >
> > So Nick, merge on 1.1 is not recommended??? Was working very well on
> > previous versions. Is ProcV2 really impact it that bad??
> >
> > JMS
> >
> > 2016-04-01 13:49 GMT-04:00 Vladimir Rodionov <vl...@gmail.com>:
> >
> > > >> This is something
> > > >> which makes it far less useful for time-series databases with short
> > TTL
> > > on
> > > >> the tables.
> > >
> > > With a right row-key design you will never have empty regions due to
> TTL.
> > >
> > > -Vlad
> > >
> > > On Thu, Mar 31, 2016 at 10:31 PM, Mikhail Antonov <
> olorinbant@gmail.com>
> > > wrote:
> > >
> > > > Crazy idea, but you might be able to take stripped down version of
> > region
> > > > normalizer code and make a Tool to run? Requesting split or merge is
> > done
> > > > through the client API, and the only weighing information you need is
> > > > whether region empty or not, that you could find out too?
> > > >
> > > >
> > > > "Short of upgrading to 1.2 for the region normalizer,"
> > > >
> > > > A bit off topic, but I think unfortunately region normalizer now
> > ignores
> > > > empty regions to avoid undoing pre-split on the table. This is
> > something
> > > > which makes it far less useful for time-series databases with short
> TTL
> > > on
> > > > the tables. We'll need to address that.
> > > >
> > > > -Mikhail
> > > >
> > > > On Thu, Mar 31, 2016 at 9:56 PM, Nick Dimiduk <nd...@gmail.com>
> > > wrote:
> > > >
> > > > > Hi folks,
> > > > >
> > > > > I have a table with TTL enabled. It's been receiving data for a
> while
> > > > > beyond the TTL and I now have a number of empty regions. I'd like
> to
> > > drop
> > > > > those empty regions to free up heap space on the region servers and
> > > > reduce
> > > > > master load. I'm running a 1.1 derivative.
> > > > >
> > > > > The only threads I found on this topic are from circa 0.92
> timeframe.
> > > > >
> > > > > Short of upgrading to 1.2 for the region normalizer, what's the
> > > > recommended
> > > > > method of cleaning up this cruft? Should I be merging empty regions
> > > into
> > > > > their neighbor's? Looks like region merge hasn't been migrated to
> > > ProcV2
> > > > > yet so would be wise to reduce online table activity, or at least
> aim
> > > > for a
> > > > > "quiet period"? Is there a documented process for off-lining and
> > > > deleting a
> > > > > region by name? I don't see anything in the book about it.
> > > > >
> > > > > I experimented with online merge on pseudodist, looks like it's
> > working
> > > > > fine for the most basic case. I'll probably pursue this unless
> > someone
> > > > has
> > > > > some other ideas.
> > > > >
> > > > > Thanks,
> > > > > Nick
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Thanks,
> > > > Michael Antonov
> > > >
> > >
> >
>

Retiring empty regions

Posted by Mikhail Antonov <ol...@gmail.com>.

Yeah, that sounds interesting.

Do you think it should be a script (a command, runnable from the client
side), or some chore on the master?

Are you going this route because the region normalizer lacks features
you guys need?

-Mikhail

> Circling back here and adding user@phoenix.
>
> I put together one script to dump region info from the shell and find the
> empty ones, another to merge a given region into a neighbor. We've run them
> without incident, looks like it all works fine. One thing we did notice is
> that the AM leaves the old "retired" regions around in its counts -- the
> master status page shows a large number of "Other Regions". This was
> alarming at first, but we verified it's just an artifact in the AM and in
> fact these regions are not on HDFS or in meta. Bouncing master resolved it.
>
> No one has volunteered any alternative schema designs, so as best we
> know, this will happen to anyone who has timestamp in their rowkey (ie,
> anyone using Phoenix's "Row timestamp" feature [0]) and is also using the
> TTL feature. Are folks interested in adding these scripts to our
> distribution and our book?
>
> -n
>
> [0]: https://phoenix.apache.org/rowtimestamp.html

Re: Retiring empty regions

Posted by rafa <ra...@gmail.com>.

Hi,

For everyone's information, Nick has published the script for retiring empty
regions at:

https://issues.apache.org/jira/browse/HBASE-15712

Nick, Thank you very much for your help and great work !!!

Best Regards,
Rafa.



On Mon, Mar 6, 2017 at 5:49 PM, rafa <ra...@gmail.com> wrote:

> Hi Nick,
>
> We are facing the same issue: an increasing number of empty regions
> resulting from TTL and a timestamp in the row key.
>
> Did you ever publish those scripts? Are they available for public
> usage?
>
> Thank you very much in advance for your work and help,
> Best Regards,
> rafa
>
>
>
>
> On Thu, Apr 21, 2016 at 1:48 AM, Andrew Purtell <ap...@apache.org>
> wrote:
>
>> >  the shell and find the empty ones, another to merge a given region
>> into a neighbor. We've run them without incident, looks like it all works
>> fine. One thing we did notice is that the AM leaves the old "retired"
>> regions around in its counts -- the master status page shows a large number
>> of "Other Regions". This was alarming at first,
>>
>> Good to know. I had seen this recently and had a mental note to circle
>> around and confirm it's just a temporary artifact.
>>
>> On Wed, Apr 20, 2016 at 3:16 PM, Nick Dimiduk <nd...@gmail.com> wrote:
>>
>>> Circling back here and adding user@phoenix.
>>>
>>> I put together one script to dump region info from the shell and find
>>> the empty ones, another to merge a given region into a neighbor. We've run
>>> them without incident, looks like it all works fine. One thing we did
>>> notice is that the AM leaves the old "retired" regions around in its counts
>>> -- the master status page shows a large number of "Other Regions". This was
>>> alarming at first, but we verified it's just an artifact in the AM and in
>>> fact these regions are not on HDFS or in meta. Bouncing master resolved it.
>>>
>>> No one has volunteered any alternative schema designs, so as best we
>>> know, this will happen to anyone who has timestamp in their rowkey (ie,
>>> anyone using Phoenix's "Row timestamp" feature [0]) and is also using the
>>> TTL feature. Are folks interested in adding these scripts to our
>>> distribution and our book?
>>>
>>> -n
>>>
>>> [0]: https://phoenix.apache.org/rowtimestamp.html
>>>
>>> On Mon, Apr 4, 2016 at 8:34 AM, Nick Dimiduk <nd...@gmail.com> wrote:
>>>
>>>> > Crazy idea, but you might be able to take stripped down version of
>>>> region
>>>> > normalizer code and make a Tool to run? Requesting split or merge is
>>>> done
>>>> > through the client API, and the only weighing information you need is
>>>> > whether region empty or not, that you could find out too?
>>>>
>>>> Yeah, that's the direction I'm headed.
>>>>
>>>> > A bit off topic, but I think unfortunately region normalizer now
>>>> ignores
>>>> > empty regions to avoid undoing pre-split on the table.
>>>>
>>>> Unfortunate indeed. Maybe we should be keeping around the initial
>>>> splits list as a metadata attribute on the table?
>>>>
>>>> > With a right row-key design you will never have empty regions due to
>>>> TTL.
>>>>
>>>> I'd love to hear your thoughts on this design, Vlad. Maybe you'd like
>>>> to write up a post for the blog? Meanwhile, I'm sure of a couple of us on
>>>> here on the list would appreciate your Cliff's Notes version. I can take
>>>> this into account for my v2 schema design.
>>>>
>>>> > So Nick, merge on 1.1 is not recommended??? Was working very well on
>>>> > previous versions. Is ProcV2 really impact it that bad??
>>>>
>>>> How to answer here carefully... I have no reason to believe merge is
>>>> not working on 1.1. I've been on the wrong end of enough "regions stuck in
>>>> transition" support tickets that I'm not keen to put undue stress on my
>>>> master. ProcV2 insures against many scenarios that cause master trauma,
>>>> hence my interest in the implementation details and my preference for
>>>> cluster administration tasks that use it as their source of authority.
>>>>
>>>> Thanks for the thoughts folks.
>>>> -n
>>>>
>>>> On Fri, Apr 1, 2016 at 10:52 AM, Jean-Marc Spaggiari <
>>>> jean-marc@spaggiari.org> wrote:
>>>>
>>>>> ;) That was not the question ;)
>>>>>
>>>>> So Nick, merge on 1.1 is not recommended??? Was working very well on
>>>>> previous versions. Is ProcV2 really impact it that bad??
>>>>>
>>>>> JMS
>>>>>
>>>>> 2016-04-01 13:49 GMT-04:00 Vladimir Rodionov <vl...@gmail.com>:
>>>>>
>>>>> > >> This is something
>>>>> > >> which makes it far less useful for time-series databases with
>>>>> short TTL
>>>>> > on
>>>>> > >> the tables.
>>>>> >
>>>>> > With a right row-key design you will never have empty regions due to
>>>>> TTL.
>>>>> >
>>>>> > -Vlad
>>>>> >
>>>>> > On Thu, Mar 31, 2016 at 10:31 PM, Mikhail Antonov <
>>>>> olorinbant@gmail.com>
>>>>> > wrote:
>>>>> >
>>>>> > > Crazy idea, but you might be able to take stripped down version of
>>>>> region
>>>>> > > normalizer code and make a Tool to run? Requesting split or merge
>>>>> is done
>>>>> > > through the client API, and the only weighing information you need
>>>>> is
>>>>> > > whether region empty or not, that you could find out too?
>>>>> > >
>>>>> > >
>>>>> > > "Short of upgrading to 1.2 for the region normalizer,"
>>>>> > >
>>>>> > > A bit off topic, but I think unfortunately region normalizer now
>>>>> ignores
>>>>> > > empty regions to avoid undoing pre-split on the table. This is
>>>>> something
>>>>> > > which makes it far less useful for time-series databases with
>>>>> short TTL
>>>>> > on
>>>>> > > the tables. We'll need to address that.
>>>>> > >
>>>>> > > -Mikhail
>>>>> > >
>>>>> > > On Thu, Mar 31, 2016 at 9:56 PM, Nick Dimiduk <nd...@gmail.com>
>>>>> > wrote:
>>>>> > >
>>>>> > > > Hi folks,
>>>>> > > >
>>>>> > > > I have a table with TTL enabled. It's been receiving data for a
>>>>> while
>>>>> > > > beyond the TTL and I now have a number of empty regions. I'd
>>>>> like to
>>>>> > drop
>>>>> > > > those empty regions to free up heap space on the region servers
>>>>> and
>>>>> > > reduce
>>>>> > > > master load. I'm running a 1.1 derivative.
>>>>> > > >
>>>>> > > > The only threads I found on this topic are from circa 0.92
>>>>> timeframe.
>>>>> > > >
>>>>> > > > Short of upgrading to 1.2 for the region normalizer, what's the
>>>>> > > recommended
>>>>> > > > method of cleaning up this cruft? Should I be merging empty
>>>>> regions
>>>>> > into
>>>>> > > > their neighbor's? Looks like region merge hasn't been migrated to
>>>>> > ProcV2
>>>>> > > > yet so would be wise to reduce online table activity, or at
>>>>> least aim
>>>>> > > for a
>>>>> > > > "quiet period"? Is there a documented process for off-lining and
>>>>> > > deleting a
>>>>> > > > region by name? I don't see anything in the book about it.
>>>>> > > >
>>>>> > > > I experimented with online merge on pseudodist, looks like it's
>>>>> working
>>>>> > > > fine for the most basic case. I'll probably pursue this unless
>>>>> someone
>>>>> > > has
>>>>> > > > some other ideas.
>>>>> > > >
>>>>> > > > Thanks,
>>>>> > > > Nick
>>>>> > > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > > --
>>>>> > > Thanks,
>>>>> > > Michael Antonov
>>>>> > >
>>>>> >
>>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> Best regards,
>>
>>    - Andy
>>
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> (via Tom White)
>>
>
>

Re: Retiring empty regions

Posted by rafa <ra...@gmail.com>.

Hi Nick,

We are facing the same issue: an increasing number of empty regions
resulting from TTL and a timestamp in the row key.

Did you ever publish those scripts? Are they available for public usage?

Thank you very much in advance for your work and help,
Best Regards,
rafa



On Thu, Apr 21, 2016 at 1:48 AM, Andrew Purtell <ap...@apache.org> wrote:

> >  the shell and find the empty ones, another to merge a given region
> into a neighbor. We've run them without incident, looks like it all works
> fine. One thing we did notice is that the AM leaves the old "retired"
> regions around in its counts -- the master status page shows a large number
> of "Other Regions". This was alarming at first,
>
> Good to know. I had seen this recently and had a mental note to circle
> around and confirm it's just a temporary artifact.
>
> On Wed, Apr 20, 2016 at 3:16 PM, Nick Dimiduk <nd...@gmail.com> wrote:
>
>> Circling back here and adding user@phoenix.
>>
>> I put together one script to dump region info from the shell and find the
>> empty ones, another to merge a given region into a neighbor. We've run them
>> without incident, looks like it all works fine. One thing we did notice is
>> that the AM leaves the old "retired" regions around in its counts -- the
>> master status page shows a large number of "Other Regions". This was
>> alarming at first, but we verified it's just an artifact in the AM and in
>> fact these regions are not on HDFS or in meta. Bouncing master resolved it.
>>
>> No one has volunteered any alternative schema designs, so as best we
>> know, this will happen to anyone who has timestamp in their rowkey (ie,
>> anyone using Phoenix's "Row timestamp" feature [0]) and is also using the
>> TTL feature. Are folks interested in adding these scripts to our
>> distribution and our book?
>>
>> -n
>>
>> [0]: https://phoenix.apache.org/rowtimestamp.html
>>
>> On Mon, Apr 4, 2016 at 8:34 AM, Nick Dimiduk <nd...@gmail.com> wrote:
>>
>>> > Crazy idea, but you might be able to take stripped down version of
>>> region
>>> > normalizer code and make a Tool to run? Requesting split or merge is
>>> done
>>> > through the client API, and the only weighing information you need is
>>> > whether region empty or not, that you could find out too?
>>>
>>> Yeah, that's the direction I'm headed.
>>>
>>> > A bit off topic, but I think unfortunately region normalizer now
>>> ignores
>>> > empty regions to avoid undoing pre-split on the table.
>>>
>>> Unfortunate indeed. Maybe we should be keeping around the initial splits
>>> list as a metadata attribute on the table?
>>>
>>> > With a right row-key design you will never have empty regions due to
>>> TTL.
>>>
>>> I'd love to hear your thoughts on this design, Vlad. Maybe you'd like to
>>> write up a post for the blog? Meanwhile, I'm sure of a couple of us on here
>>> on the list would appreciate your Cliff's Notes version. I can take this
>>> into account for my v2 schema design.
>>>
>>> > So Nick, merge on 1.1 is not recommended??? Was working very well on
>>> > previous versions. Is ProcV2 really impact it that bad??
>>>
>>> How to answer here carefully... I have no reason to believe merge is not
>>> working on 1.1. I've been on the wrong end of enough "regions stuck in
>>> transition" support tickets that I'm not keen to put undue stress on my
>>> master. ProcV2 insures against many scenarios that cause master trauma,
>>> hence my interest in the implementation details and my preference for
>>> cluster administration tasks that use it as their source of authority.
>>>
>>> Thanks for the thoughts folks.
>>> -n
>>>
>>> On Fri, Apr 1, 2016 at 10:52 AM, Jean-Marc Spaggiari <
>>> jean-marc@spaggiari.org> wrote:
>>>
>>>> ;) That was not the question ;)
>>>>
>>>> So Nick, merge on 1.1 is not recommended??? Was working very well on
>>>> previous versions. Is ProcV2 really impact it that bad??
>>>>
>>>> JMS
>>>>
>>>> 2016-04-01 13:49 GMT-04:00 Vladimir Rodionov <vl...@gmail.com>:
>>>>
>>>> > >> This is something
>>>> > >> which makes it far less useful for time-series databases with
>>>> short TTL
>>>> > on
>>>> > >> the tables.
>>>> >
>>>> > With a right row-key design you will never have empty regions due to
>>>> TTL.
>>>> >
>>>> > -Vlad
>>>> >
>>>> > On Thu, Mar 31, 2016 at 10:31 PM, Mikhail Antonov <
>>>> olorinbant@gmail.com>
>>>> > wrote:
>>>> >
>>>> > > Crazy idea, but you might be able to take stripped down version of
>>>> region
>>>> > > normalizer code and make a Tool to run? Requesting split or merge
>>>> is done
>>>> > > through the client API, and the only weighing information you need
>>>> is
>>>> > > whether region empty or not, that you could find out too?
>>>> > >
>>>> > >
>>>> > > "Short of upgrading to 1.2 for the region normalizer,"
>>>> > >
>>>> > > A bit off topic, but I think unfortunately region normalizer now
>>>> ignores
>>>> > > empty regions to avoid undoing pre-split on the table. This is
>>>> something
>>>> > > which makes it far less useful for time-series databases with short
>>>> TTL
>>>> > on
>>>> > > the tables. We'll need to address that.
>>>> > >
>>>> > > -Mikhail
>>>> > >
>>>> > > On Thu, Mar 31, 2016 at 9:56 PM, Nick Dimiduk <nd...@gmail.com>
>>>> > wrote:
>>>> > >
>>>> > > > Hi folks,
>>>> > > >
>>>> > > > I have a table with TTL enabled. It's been receiving data for a
>>>> while
>>>> > > > beyond the TTL and I now have a number of empty regions. I'd like
>>>> to
>>>> > drop
>>>> > > > those empty regions to free up heap space on the region servers
>>>> and
>>>> > > reduce
>>>> > > > master load. I'm running a 1.1 derivative.
>>>> > > >
>>>> > > > The only threads I found on this topic are from circa 0.92
>>>> timeframe.
>>>> > > >
>>>> > > > Short of upgrading to 1.2 for the region normalizer, what's the
>>>> > > recommended
>>>> > > > method of cleaning up this cruft? Should I be merging empty
>>>> regions
>>>> > into
>>>> > > > their neighbor's? Looks like region merge hasn't been migrated to
>>>> > ProcV2
>>>> > > > yet so would be wise to reduce online table activity, or at least
>>>> aim
>>>> > > for a
>>>> > > > "quiet period"? Is there a documented process for off-lining and
>>>> > > deleting a
>>>> > > > region by name? I don't see anything in the book about it.
>>>> > > >
>>>> > > > I experimented with online merge on pseudodist, looks like it's
>>>> working
>>>> > > > fine for the most basic case. I'll probably pursue this unless
>>>> someone
>>>> > > has
>>>> > > > some other ideas.
>>>> > > >
>>>> > > > Thanks,
>>>> > > > Nick
>>>> > > >
>>>> > >
>>>> > >
>>>> > >
>>>> > > --
>>>> > > Thanks,
>>>> > > Michael Antonov
>>>> > >
>>>> >
>>>>
>>>
>>>
>>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>

>>>> > >
>>>> >
>>>>
>>>
>>>
>>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>

Re: Retiring empty regions

Posted by Andrew Purtell <ap...@apache.org>.

> the shell and find the empty ones, another to merge a given region into
> a neighbor. We've run them without incident, looks like it all works fine.
> One thing we did notice is that the AM leaves the old "retired" regions
> around in its counts -- the master status page shows a large number of
> "Other Regions". This was alarming at first,

Good to know. I had seen this recently and had a mental note to circle
around and confirm it's just a temporary artifact.

On Wed, Apr 20, 2016 at 3:16 PM, Nick Dimiduk <nd...@gmail.com> wrote:

> Circling back here and adding user@phoenix.
>
> I put together one script to dump region info from the shell and find the
> empty ones, another to merge a given region into a neighbor. We've run them
> without incident, looks like it all works fine. One thing we did notice is
> that the AM leaves the old "retired" regions around in its counts -- the
> master status page shows a large number of "Other Regions". This was
> alarming at first, but we verified it's just an artifact in the AM and in
> fact these regions are not on HDFS or in meta. Bouncing master resolved it.
>
> No one has volunteered any alternative schema designs, so as best we know,
> this will happen to anyone who has timestamp in their rowkey (ie, anyone
> using Phoenix's "Row timestamp" feature [0]) and is also using the TTL
> feature. Are folks interested in adding these scripts to our distribution
> and our book?
>
> -n
>
> [0]: https://phoenix.apache.org/rowtimestamp.html
>
> On Mon, Apr 4, 2016 at 8:34 AM, Nick Dimiduk <nd...@gmail.com> wrote:
>
>> > Crazy idea, but you might be able to take stripped down version of
>> region
>> > normalizer code and make a Tool to run? Requesting split or merge is
>> done
>> > through the client API, and the only weighing information you need is
>> > whether region empty or not, that you could find out too?
>>
>> Yeah, that's the direction I'm headed.
>>
>> > A bit off topic, but I think unfortunately region normalizer now ignores
>> > empty regions to avoid undoing pre-split on the table.
>>
>> Unfortunate indeed. Maybe we should be keeping around the initial splits
>> list as a metadata attribute on the table?
>>
>> > With a right row-key design you will never have empty regions due to
>> TTL.
>>
>> I'd love to hear your thoughts on this design, Vlad. Maybe you'd like to
>> write up a post for the blog? Meanwhile, I'm sure of a couple of us on here
>> on the list would appreciate your Cliff's Notes version. I can take this
>> into account for my v2 schema design.
>>
>> > So Nick, merge on 1.1 is not recommended??? Was working very well on
>> > previous versions. Is ProcV2 really impact it that bad??
>>
>> How to answer here carefully... I have no reason to believe merge is not
>> working on 1.1. I've been on the wrong end of enough "regions stuck in
>> transition" support tickets that I'm not keen to put undue stress on my
>> master. ProcV2 insures against many scenarios that cause master trauma,
>> hence my interest in the implementation details and my preference for
>> cluster administration tasks that use it as their source of authority.
>>
>> Thanks for the thoughts folks.
>> -n
>>
>> On Fri, Apr 1, 2016 at 10:52 AM, Jean-Marc Spaggiari <
>> jean-marc@spaggiari.org> wrote:
>>
>>> ;) That was not the question ;)
>>>
>>> So Nick, merge on 1.1 is not recommended??? Was working very well on
>>> previous versions. Is ProcV2 really impact it that bad??
>>>
>>> JMS
>>>
>>> 2016-04-01 13:49 GMT-04:00 Vladimir Rodionov <vl...@gmail.com>:
>>>
>>> > >> This is something
>>> > >> which makes it far less useful for time-series databases with short
>>> TTL
>>> > on
>>> > >> the tables.
>>> >
>>> > With a right row-key design you will never have empty regions due to
>>> TTL.
>>> >
>>> > -Vlad
>>> >
>>> > On Thu, Mar 31, 2016 at 10:31 PM, Mikhail Antonov <
>>> olorinbant@gmail.com>
>>> > wrote:
>>> >
>>> > > Crazy idea, but you might be able to take stripped down version of
>>> region
>>> > > normalizer code and make a Tool to run? Requesting split or merge is
>>> done
>>> > > through the client API, and the only weighing information you need is
>>> > > whether region empty or not, that you could find out too?
>>> > >
>>> > >
>>> > > "Short of upgrading to 1.2 for the region normalizer,"
>>> > >
>>> > > A bit off topic, but I think unfortunately region normalizer now
>>> ignores
>>> > > empty regions to avoid undoing pre-split on the table. This is
>>> something
>>> > > which makes it far less useful for time-series databases with short
>>> TTL
>>> > on
>>> > > the tables. We'll need to address that.
>>> > >
>>> > > -Mikhail
>>> > >
>>> > > On Thu, Mar 31, 2016 at 9:56 PM, Nick Dimiduk <nd...@gmail.com>
>>> > wrote:
>>> > >
>>> > > > Hi folks,
>>> > > >
>>> > > > I have a table with TTL enabled. It's been receiving data for a
>>> while
>>> > > > beyond the TTL and I now have a number of empty regions. I'd like
>>> to
>>> > drop
>>> > > > those empty regions to free up heap space on the region servers and
>>> > > reduce
>>> > > > master load. I'm running a 1.1 derivative.
>>> > > >
>>> > > > The only threads I found on this topic are from circa 0.92
>>> timeframe.
>>> > > >
>>> > > > Short of upgrading to 1.2 for the region normalizer, what's the
>>> > > recommended
>>> > > > method of cleaning up this cruft? Should I be merging empty regions
>>> > into
>>> > > > their neighbor's? Looks like region merge hasn't been migrated to
>>> > ProcV2
>>> > > > yet so would be wise to reduce online table activity, or at least
>>> aim
>>> > > for a
>>> > > > "quiet period"? Is there a documented process for off-lining and
>>> > > deleting a
>>> > > > region by name? I don't see anything in the book about it.
>>> > > >
>>> > > > I experimented with online merge on pseudodist, looks like it's
>>> working
>>> > > > fine for the most basic case. I'll probably pursue this unless
>>> someone
>>> > > has
>>> > > > some other ideas.
>>> > > >
>>> > > > Thanks,
>>> > > > Nick
>>> > > >
>>> > >
>>> > >
>>> > >
>>> > > --
>>> > > Thanks,
>>> > > Michael Antonov
>>> > >
>>> >
>>>
>>
>>
>


-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Retiring empty regions

Posted by Mikhail Antonov <ol...@gmail.com>.

Yeah, that sounds interesting.

Do you think it should be a script (a command, runnable from the client
side), or a chore on the master?

Are you going this route because the region normalizer lacks features
you need?

-Mikhail

> Circling back here and adding user@phoenix.
>
> I put together one script to dump region info from the shell and find the
> empty ones, another to merge a given region into a neighbor. We've run them
> without incident, looks like it all works fine. One thing we did notice is
> that the AM leaves the old "retired" regions around in its counts -- the
> master status page shows a large number of "Other Regions". This was
> alarming at first, but we verified it's just an artifact in the AM and in
> fact these regions are not on HDFS or in meta. Bouncing master resolved it.
>
> No one has volunteered any alternative schema designs, so as best we know,
> this will happen to anyone who has timestamp in their rowkey (ie, anyone
> using Phoenix's "Row timestamp" feature [0]) and is also using the TTL
> feature. Are folks interested in adding these scripts to our distribution
> and our book?
>
> -n
>
> [0]: https://phoenix.apache.org/rowtimestamp.html

Re: Retiring empty regions

Posted by Nick Dimiduk <nd...@gmail.com>.

Circling back here and adding user@phoenix.

I put together one script to dump region info from the shell and find the
empty ones, another to merge a given region into a neighbor. We've run them
without incident, looks like it all works fine. One thing we did notice is
that the AM leaves the old "retired" regions around in its counts -- the
master status page shows a large number of "Other Regions". This was
alarming at first, but we verified it's just an artifact in the AM and in
fact these regions are not on HDFS or in meta. Bouncing master resolved it.
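
For readers who want a feel for the planning step, here is a minimal sketch
of how empty regions might be paired with a neighbor. It is not the scripts
described above. The Region tuple, its size_mb field and the plan_merges
helper are all hypothetical names, and the region list is assumed to have
been dumped from the cluster by some other means.

```python
from typing import List, NamedTuple, Tuple


class Region(NamedTuple):
    """Hypothetical record for one region, as dumped from the cluster."""
    name: str         # encoded region name
    start_key: bytes  # b"" for the first region of the table
    end_key: bytes    # b"" for the last region of the table
    size_mb: int      # store file size; 0 is treated as "empty"


def plan_merges(regions: List[Region]) -> List[Tuple[str, str]]:
    """Pair each empty region with an adjacent neighbor.

    Each region takes part in at most one merge per pass, so the plan can be
    applied and the region list re-dumped before planning the next pass.
    """
    ordered = sorted(regions, key=lambda r: r.start_key)
    plan = []
    i = 0
    while i < len(ordered) - 1:
        left, right = ordered[i], ordered[i + 1]
        adjacent = left.end_key == right.start_key
        if adjacent and (left.size_mb == 0 or right.size_mb == 0):
            plan.append((left.name, right.name))
            i += 2  # both regions are consumed by this merge
        else:
            i += 1
    return plan


if __name__ == "__main__":
    table = [
        Region("a", b"", b"10", 0),
        Region("b", b"10", b"20", 50),
        Region("c", b"20", b"30", 40),
        Region("d", b"30", b"", 0),
    ]
    for first, second in plan_merges(table):
        print(first, second)
```

Each pair in the plan could then be handed to the shell's merge_region
command, ideally during a quiet period as discussed earlier in the thread.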

No one has volunteered any alternative schema designs, so as best we know,
this will happen to anyone who has a timestamp in their rowkey (i.e., anyone
using Phoenix's "Row timestamp" feature [0]) and is also using the TTL
feature. Are folks interested in adding these scripts to our distribution
and our book?

-n

[0]: https://phoenix.apache.org/rowtimestamp.html

On Mon, Apr 4, 2016 at 8:34 AM, Nick Dimiduk <nd...@gmail.com> wrote:

> > Crazy idea, but you might be able to take stripped down version of
> region
> > normalizer code and make a Tool to run? Requesting split or merge is done
> > through the client API, and the only weighing information you need is
> > whether region empty or not, that you could find out too?
>
> Yeah, that's the direction I'm headed.
>
> > A bit off topic, but I think unfortunately region normalizer now ignores
> > empty regions to avoid undoing pre-split on the table.
>
> Unfortunate indeed. Maybe we should be keeping around the initial splits
> list as a metadata attribute on the table?
>
> > With a right row-key design you will never have empty regions due to TTL.
>
> I'd love to hear your thoughts on this design, Vlad. Maybe you'd like to
> write up a post for the blog? Meanwhile, I'm sure of a couple of us on here
> on the list would appreciate your Cliff's Notes version. I can take this
> into account for my v2 schema design.
>
> > So Nick, merge on 1.1 is not recommended??? Was working very well on
> > previous versions. Is ProcV2 really impact it that bad??
>
> How to answer here carefully... I have no reason to believe merge is not
> working on 1.1. I've been on the wrong end of enough "regions stuck in
> transition" support tickets that I'm not keen to put undue stress on my
> master. ProcV2 insures against many scenarios that cause master trauma,
> hence my interest in the implementation details and my preference for
> cluster administration tasks that use it as their source of authority.
>
> Thanks for the thoughts folks.
> -n
>
> On Fri, Apr 1, 2016 at 10:52 AM, Jean-Marc Spaggiari <
> jean-marc@spaggiari.org> wrote:
>
>> ;) That was not the question ;)
>>
>> So Nick, merge on 1.1 is not recommended??? Was working very well on
>> previous versions. Is ProcV2 really impact it that bad??
>>
>> JMS
>>
>> 2016-04-01 13:49 GMT-04:00 Vladimir Rodionov <vl...@gmail.com>:
>>
>> > >> This is something
>> > >> which makes it far less useful for time-series databases with short
>> TTL
>> > on
>> > >> the tables.
>> >
>> > With a right row-key design you will never have empty regions due to
>> TTL.
>> >
>> > -Vlad
>> >
>> > On Thu, Mar 31, 2016 at 10:31 PM, Mikhail Antonov <olorinbant@gmail.com
>> >
>> > wrote:
>> >
>> > > Crazy idea, but you might be able to take stripped down version of
>> region
>> > > normalizer code and make a Tool to run? Requesting split or merge is
>> done
>> > > through the client API, and the only weighing information you need is
>> > > whether region empty or not, that you could find out too?
>> > >
>> > >
>> > > "Short of upgrading to 1.2 for the region normalizer,"
>> > >
>> > > A bit off topic, but I think unfortunately region normalizer now
>> ignores
>> > > empty regions to avoid undoing pre-split on the table. This is
>> something
>> > > which makes it far less useful for time-series databases with short
>> TTL
>> > on
>> > > the tables. We'll need to address that.
>> > >
>> > > -Mikhail
>> > >
>> > > On Thu, Mar 31, 2016 at 9:56 PM, Nick Dimiduk <nd...@gmail.com>
>> > wrote:
>> > >
>> > > > Hi folks,
>> > > >
>> > > > I have a table with TTL enabled. It's been receiving data for a
>> while
>> > > > beyond the TTL and I now have a number of empty regions. I'd like to
>> > drop
>> > > > those empty regions to free up heap space on the region servers and
>> > > reduce
>> > > > master load. I'm running a 1.1 derivative.
>> > > >
>> > > > The only threads I found on this topic are from circa 0.92
>> timeframe.
>> > > >
>> > > > Short of upgrading to 1.2 for the region normalizer, what's the
>> > > recommended
>> > > > method of cleaning up this cruft? Should I be merging empty regions
>> > into
>> > > > their neighbor's? Looks like region merge hasn't been migrated to
>> > ProcV2
>> > > > yet so would be wise to reduce online table activity, or at least
>> aim
>> > > for a
>> > > > "quiet period"? Is there a documented process for off-lining and
>> > > deleting a
>> > > > region by name? I don't see anything in the book about it.
>> > > >
>> > > > I experimented with online merge on pseudodist, looks like it's
>> working
>> > > > fine for the most basic case. I'll probably pursue this unless
>> someone
>> > > has
>> > > > some other ideas.
>> > > >
>> > > > Thanks,
>> > > > Nick
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > Thanks,
>> > > Michael Antonov
>> > >
>> >
>>
>
>

Re: Retiring empty regions

Posted by Nick Dimiduk <nd...@gmail.com>.

> Crazy idea, but you might be able to take stripped down version of region
> normalizer code and make a Tool to run? Requesting split or merge is done
> through the client API, and the only weighing information you need is
> whether region empty or not, that you could find out too?

Yeah, that's the direction I'm headed.

> A bit off topic, but I think unfortunately region normalizer now ignores
> empty regions to avoid undoing pre-split on the table.

Unfortunate indeed. Maybe we should be keeping around the initial splits
list as a metadata attribute on the table?
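
To make the idea concrete, here is a rough sketch of how a stored list of
initial split points could constrain merging. The thread does not say how
such an attribute would be stored or read, so the initial_splits set and the
mergeable_boundaries helper are purely illustrative assumptions.

```python
from typing import List, Set, Tuple


def mergeable_boundaries(
    region_bounds: List[Tuple[bytes, bytes]],
    initial_splits: Set[bytes],
) -> List[bytes]:
    """Return the boundaries between adjacent regions that may be merged away.

    A boundary that matches one of the table's original pre-split points is
    kept, so merging never undoes the pre-split.
    """
    ordered = sorted(region_bounds)  # sorts by start key; b"" comes first
    candidates = []
    for (_, end), (start, _) in zip(ordered, ordered[1:]):
        if end == start and start not in initial_splits:
            candidates.append(start)
    return candidates


if __name__ == "__main__":
    bounds = [(b"", b"a"), (b"a", b"f"), (b"f", b"m"), (b"m", b"")]
    # Only the b"f" boundary was created by a later split.
    print(mergeable_boundaries(bounds, {b"a", b"m"}))
```

A normalizer or a standalone tool could consult such a list before merging
an empty region, leaving the pre-split layout intact.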

> With a right row-key design you will never have empty regions due to TTL.

I'd love to hear your thoughts on this design, Vlad. Maybe you'd like to
write up a post for the blog? Meanwhile, I'm sure a couple of us here on
the list would appreciate your Cliff's Notes version. I can take this
into account for my v2 schema design.

> So Nick, merge on 1.1 is not recommended??? Was working very well on
> previous versions. Is ProcV2 really impact it that bad??

How to answer here carefully... I have no reason to believe merge is not
working on 1.1. I've been on the wrong end of enough "regions stuck in
transition" support tickets that I'm not keen to put undue stress on my
master. ProcV2 insures against many scenarios that cause master trauma,
hence my interest in the implementation details and my preference for
cluster administration tasks that use it as their source of authority.

Thanks for the thoughts folks.
-n

On Fri, Apr 1, 2016 at 10:52 AM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> ;) That was not the question ;)
>
> So Nick, merge on 1.1 is not recommended??? Was working very well on
> previous versions. Is ProcV2 really impact it that bad??
>
> JMS
>
> 2016-04-01 13:49 GMT-04:00 Vladimir Rodionov <vl...@gmail.com>:
>
> > >> This is something
> > >> which makes it far less useful for time-series databases with short
> TTL
> > on
> > >> the tables.
> >
> > With a right row-key design you will never have empty regions due to TTL.
> >
> > -Vlad
> >
> > On Thu, Mar 31, 2016 at 10:31 PM, Mikhail Antonov <ol...@gmail.com>
> > wrote:
> >
> > > Crazy idea, but you might be able to take stripped down version of
> region
> > > normalizer code and make a Tool to run? Requesting split or merge is
> done
> > > through the client API, and the only weighing information you need is
> > > whether region empty or not, that you could find out too?
> > >
> > >
> > > "Short of upgrading to 1.2 for the region normalizer,"
> > >
> > > A bit off topic, but I think unfortunately region normalizer now
> ignores
> > > empty regions to avoid undoing pre-split on the table. This is
> something
> > > which makes it far less useful for time-series databases with short TTL
> > on
> > > the tables. We'll need to address that.
> > >
> > > -Mikhail
> > >
> > > On Thu, Mar 31, 2016 at 9:56 PM, Nick Dimiduk <nd...@gmail.com>
> > wrote:
> > >
> > > > Hi folks,
> > > >
> > > > I have a table with TTL enabled. It's been receiving data for a while
> > > > beyond the TTL and I now have a number of empty regions. I'd like to
> > drop
> > > > those empty regions to free up heap space on the region servers and
> > > reduce
> > > > master load. I'm running a 1.1 derivative.
> > > >
> > > > The only threads I found on this topic are from circa 0.92 timeframe.
> > > >
> > > > Short of upgrading to 1.2 for the region normalizer, what's the
> > > recommended
> > > > method of cleaning up this cruft? Should I be merging empty regions
> > into
> > > > their neighbor's? Looks like region merge hasn't been migrated to
> > ProcV2
> > > > yet so would be wise to reduce online table activity, or at least aim
> > > for a
> > > > "quiet period"? Is there a documented process for off-lining and
> > > deleting a
> > > > region by name? I don't see anything in the book about it.
> > > >
> > > > I experimented with online merge on pseudodist, looks like it's
> working
> > > > fine for the most basic case. I'll probably pursue this unless
> someone
> > > has
> > > > some other ideas.
> > > >
> > > > Thanks,
> > > > Nick
> > > >
> > >
> > >
> > >
> > > --
> > > Thanks,
> > > Michael Antonov
> > >
> >
>

Re: Retiring empty regions

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.

;) That was not the question ;)

So Nick, merge on 1.1 is not recommended? It was working very well on
previous versions. Does ProcV2 really impact it that badly?

JMS

2016-04-01 13:49 GMT-04:00 Vladimir Rodionov <vl...@gmail.com>:

> >> This is something
> >> which makes it far less useful for time-series databases with short TTL
> on
> >> the tables.
>
> With a right row-key design you will never have empty regions due to TTL.
>
> -Vlad
>
> On Thu, Mar 31, 2016 at 10:31 PM, Mikhail Antonov <ol...@gmail.com>
> wrote:
>
> > Crazy idea, but you might be able to take stripped down version of region
> > normalizer code and make a Tool to run? Requesting split or merge is done
> > through the client API, and the only weighing information you need is
> > whether region empty or not, that you could find out too?
> >
> >
> > "Short of upgrading to 1.2 for the region normalizer,"
> >
> > A bit off topic, but I think unfortunately region normalizer now ignores
> > empty regions to avoid undoing pre-split on the table. This is something
> > which makes it far less useful for time-series databases with short TTL
> > on the tables. We'll need to address that.
> >
> > -Mikhail
> >
> > On Thu, Mar 31, 2016 at 9:56 PM, Nick Dimiduk <nd...@gmail.com>
> wrote:
> >
> > > Hi folks,
> > >
> > > I have a table with TTL enabled. It's been receiving data for a while
> > > beyond the TTL and I now have a number of empty regions. I'd like to
> > > drop those empty regions to free up heap space on the region servers
> > > and reduce master load. I'm running a 1.1 derivative.
> > >
> > > The only threads I found on this topic are from circa 0.92 timeframe.
> > >
> > > Short of upgrading to 1.2 for the region normalizer, what's the
> > > recommended method of cleaning up this cruft? Should I be merging empty
> > > regions into their neighbor's? Looks like region merge hasn't been
> > > migrated to ProcV2 yet so would be wise to reduce online table
> > > activity, or at least aim for a "quiet period"? Is there a documented
> > > process for off-lining and deleting a region by name? I don't see
> > > anything in the book about it.
> > >
> > > I experimented with online merge on pseudodist, looks like it's working
> > > fine for the most basic case. I'll probably pursue this unless someone
> > > has some other ideas.
> > >
> > > Thanks,
> > > Nick
> > >
> >
> >
> >
> > --
> > Thanks,
> > Michael Antonov
> >
>

Re: Retiring empty regions

Posted by Vladimir Rodionov <vl...@gmail.com>.

>> This is something
>> which makes it far less useful for time-series databases with short TTL
>> on the tables.

With the right row-key design, you will never have empty regions due to TTL.
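
The message does not say which design is meant. One possible reading, offered
here purely as an assumption, is a key that leads with a stable hash bucket
instead of the timestamp, so every key range keeps receiving fresh writes while
old cells expire. A rough Python sketch of such a key follows; the function
name, the separator, and the bucket count are all hypothetical.

```python
import hashlib


def salted_row_key(series_id: str, timestamp_ms: int, buckets: int = 16) -> bytes:
    """Build a row key that leads with a hash bucket rather than the time.

    Assumption for illustration only: the bucket is derived from the series
    id, so a given series always lands in the same key range, and new writes
    keep arriving in every range as older cells age out under the TTL.
    """
    digest = hashlib.md5(series_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % buckets
    # Zero-padded fields keep lexicographic order equal to numeric order.
    return f"{bucket:02d}|{series_id}|{timestamp_ms:013d}".encode("utf-8")
```

With a key like this, two writes for the same series share a prefix and sort by
time within it. A purely time-leading key would instead leave the oldest key
ranges with nothing but expired data.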

-Vlad

On Thu, Mar 31, 2016 at 10:31 PM, Mikhail Antonov <ol...@gmail.com>
wrote:

> Crazy idea, but you might be able to take stripped down version of region
> normalizer code and make a Tool to run? Requesting split or merge is done
> through the client API, and the only weighing information you need is
> whether region empty or not, that you could find out too?
>
>
> "Short of upgrading to 1.2 for the region normalizer,"
>
> A bit off topic, but I think unfortunately region normalizer now ignores
> empty regions to avoid undoing pre-split on the table. This is something
> which makes it far less useful for time-series databases with short TTL on
> the tables. We'll need to address that.
>
> -Mikhail
>
> On Thu, Mar 31, 2016 at 9:56 PM, Nick Dimiduk <nd...@gmail.com> wrote:
>
> > Hi folks,
> >
> > I have a table with TTL enabled. It's been receiving data for a while
> > beyond the TTL and I now have a number of empty regions. I'd like to drop
> > those empty regions to free up heap space on the region servers and
> > reduce master load. I'm running a 1.1 derivative.
> >
> > The only threads I found on this topic are from circa 0.92 timeframe.
> >
> > Short of upgrading to 1.2 for the region normalizer, what's the
> > recommended method of cleaning up this cruft? Should I be merging empty
> > regions into their neighbor's? Looks like region merge hasn't been
> > migrated to ProcV2 yet so would be wise to reduce online table activity,
> > or at least aim for a "quiet period"? Is there a documented process for
> > off-lining and deleting a region by name? I don't see anything in the
> > book about it.
> >
> > I experimented with online merge on pseudodist, looks like it's working
> > fine for the most basic case. I'll probably pursue this unless someone
> > has some other ideas.
> >
> > Thanks,
> > Nick
> >
>
>
>
> --
> Thanks,
> Michael Antonov
>

Re: Retiring empty regions

Posted by Mikhail Antonov <ol...@gmail.com>.

Crazy idea, but you might be able to take a stripped-down version of the
region normalizer code and make a Tool to run? Requesting a split or merge is
done through the client API, and the only weighting information you need is
whether a region is empty or not, which you could find out too.
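
A minimal sketch of that selection logic, written in Python rather than the
Java a real Tool would use. It assumes the caller has already listed the
table's regions in row-key order together with their store file sizes, and it
treats a size of zero as "empty". `Region`, `plan_empty_region_merges`, and the
`min_regions` floor are hypothetical names for illustration. The sketch only
produces a plan of adjacent pairs; requesting each merge through the client
API is left out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Region:
    name: str        # hypothetical region identifier
    size_bytes: int  # total store file size; 0 is treated as empty


def plan_empty_region_merges(
    regions: List[Region], min_regions: int = 1
) -> List[Tuple[str, str]]:
    """Pick adjacent region pairs to merge where at least one side is empty.

    `regions` must be sorted by start key, because only neighbours can be
    merged. Each region takes part in at most one merge per pass, and the
    plan stops once the table would drop to `min_regions` regions.
    """
    plan: List[Tuple[str, str]] = []
    remaining = len(regions)
    i = 0
    while i < len(regions) - 1 and remaining > min_regions:
        left, right = regions[i], regions[i + 1]
        if left.size_bytes == 0 or right.size_bytes == 0:
            plan.append((left.name, right.name))
            remaining -= 1
            i += 2  # skip past both regions of this pair
        else:
            i += 1
    return plan
```

Running one pass at a time and re-listing the regions before the next pass
keeps the plan from referring to regions that a previous merge has already
replaced. The `min_regions` floor is one way to avoid undoing a deliberate
pre-split.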

"Short of upgrading to 1.2 for the region normalizer,"

A bit off topic, but unfortunately I think the region normalizer currently
ignores empty regions to avoid undoing a pre-split on the table. This makes
it far less useful for time-series databases with a short TTL on the tables.
We'll need to address that.

-Mikhail

On Thu, Mar 31, 2016 at 9:56 PM, Nick Dimiduk <nd...@gmail.com> wrote:

> Hi folks,
>
> I have a table with TTL enabled. It's been receiving data for a while
> beyond the TTL and I now have a number of empty regions. I'd like to drop
> those empty regions to free up heap space on the region servers and reduce
> master load. I'm running a 1.1 derivative.
>
> The only threads I found on this topic are from circa 0.92 timeframe.
>
> Short of upgrading to 1.2 for the region normalizer, what's the recommended
> method of cleaning up this cruft? Should I be merging empty regions into
> their neighbor's? Looks like region merge hasn't been migrated to ProcV2
> yet so would be wise to reduce online table activity, or at least aim for a
> "quiet period"? Is there a documented process for off-lining and deleting a
> region by name? I don't see anything in the book about it.
>
> I experimented with online merge on pseudodist, looks like it's working
> fine for the most basic case. I'll probably pursue this unless someone has
> some other ideas.
>
> Thanks,
> Nick
>

-- 
Thanks,
Michael Antonov