Posted to dev@hbase.apache.org by Sean Busbey <bu...@apache.org> on 2019/01/18 03:47:19 UTC

[DISCUSS] Moving towards a branch-2 line that can get the 'stable' pointer.

There are a few related topics I'd like to discuss and I figured this
subject line is the most likely to get a bit of attention. :)

First, I'd like us all to get on the same page wrt the current state
of branch-2. Personally, I don't think it can be released as-is with a
2.y version because folks can't do a rolling upgrade from 2.0 or 2.1 to
it due to the current implementation of HBASE-20881. As Duo has
mentioned a couple of times, folks have to ensure there are no region
transitions in flight during the upgrade. I think that will be
prohibitive for folks looking to upgrade. What do other folks think?

Second, I think our recent discussions around the need for shifting to
more minor releases for HBase 1.y also apply to the 2.y branches.
branch-2 hasn't had a release since 2.1.0 came out in July 2018.
That's a scary long amount of time. I think it contributes to us
ending up with changes like the above since it's easy to think about
the branch as something that has a lot of time before the next
release.

Personally, I'd like to see us skip making minor-release specific
branches for a bit unless a CVE fix or something comes up. Ideally,
that would mean we work towards a 2.2.0 release directly from branch-2
and then 2.2.1, etc. When we have a feature that's ready to backport
from the master branch for a release we then update branch-2's version
to be 2.3.0.

Or maybe we try to set a regular cadence for feature releases by having
branch-2 release a new minor, two months of new maintenance releases,
followed by a new minor. That would mean after the last of the
maintenance releases we'd have a window of a few weeks where we can
all decide which features in master are mature enough to backport for
the new minor release.

Lastly, what would it take for folks to feel confident moving the
'stable' pointer to an HBase 2.y? Is there a major gap still on
assignment stability? Is it a more thorough look at performance? More
time to ensure HBCK2 has good coverage of failure modes that need it?

Re: [DISCUSS] Moving towards a branch-2 line that can get the 'stable' pointer.

Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.
https://issues.apache.org/jira/browse/HBASE-21745

张铎(Duo Zhang) <pa...@gmail.com> wrote on Sat, Jan 19, 2019 at 9:51 AM:

> OK, the original issue is HBCK2 for AMv2, but here we need to do more, not
> only for AMv2.
>
> Let me open a new issue and post what Andrew said above there.
>
> 张铎(Duo Zhang) <pa...@gmail.com> wrote on Sat, Jan 19, 2019 at 9:26 AM:
>
>> OK, let me find the original HBCK2 issue and see how we can make progress
>> on it.
>>
>> BTW, on scan performance, Zheng Hu has done some work in this issue to get
>> about 40% performance back for the 100% scan case on YCSB:
>>
>> https://issues.apache.org/jira/browse/HBASE-21657
>>
>> Andrew Purtell <ap...@apache.org> wrote on Sat, Jan 19, 2019 at 8:14 AM:
>>
>>> Lars was testing tip of branch-2 with Phoenix and said scans were 50%
>>> slower than branch-1. I’ll try and get him to provide more details.
>>> Anyway, after hbck2 is complete, issues like that will come out in the
>>> testing we’d do as part of sanity checking a move of the pointer.
>>>
>>> On Fri, Jan 18, 2019 at 4:02 PM Zach York <zy...@gmail.com>
>>> wrote:
>>>
>>> > I agree with the sentiment around HBCK2. I think these kinds of recovery
>>> > tools are essential before marking something stable.
>>> >
>>> > I also remember when we did testing around HBase 2.x/2.1 that we were
>>> > getting perf degradations and couldn't seem to get performance to be as
>>> > good as we were getting in the 1.x line.
>>> >
>>> > - Zach
>>> >
>>> > On Thu, Jan 17, 2019 at 11:06 PM Pankaj kr <pa...@huawei.com>
>>> > wrote:
>>> >
>>> > > Yeah, HBCK2/OfflineMetaRepair tools are really required to migrate
>>> > > old version data to HBase-2. We have use cases where we are using
>>> > > these tools to rebuild the meta for further region assignment.
>>> > > A similar discussion is going on in HBASE-21665: after fixing the NPE
>>> > > and rebuilding the meta, the master doesn't assign the regions, as we
>>> > > skip the empty regions while loading meta during master startup.
>>> > >
>>> > > A big +1 from my side on this...
>>> > >
>>> > > Regards,
>>> > > Pankaj
>>> > >
>>> > > -----Original Message-----
>>> > > From: 张铎(Duo Zhang) [mailto:palomino219@gmail.com]
>>> > > Sent: 18 January 2019 11:55
>>> > > To: HBase Dev List <de...@hbase.apache.org>
>>> > > Subject: Re: [DISCUSS] Moving towards a branch-2 line that can get
>>> > > the 'stable' pointer.
>>> > >
>>> > > So the first priority is to make progress on HBCK2? If we all agree,
>>> > > let's start to work.
>>> > >
>>> > > Andrew Purtell <ap...@apache.org> wrote on Fri, Jan 18, 2019 at 12:31 PM:
>>> > >
>>> > > > Sorry, let me add... Check all the boxes on that list and I'm +1
>>> > > > for moving the stable pointer (modulo some time to pound on the
>>> > > > candidate to really put it through its paces, like two weeks of
>>> > > > chaos...)
>>> > > >
>>> > > > On Thu, Jan 17, 2019 at 8:28 PM Andrew Purtell <apurtell@apache.org>
>>> > > > wrote:
>>> > > >
>>> > > > > I do not believe we should move the stable pointer to any 2.x
>>> > > > > until HBCK2 is feature complete. We can discuss what that
>>> > > > > milestone should look like. At a minimum, I think we need:
>>> > > > >
>>> > > > >    - Rebuild meta from region metadata in the filesystem, aka
>>> > > > >    offline meta rebuild.
>>> > > > >    - Fix assignment errors (undeployed regions, double
>>> > > > >    assignments (yes, should not be possible), etc.)
>>> > > > >    - Fix region holes, overlaps, and other errors in the region
>>> > > > >    chain
>>> > > > >    - Fix failed split and merge transactions that have failed to
>>> > > > >    roll back due to some bug (related to previous)
>>> > > > >    - Enumerate store files to determine file level corruption and
>>> > > > >    sideline corrupt files
>>> > > > >    - Fix hfile link problems (dangling / broken)
>>> > > > >
>>> > > > > This is a list of the real problems I have had to fix in
>>> > > > > production at least once (in the past 10 years...).
>>> > > > >
>>> > > > > On Thu, Jan 17, 2019 at 8:19 PM 张铎(Duo Zhang)
>>> > > > > <pa...@gmail.com>
>>> > > > > wrote:
>>> > > > >
>>> > > > >> There are still lots of small new features which we want to
>>> > > > >> integrate into branch-2, so I'm -1 on making releases directly
>>> > > > >> from branch-2. Backporting everything at once before a release
>>> > > > >> is a pain, I'd say; I've tried this many times recently, as we
>>> > > > >> have to follow up the community version... Let's make a
>>> > > > >> branch-2.2 when we want to release 2.2.0, and maybe also retire
>>> > > > >> branch-2.0?
>>> > > > >>
>>> > > > >> For the stable pointer, I think 2.1.x may be a good candidate?
>>> > > > >> We know that we may still have some bugs in AMv2, but we also
>>> > > > >> all know that AMv1 in all the branch-1.x lines has lots of bugs
>>> > > > >> too; that's why hbck is very important.
>>> > > > >>
>>> > > > >> And also +1 on making progress on HBCK2; we need to port the
>>> > > > >> useful features of HBCK1 to HBCK2. No software can guarantee
>>> > > > >> that it has no bugs, so FWIW we should have a way to fix broken
>>> > > > >> clusters.
>>> > > > >>
>>> > > > >>
>>> > > > >
>>> > > > >
>>> > > > > --
>>> > > > > Best regards,
>>> > > > > Andrew
>>> > > > >
>>> > > > > Words like orphans lost among the crosstalk, meaning torn from
>>> > > > > truth's decrepit hands
>>> > > > >    - A23, Crosstalk
>>> > > > >
>>> > > >
>>> > > >
>>> > > > --
>>> > > > Best regards,
>>> > > > Andrew
>>> > > >
>>> > > > Words like orphans lost among the crosstalk, meaning torn from
>>> truth's
>>> > > > decrepit hands
>>> > > >    - A23, Crosstalk
>>> > > >
>>> > >
>>> >
>>> --
>>> Best regards,
>>> Andrew
>>>
>>> Words like orphans lost among the crosstalk, meaning torn from truth's
>>> decrepit hands
>>>    - A23, Crosstalk
>>>
>>

Re: [DISCUSS] Moving towards a branch-2 line that can get the 'stable' pointer.

Posted by Andrew Purtell <ap...@apache.org>.
+1
Earlier this came up as a need for cancelling individual procedures. The
capability to cancel a procedure was removed from the procedure framework
because it is hard to do reliably in all cases. It should at least be
possible to cancel and remove procedures that have not executed yet but are
in queue / are stuck. Otherwise one has to resort to removing the proc WAL
to drain the queue, losing needed work as well as the troublesome entries.
It could also be reasonable to allow abort requests, which might be
rejected depending on implementation, and then allow abort requests in some
limited cases (hopefully, an increasing number over time).


On Fri, Jan 25, 2019 at 1:40 PM Sergey Shelukhin
<Se...@microsoft.com.invalid> wrote:

> I think one thing that is needed for HBCK2 for AMv2 is to be able to
> delete single procedures from store.
> We are evaluating master (whose assignment is very similar to branch-2)
> right now and I have to delete proc WAL pretty much every day because some
> procedure(s) are in bad state, but deleting the entire WAL also causes
> other issues.
> It should be possible to remove some offending procedure while master is
> offline and/or online.
>
> -----Original Message-----
> From: 张铎(Duo Zhang) <pa...@gmail.com>
> Sent: Friday, January 18, 2019 5:52 PM
> To: HBase Dev List <de...@hbase.apache.org>
> Subject: Re: [DISCUSS] Moving towards a branch-2 line that can get the
> 'stable' pointer.
>
> OK, the original issue is HBCK2 for AMv2, but here we need to do more, not
> only for AMv2.
>
> Let me open a new issue and post what Andrew said above there.
>
>

-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk

Re: [DISCUSS] Moving towards a branch-2 line that can get the 'stable' pointer.

Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.
We have made some progress, especially on HBCK2.

And we plan to cut 2.2.2 after resolving HBASE-23079, maybe this could be
the stable pointer candidate.

But anyway, I think we should have good documentation on how to make use
of HBCK2.

Not sure if this one is up to date...

https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2

OpenInx <op...@gmail.com> wrote on Sun, Sep 29, 2019 at 7:36 PM:

> > Let's gather a few stories on it working for folks in production and move
> the pointer then.
> Agree,  XiaoMi is making parts of the online clusters upgrade to
> HBase2.2.x, I think
> Guanghao will share the practices some time later.
> Thanks.
>
> On Sun, Sep 15, 2019 at 12:09 AM Stack <st...@duboce.net> wrote:
>
> > On Fri, Sep 13, 2019 at 4:58 PM Andrew Purtell <andrew.purtell@gmail.com
> >
> > wrote:
> >
> > > For what it’s worth I had previously been concerned about the disparity
> > > between hbck capability in 1.x and 2.x but after review of the recent
> > work
> > > I believe that is no longer true. Put another way, it is reasonable to
> > > claim it on par.
> > >
> > >
> > Thanks Andrew for chiming in.
> >
> >
> >
> > > As for moving the stable pointer I don’t personally have enough
> > experience
> > > with HBase 2 to weigh in but will trust the opinions of those that do.
> > >
> > >
> > >
> > Let's gather a few stories on it working for folks in production and move
> > the pointer then.
> >
> > Thanks,
> > S
> >
> >
> >
> > > > On Sep 14, 2019, at 8:44 AM, Stack <st...@duboce.net> wrote:
> > > >
> > > > HBASE-21745 <https://issues.apache.org/jira/browse/HBASE-21745>, the
> > > issue
> > > > addressing gaps between hbck1 and hbck2 was closed a few days back
> > after
> > > a
> > > > bunch of work by a kaleidoscope of folks. The release notes section
> > tries
> > > > to describe what was added by HBASE-21745. Shout if you think the
> claim
> > > at
> > > > the end of the release notes section that hbck2 now is on par or
> beyond
> > > > what hbck1 offered is problematic. Otherwise, will proceed as though
> it
> > > is
> > > > the case.
> > > >
> > > > Suggestion: Given that hbase 2.2.1 will ship soon and
> > > hbase-operator-tools
> > > > 1.0.0 with latest hbase-hbck2 should get an RC inside the next week
> or
> > > so,
> > > > if feedback that 2.2.1 looks good, give 2.2.2 (with bug fixes only)
> the
> > > > stable pointer?
> > > >
> > > > Thanks,
> > > > S
> > > >
> > > >> On Sat, Jan 26, 2019 at 11:31 AM Stack <st...@duboce.net> wrote:
> > > >>
> > > >> As per Sean, bypass with optional 'force' (override) and recurse for
> > > case
> > > >> where a procedure had spawned children was the mechanism Allan
> > > implemented
> > > >> after a chat about merits of procedure delete. I found it of use
> doing
> > > >> fixup to clusters I'd intentionally damaged testing candidates.
> > > Procedures
> > > >> are usually part of a fabric with relations that an operator might
> > have
> > > >> trouble unraveling. It was thought that the bypass would be safer
> > than a
> > > >> delete, likely to cause more damage than solution.
> > > >>
> > > >> Interested in the issues you are seeing on Master branch Sergey.
> > > >>
> > > >> Thanks,
> > > >> S
> > > >>
> > > >>> On Fri, Jan 25, 2019 at 1:54 PM Sean Busbey <bu...@apache.org>
> > wrote:
> > > >>>
> > > >>> that's already present, see the README for the "bypass" command:
> > > >>>
> > > >>>
> > https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2
> > > >>>
> > > >>> On Fri, Jan 25, 2019 at 3:40 PM Sergey Shelukhin
> > > >>> <Se...@microsoft.com.invalid> wrote:
> > > >>>>
> > > >>>> I think one thing that is needed for HBCK2 for AMv2 is to be able
> to
> > > >>> delete single procedures from store.
> > > >>>> We are evaluating master (whose assignment is very similar to
> > > branch-2)
> > > >>> right now and I have to delete proc WAL pretty much every day
> because
> > > some
> > > >>> procedure(s) are in bad state, but deleting the entire WAL also
> > causes
> > > >>> other issues.
> > > >>>> It should be possible to remove some offending procedure while
> > master
> > > >>> is offline and/or online.
> > > >>>>
> > > >>>> -----Original Message-----
> > > >>>> From: 张铎(Duo Zhang) <pa...@gmail.com>
> > > >>>> Sent: Friday, January 18, 2019 5:52 PM
> > > >>>> To: HBase Dev List <de...@hbase.apache.org>
> > > >>>> Subject: Re: [DISCUSS] Moving towards a branch-2 line that can get
> > the
> > > >>> 'stable' pointer.
> > > >>>>
> > > >>>> OK, the original issue is HBCK2 for AMv2, but here we need to do
> > more,
> > > >>> not only for AMv2.
> > > >>>>
> > > >>>> Let me open a new issue and post what Andrew said above there.
> > > >>>>
> > > >>>
> > > >>
> > >
> >
>

Re: [DISCUSS] Moving towards a branch-2 line that can get the 'stable' pointer.

Posted by OpenInx <op...@gmail.com>.
> Let's gather a few stories on it working for folks in production and
> move the pointer then.

Agreed. XiaoMi is upgrading some of its online clusters to HBase 2.2.x;
I think Guanghao will share the practices some time later.
Thanks.

On Sun, Sep 15, 2019 at 12:09 AM Stack <st...@duboce.net> wrote:

> On Fri, Sep 13, 2019 at 4:58 PM Andrew Purtell <an...@gmail.com>
> wrote:
>
> > For what it’s worth I had previously been concerned about the disparity
> > between hbck capability in 1.x and 2.x but after review of the recent
> work
> > I believe that is no longer true. Put another way, it is reasonable to
> > claim it on par.
> >
> >
> Thanks Andrew for chiming in.
>
>
>
> > As for moving the stable pointer I don’t personally have enough
> experience
> > with HBase 2 to weigh in but will trust the opinions of those that do.
> >
> >
> >
> Let's gather a few stories on it working for folks in production and move
> the pointer then.
>
> Thanks,
> S
>
>
>
> > > On Sep 14, 2019, at 8:44 AM, Stack <st...@duboce.net> wrote:
> > >
> > > HBASE-21745 <https://issues.apache.org/jira/browse/HBASE-21745>, the
> > issue
> > > addressing gaps between hbck1 and hbck2 was closed a few days back
> after
> > a
> > > bunch of work by a kaleidoscope of folks. The release notes section
> tries
> > > to describe what was added by HBASE-21745. Shout if you think the claim
> > at
> > > the end of the release notes section that hbck2 now is on par or beyond
> > > what hbck1 offered is problematic. Otherwise, will proceed as though it
> > is
> > > the case.
> > >
> > > Suggestion: Given that hbase 2.2.1 will ship soon and
> > hbase-operator-tools
> > > 1.0.0 with latest hbase-hbck2 should get an RC inside the next week or
> > so,
> > > if feedback that 2.2.1 looks good, give 2.2.2 (with bug fixes only) the
> > > stable pointer?
> > >
> > > Thanks,
> > > S
> > >
> > >> On Sat, Jan 26, 2019 at 11:31 AM Stack <st...@duboce.net> wrote:
> > >>
> > >> As per Sean, bypass with optional 'force' (override) and recurse for
> > case
> > >> where a procedure had spawned children was the mechanism Allan
> > implemented
> > >> after a chat about merits of procedure delete. I found it of use doing
> > >> fixup to clusters I'd intentionally damaged testing candidates.
> > Procedures
> > >> are usually part of a fabric with relations that an operator might
> have
> > >> trouble unraveling. It was thought that the bypass would be safer
> than a
> > >> delete, likely to cause more damage than solution.
> > >>
> > >> Interested in the issues you are seeing on Master branch Sergey.
> > >>
> > >> Thanks,
> > >> S
> > >>
> > >>> On Fri, Jan 25, 2019 at 1:54 PM Sean Busbey <bu...@apache.org>
> wrote:
> > >>>
> > >>> that's already present, see the README for the "bypass" command:
> > >>>
> > >>>
> https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2
> > >>>
> > >>> On Fri, Jan 25, 2019 at 3:40 PM Sergey Shelukhin
> > >>> <Se...@microsoft.com.invalid> wrote:
> > >>>>
> > >>>> I think one thing that is needed for HBCK2 for AMv2 is to be able to
> > >>> delete single procedures from store.
> > >>>> We are evaluating master (whose assignment is very similar to
> > branch-2)
> > >>> right now and I have to delete proc WAL pretty much every day because
> > some
> > >>> procedure(s) are in bad state, but deleting the entire WAL also
> causes
> > >>> other issues.
> > >>>> It should be possible to remove some offending procedure while
> master
> > >>> is offline and/or online.
> > >>>>
> > >>>> -----Original Message-----
> > >>>> From: 张铎(Duo Zhang) <pa...@gmail.com>
> > >>>> Sent: Friday, January 18, 2019 5:52 PM
> > >>>> To: HBase Dev List <de...@hbase.apache.org>
> > >>>> Subject: Re: [DISCUSS] Moving towards a branch-2 line that can get
> the
> > >>> 'stable' pointer.
> > >>>>
> > >>>> OK, the original issue is HBCK2 for AMv2, but here we need to do
> more,
> > >>> not only for AMv2.
> > >>>>
> > >>>> Let me open a new issue and post what Andrew said above there.
> > >>>>
> > >>>
> > >>
> >
>

Re: [DISCUSS] Moving towards a branch-2 line that can get the 'stable' pointer.

Posted by Stack <st...@duboce.net>.
On Fri, Sep 13, 2019 at 4:58 PM Andrew Purtell <an...@gmail.com>
wrote:

> For what it’s worth I had previously been concerned about the disparity
> between hbck capability in 1.x and 2.x but after review of the recent work
> I believe that is no longer true. Put another way, it is reasonable to
> claim it on par.
>
>
Thanks Andrew for chiming in.



> As for moving the stable pointer I don’t personally have enough experience
> with HBase 2 to weigh in but will trust the opinions of those that do.
>
>
>
Let's gather a few stories on it working for folks in production and move
the pointer then.

Thanks,
S



> > On Sep 14, 2019, at 8:44 AM, Stack <st...@duboce.net> wrote:
> >
> > HBASE-21745 <https://issues.apache.org/jira/browse/HBASE-21745>, the
> issue
> > addressing gaps between hbck1 and hbck2 was closed a few days back after
> a
> > bunch of work by a kaleidoscope of folks. The release notes section tries
> > to describe what was added by HBASE-21745. Shout if you think the claim
> at
> > the end of the release notes section that hbck2 now is on par or beyond
> > what hbck1 offered is problematic. Otherwise, will proceed as though it
> is
> > the case.
> >
> > Suggestion: Given that hbase 2.2.1 will ship soon and
> hbase-operator-tools
> > 1.0.0 with latest hbase-hbck2 should get an RC inside the next week or
> so,
> > if feedback that 2.2.1 looks good, give 2.2.2 (with bug fixes only) the
> > stable pointer?
> >
> > Thanks,
> > S
> >
> >> On Sat, Jan 26, 2019 at 11:31 AM Stack <st...@duboce.net> wrote:
> >>
> >> As per Sean, bypass with optional 'force' (override) and recurse for
> case
> >> where a procedure had spawned children was the mechanism Allan
> implemented
> >> after a chat about merits of procedure delete. I found it of use doing
> >> fixup to clusters I'd intentionally damaged testing candidates.
> Procedures
> >> are usually part of a fabric with relations that an operator might have
> >> trouble unraveling. It was thought that the bypass would be safer than a
> >> delete, likely to cause more damage than solution.
> >>
> >> Interested in the issues you are seeing on Master branch Sergey.
> >>
> >> Thanks,
> >> S
> >>
> >>> On Fri, Jan 25, 2019 at 1:54 PM Sean Busbey <bu...@apache.org> wrote:
> >>>
> >>> that's already present, see the README for the "bypass" command:
> >>>
> >>> https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2
> >>>
> >>> On Fri, Jan 25, 2019 at 3:40 PM Sergey Shelukhin
> >>> <Se...@microsoft.com.invalid> wrote:
> >>>>
> >>>> I think one thing that is needed for HBCK2 for AMv2 is to be able to
> >>> delete single procedures from store.
> >>>> We are evaluating master (whose assignment is very similar to
> branch-2)
> >>> right now and I have to delete proc WAL pretty much every day because
> some
> >>> procedure(s) are in bad state, but deleting the entire WAL also causes
> >>> other issues.
> >>>> It should be possible to remove some offending procedure while master
> >>> is offline and/or online.
> >>>>
> >>>> -----Original Message-----
> >>>> From: 张铎(Duo Zhang) <pa...@gmail.com>
> >>>> Sent: Friday, January 18, 2019 5:52 PM
> >>>> To: HBase Dev List <de...@hbase.apache.org>
> >>>> Subject: Re: [DISCUSS] Moving towards a branch-2 line that can get the
> >>> 'stable' pointer.
> >>>>
> >>>> OK, the original issue is HBCK2 for AMv2, but here we need to do more,
> >>> not only for AMv2.
> >>>>
> >>>> Let me open a new issue and post what Andrew said above there.
> >>>>
> >>>
> >>
>

Re: [DISCUSS] Moving towards a branch-2 line that can get the 'stable' pointer.

Posted by Andrew Purtell <an...@gmail.com>.
For what it’s worth, I had previously been concerned about the disparity between hbck capability in 1.x and 2.x, but after review of the recent work I believe that concern no longer holds. Put another way, it is reasonable to claim it is on par.

As for moving the stable pointer I don’t personally have enough experience with HBase 2 to weigh in but will trust the opinions of those that do. 


> On Sep 14, 2019, at 8:44 AM, Stack <st...@duboce.net> wrote:
> 
> HBASE-21745 <https://issues.apache.org/jira/browse/HBASE-21745>, the issue
> addressing gaps between hbck1 and hbck2 was closed a few days back after a
> bunch of work by a kaleidoscope of folks. The release notes section tries
> to describe what was added by HBASE-21745. Shout if you think the claim at
> the end of the release notes section that hbck2 now is on par or beyond
> what hbck1 offered is problematic. Otherwise, will proceed as though it is
> the case.
> 
> Suggestion: Given that hbase 2.2.1 will ship soon and hbase-operator-tools
> 1.0.0 with latest hbase-hbck2 should get an RC inside the next week or so,
> if feedback that 2.2.1 looks good, give 2.2.2 (with bug fixes only) the
> stable pointer?
> 
> Thanks,
> S
> 
>> On Sat, Jan 26, 2019 at 11:31 AM Stack <st...@duboce.net> wrote:
>> 
>> As per Sean, bypass with optional 'force' (override) and recurse for case
>> where a procedure had spawned children was the mechanism Allan implemented
>> after a chat about merits of procedure delete. I found it of use doing
>> fixup to clusters I'd intentionally damaged testing candidates. Procedures
>> are usually part of a fabric with relations that an operator might have
>> trouble unraveling. It was thought that the bypass would be safer than a
>> delete, likely to cause more damage than solution.
>> 
>> Interested in the issues you are seeing on Master branch Sergey.
>> 
>> Thanks,
>> S
>> 
>>> On Fri, Jan 25, 2019 at 1:54 PM Sean Busbey <bu...@apache.org> wrote:
>>> 
>>> that's already present, see the README for the "bypass" command:
>>> 
>>> https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2
>>> 
>>> On Fri, Jan 25, 2019 at 3:40 PM Sergey Shelukhin
>>> <Se...@microsoft.com.invalid> wrote:
>>>> 
>>>> I think one thing that is needed for HBCK2 for AMv2 is to be able to
>>> delete single procedures from store.
>>>> We are evaluating master (whose assignment is very similar to branch-2)
>>> right now and I have to delete proc WAL pretty much every day because some
>>> procedure(s) are in bad state, but deleting the entire WAL also causes
>>> other issues.
>>>> It should be possible to remove some offending procedure while master
>>> is offline and/or online.
>>>> 
>>>> -----Original Message-----
>>>> From: 张铎(Duo Zhang) <pa...@gmail.com>
>>>> Sent: Friday, January 18, 2019 5:52 PM
>>>> To: HBase Dev List <de...@hbase.apache.org>
>>>> Subject: Re: [DISCUSS] Moving towards a branch-2 line that can get the
>>> 'stable' pointer.
>>>> 
>>>> OK, the original issue is HBCK2 for AMv2, but here we need to do more,
>>> not only for AMv2.
>>>> 
>>>> Let me open a new issue and post what Andrew said above there.
>>>> 
>>> 
>> 

Re: [DISCUSS] Moving towards a branch-2 line that can get the 'stable' pointer.

Posted by Stack <st...@duboce.net>.
HBASE-21745 <https://issues.apache.org/jira/browse/HBASE-21745>, the issue
addressing gaps between hbck1 and hbck2, was closed a few days back after a
bunch of work by a kaleidoscope of folks. The release notes section tries
to describe what was added by HBASE-21745. Shout if you think the claim at
the end of the release notes section that hbck2 now is on par with or
beyond what hbck1 offered is problematic. Otherwise, will proceed as though
it is the case.

Suggestion: Given that hbase 2.2.1 will ship soon and hbase-operator-tools
1.0.0 with latest hbase-hbck2 should get an RC inside the next week or so,
if feedback is that 2.2.1 looks good, give 2.2.2 (with bug fixes only) the
stable pointer?

Thanks,
S

On Sat, Jan 26, 2019 at 11:31 AM Stack <st...@duboce.net> wrote:

> As per Sean, bypass with optional 'force' (override) and recurse for case
> where a procedure had spawned children was the mechanism Allan implemented
> after a chat about merits of procedure delete. I found it of use doing
> fixup to clusters I'd intentionally damaged testing candidates. Procedures
> are usually part of a fabric with relations that an operator might have
> trouble unraveling. It was thought that the bypass would be safer than a
> delete, likely to cause more damage than solution.
>
> Interested in the issues you are seeing on Master branch Sergey.
>
> Thanks,
> S
>
> On Fri, Jan 25, 2019 at 1:54 PM Sean Busbey <bu...@apache.org> wrote:
>
>> that's already present, see the README for the "bypass" command:
>>
>> https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2
>>
>> On Fri, Jan 25, 2019 at 3:40 PM Sergey Shelukhin
>> <Se...@microsoft.com.invalid> wrote:
>> >
>> > I think one thing that is needed for HBCK2 for AMv2 is to be able to
>> delete single procedures from store.
>> > We are evaluating master (whose assignment is very similar to branch-2)
>> right now and I have to delete proc WAL pretty much every day because some
>> procedure(s) are in bad state, but deleting the entire WAL also causes
>> other issues.
>> > It should be possible to remove some offending procedure while master
>> is offline and/or online.
>> >
>> > -----Original Message-----
>> > From: 张铎(Duo Zhang) <pa...@gmail.com>
>> > Sent: Friday, January 18, 2019 5:52 PM
>> > To: HBase Dev List <de...@hbase.apache.org>
>> > Subject: Re: [DISCUSS] Moving towards a branch-2 line that can get the
>> 'stable' pointer.
>> >
>> > OK, the original issue is HBCK2 for AMv2, but here we need to do more,
>> not only for AMv2.
>> >
>> > Let me open a new issue and post what Andrew said above there.
>> >
>>
>

Re: [DISCUSS] Moving towards a branch-2 line that can get the 'stable' pointer.

Posted by Stack <st...@duboce.net>.
As per Sean, bypass with optional 'force' (override) and recurse for the
case where a procedure had spawned children was the mechanism Allan
implemented after a chat about the merits of procedure delete. I found it
of use doing fixup to clusters I'd intentionally damaged while testing
candidates. Procedures are usually part of a fabric with relations that an
operator might have trouble unraveling. It was thought that the bypass
would be safer than a delete, which is likely to cause more damage than it
solves.

Interested in the issues you are seeing on Master branch Sergey.

Thanks,
S

On Fri, Jan 25, 2019 at 1:54 PM Sean Busbey <bu...@apache.org> wrote:

> that's already present, see the README for the "bypass" command:
>
> https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2
>
> On Fri, Jan 25, 2019 at 3:40 PM Sergey Shelukhin
> <Se...@microsoft.com.invalid> wrote:
> >
> > I think one thing that is needed for HBCK2 for AMv2 is to be able to
> delete single procedures from store.
> > We are evaluating master (whose assignment is very similar to branch-2)
> right now and I have to delete proc WAL pretty much every day because some
> procedure(s) are in bad state, but deleting the entire WAL also causes
> other issues.
> > It should be possible to remove some offending procedure while master is
> offline and/or online.
> >
> > -----Original Message-----
> > From: 张铎(Duo Zhang) <pa...@gmail.com>
> > Sent: Friday, January 18, 2019 5:52 PM
> > To: HBase Dev List <de...@hbase.apache.org>
> > Subject: Re: [DISCUSS] Moving towards a branch-2 line that can get the
> 'stable' pointer.
> >
> > OK, the original issue is HBCK2 for AMv2, but here we need to do more,
> not only for AMv2.
> >
> > Let me open a new issue and post what Andrew said above there.
> >
>

Re: [DISCUSS] Moving towards a branch-2 line that can get the 'stable' pointer.

Posted by Sean Busbey <bu...@apache.org>.
that's already present, see the README for the "bypass" command:

https://github.com/apache/hbase-operator-tools/tree/master/hbase-hbck2

On Fri, Jan 25, 2019 at 3:40 PM Sergey Shelukhin
<Se...@microsoft.com.invalid> wrote:
>
> I think one thing that is needed for HBCK2 for AMv2 is to be able to delete single procedures from store.
> We are evaluating master (whose assignment is very similar to branch-2) right now and I have to delete proc WAL pretty much every day because some procedure(s) are in bad state, but deleting the entire WAL also causes other issues.
> It should be possible to remove some offending procedure while master is offline and/or online.
>
> -----Original Message-----
> From: 张铎(Duo Zhang) <pa...@gmail.com>
> Sent: Friday, January 18, 2019 5:52 PM
> To: HBase Dev List <de...@hbase.apache.org>
> Subject: Re: [DISCUSS] Moving towards a branch-2 line that can get the 'stable' pointer.
>
> OK, the original issue is HBCK2 for AMv2, but here we need to do more, not only for AMv2.
>
> Let me open a new issue and post what Andrew said above there.
>

RE: [DISCUSS] Moving towards a branch-2 line that can get the 'stable' pointer.

Posted by Sergey Shelukhin <Se...@microsoft.com.INVALID>.
I think one thing that HBCK2 needs for AMv2 is the ability to delete single procedures from the store.
We are evaluating master (whose assignment is very similar to branch-2) right now, and I have to delete the proc WAL pretty much every day because some procedure(s) are in a bad state, but deleting the entire WAL also causes other issues.
It should be possible to remove an offending procedure while the master is offline and/or online.

-----Original Message-----
From: 张铎(Duo Zhang) <pa...@gmail.com> 
Sent: Friday, January 18, 2019 5:52 PM
To: HBase Dev List <de...@hbase.apache.org>
Subject: Re: [DISCUSS] Moving towards a branch-2 line that can get the 'stable' pointer.

OK, the original issue is HBCK2 for AMv2, but here we need to do more, not only for AMv2.

Let me open a new issue and post what Andrew said above there.


Re: [DISCUSS] Moving towards a branch-2 line that can get the 'stable' pointer.

Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.
OK, the original issue is HBCK2 for AMv2, but here we need to do more, not
only for AMv2.

Let me open a new issue and post what Andrew said above there.

On Sat, Jan 19, 2019 at 9:26 AM 张铎(Duo Zhang) <pa...@gmail.com> wrote:

> OK, let me find the original HBCK2 issue and see how we can make progress
> on it.
>
> BTW, on scan performance, Zheng Hu has done some work in this issue that
> gets about 40% performance back for the 100% scan case on YCSB:
>
> https://issues.apache.org/jira/browse/HBASE-21657
>
> On Sat, Jan 19, 2019 at 8:14 AM Andrew Purtell <ap...@apache.org> wrote:
>
>> Lars was testing tip of branch-2 with Phoenix and said scans were 50%
>> slower than branch-1. I’ll try and get him to provide more details. Anyway
>> after hbck2 is complete issues like that will come out in the testing we’d
>> do as part of sanity checking a move of the pointer.
>>
>> On Fri, Jan 18, 2019 at 4:02 PM Zach York <zy...@gmail.com>
>> wrote:
>>
>> > I agree with the sentiment around HBCK2. I think these kind of recovery
>> > tools are essential before marking something stable.
>> >
>> > I also remember when we did testing around HBase 2.x/2.1 that we were
>> > getting perf degradations and couldn't seem to get performance to be as
>> > good as we were getting in the 1.x line.
>> >
>> > - Zach
>> >
>> > On Thu, Jan 17, 2019 at 11:06 PM Pankaj kr <pa...@huawei.com>
>> wrote:
>> >
>> > > Yeah, HBCK2/ OfflineMetaRepair tools are really required to migrate
>> old
>> > > version data to HBase-2. We have use cases where we are using these
>> tools
>> > > to rebuild the meta for further region assignment.
>> > > Similar discussion is going on HBASE-21665, after fixing the NPE and
>> > > rebuilding the meta, master don't assign the regions as we skip the
>> empty
>> > > regions while loading meta during master startup.
>> > >
>> > > A big +1 from my side on this...
>> > >
>> > > Regards,
>> > > Pankaj
>> > >
>> > > -----Original Message-----
>> > > From: 张铎(Duo Zhang) [mailto:palomino219@gmail.com]
>> > > Sent: 18 January 2019 11:55
>> > > To: HBase Dev List <de...@hbase.apache.org>
>> > > Subject: Re: [DISCUSS] Moving towards a branch-2 line that can get the
>> > > 'stable' pointer.
>> > >
>> > > So the first priority is to make progress on HBCK2? If we all agree,
>> > let's
>> > > start to work.
>> > >
>> > > On Fri, Jan 18, 2019 at 12:31 PM Andrew Purtell <ap...@apache.org> wrote:
>> > >
>> > > > Sorry, let me add... Check all the boxes on that list and I'm +1 for
>> > > > moving the stable pointer (modulo some time to pound on the
>> candidate
>> > > > to really put it through its paces, like two weeks of chaos...)
>> > > >
>> > > > On Thu, Jan 17, 2019 at 8:28 PM Andrew Purtell <apurtell@apache.org
>> >
>> > > > wrote:
>> > > >
>> > > > > I do not believe we should move the stable pointer to any 2.x
>> until
>> > > > > HBCK2 is feature complete. We can discuss what that milestone
>> should
>> > > look like.
>> > > > > At a minimum, I think we need:
>> > > > >
>> > > > >    - Rebuild meta from region metadata in the filesystem, aka
>> offline
>> > > > >    meta rebuild.
>> > > > >    - Fix assignment errors (undeployed regions, double assignments
>> > > (yes,
>> > > > >    should not be possible), etc)
>> > > > >    - Fix region holes, overlaps, and other errors in the region
>> chain
>> > > > >    - Fix failed split and merge transactions that have failed to
>> roll
>> > > > >    back due to some bug (related to previous)
>> > > > >    - Enumerate store files to determine file level corruption and
>> > > > >    sideline corrupt files
>> > > > >    - Fix hfile link problems (dangling / broken)
>> > > > >
>> > > > > This is a list of the real problems I have had to fix in
>> production
>> > > > > at least once (in the past 10 years...).
>> > > > >
>> > > > > On Thu, Jan 17, 2019 at 8:19 PM 张铎(Duo Zhang)
>> > > > > <pa...@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > >> There are still lots of small new features which we want to
>> > > > >> integrate into branch-2, so I'm -1 on making releases directly
>> > > > >> from branch-2. Backporting everything at once before a release is
>> > > > >> a pain, I'd say; I've tried this many times recently, as we have
>> > > > >> to follow the community version... Let's make a branch-2.2 when
>> > > > >> we want to release 2.2.0, and maybe also retire branch-2.0?
>> > > > >>
>> > > > >> For the stable pointer, I think 2.1.x may be a good candidate?
>> > > > >> We know that AMv2 may still have some bugs, but we also all know
>> > > > >> that AMv1 on all the branch-1.x lines has lots of bugs too; that's
>> > > > >> why hbck is very important.
>> > > > >>
>> > > > >> And also +1 on making progress on HBCK2; we need to port the useful
>> > > > >> features of HBCK1 to HBCK2. No software can guarantee that it has
>> > > > >> no bugs, so FWIW we should have a way to fix broken clusters.
>> > > > >>
>> > > > >> On Fri, Jan 18, 2019 at 11:47 AM Sean Busbey <bu...@apache.org> wrote:
>> > > > >>
>> > > > >> > There are a few related topics I'd like to discuss and I
>> figured
>> > > > >> > this subject line is the most likely to get a bit of attention.
>> > > > >> > :)
>> > > > >> >
>> > > > >> > First, I'd like us all to get on the same page wrt the current
>> > > > >> > state of branch-2. Personally, I don't think it can be released
>> > > > >> > as-is with a 2.y version because folks can't rolling upgrade
>> from
>> > > > >> > 2.0 or 2.1 to it due to the current implementation of
>> > > > >> > HBASE-20881. As Duo has mentioned a couple of times, folks have
>> > > > >> > to ensure there are no region transitions around during the
>> > > > >> > upgrade. I think that will be prohibitive for folks looking to
>> > > upgrade. What do other folks think?
>> > > > >> >
>> > > > >> > Second, I think our recent discussions around the need for
>> > > > >> > shifting to more minor releases for HBase 1.y also applies to
>> the
>> > > 2.y branches.
>> > > > >> > branch-2 hasn't had a release since 2.1.0 came out in July
>> 2018.
>> > > > >> > That's a scary long amount of time. I think it contributes to
>> us
>> > > > >> > ending up with changes like the above since it's easy to think
>> > > > >> > about the branch as something that has a lot of time before the
>> > > > >> > next release.
>> > > > >> >
>> > > > >> > Personally, I'd like to see us skip making minor-release
>> specific
>> > > > >> > branches for a bit unless a CVE fix or something comes up.
>> > > > >> > Ideally, that would mean we work towards a 2.2.0 release
>> directly
>> > > > >> > from branch-2 and then 2.2.1, etc. When we have a feature
>> that's
>> > > > >> > ready to backport from the master branch for a release we then
>> > > > >> > update branch-2's version to be 2.3.0.
>> > > > >> >
>> > > > >> > Or maybe we try set a regular cadence to feature releases by
>> > > > >> > having
>> > > > >> > branch-2 release a new minor, two months of new maintenance
>> > > > >> > releases, followed by a new minor. That would mean after the
>> last
>> > > > >> > of the maintenance releases we'd have a window of a few weeks
>> > > > >> > where we can all decide which features in master are mature
>> > > > >> > enough to backport for the new minor release.
>> > > > >> >
>> > > > >> > Lastly, what would it take for folks to feel confident moving
>> the
>> > > > >> > 'stable' pointer to a HBase 2.y? Is there a major gap still on
>> > > > >> > assignment stability? Is it a more thorough look at
>> performance?
>> > > > >> > More time to ensure HBCK2 has good coverage of failure modes
>> that
>> > > need it?
>> > > > >> >
>> > > > >>
>> > > > >
>> > > > >
>> > > > > --
>> > > > > Best regards,
>> > > > > Andrew
>> > > > >
>> > > > > Words like orphans lost among the crosstalk, meaning torn from
>> > > > > truth's decrepit hands
>> > > > >    - A23, Crosstalk
>> > > > >
>> > > >
>> > > >
>> > > > --
>> > > > Best regards,
>> > > > Andrew
>> > > >
>> > > > Words like orphans lost among the crosstalk, meaning torn from
>> truth's
>> > > > decrepit hands
>> > > >    - A23, Crosstalk
>> > > >
>> > >
>> >
>> --
>> Best regards,
>> Andrew
>>
>> Words like orphans lost among the crosstalk, meaning torn from truth's
>> decrepit hands
>>    - A23, Crosstalk
>>
>

Re: [DISCUSS] Moving towards a branch-2 line that can get the 'stable' pointer.

Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.
OK, let me find the original HBCK2 issue and see how we can make progress
on it.

BTW, on scan performance, Zheng Hu has done some work in this issue that
gets about 40% performance back for the 100% scan case on YCSB:

https://issues.apache.org/jira/browse/HBASE-21657

On Sat, Jan 19, 2019 at 8:14 AM Andrew Purtell <ap...@apache.org> wrote:

> Lars was testing tip of branch-2 with Phoenix and said scans were 50%
> slower than branch-1. I’ll try and get him to provide more details. Anyway
> after hbck2 is complete issues like that will come out in the testing we’d
> do as part of sanity checking a move of the pointer.
>
> On Fri, Jan 18, 2019 at 4:02 PM Zach York <zy...@gmail.com>
> wrote:
>
> > I agree with the sentiment around HBCK2. I think these kind of recovery
> > tools are essential before marking something stable.
> >
> > I also remember when we did testing around HBase 2.x/2.1 that we were
> > getting perf degradations and couldn't seem to get performance to be as
> > good as we were getting in the 1.x line.
> >
> > - Zach
> >
> > On Thu, Jan 17, 2019 at 11:06 PM Pankaj kr <pa...@huawei.com> wrote:
> >
> > > Yeah, HBCK2/ OfflineMetaRepair tools are really required to migrate old
> > > version data to HBase-2. We have use cases where we are using these
> tools
> > > to rebuild the meta for further region assignment.
> > > Similar discussion is going on HBASE-21665, after fixing the NPE and
> > > rebuilding the meta, master don't assign the regions as we skip the
> empty
> > > regions while loading meta during master startup.
> > >
> > > A big +1 from my side on this...
> > >
> > > Regards,
> > > Pankaj
> > >
> > > -----Original Message-----
> > > From: 张铎(Duo Zhang) [mailto:palomino219@gmail.com]
> > > Sent: 18 January 2019 11:55
> > > To: HBase Dev List <de...@hbase.apache.org>
> > > Subject: Re: [DISCUSS] Moving towards a branch-2 line that can get the
> > > 'stable' pointer.
> > >
> > > So the first priority is to make progress on HBCK2? If we all agree,
> > let's
> > > start to work.
> > >
> > > > On Fri, Jan 18, 2019 at 12:31 PM Andrew Purtell <ap...@apache.org> wrote:
> > >
> > > > Sorry, let me add... Check all the boxes on that list and I'm +1 for
> > > > moving the stable pointer (modulo some time to pound on the candidate
> > > > to really put it through its paces, like two weeks of chaos...)
> > > >
> > > > On Thu, Jan 17, 2019 at 8:28 PM Andrew Purtell <ap...@apache.org>
> > > > wrote:
> > > >
> > > > > I do not believe we should move the stable pointer to any 2.x until
> > > > > HBCK2 is feature complete. We can discuss what that milestone
> should
> > > look like.
> > > > > At a minimum, I think we need:
> > > > >
> > > > >    - Rebuild meta from region metadata in the filesystem, aka
> offline
> > > > >    meta rebuild.
> > > > >    - Fix assignment errors (undeployed regions, double assignments
> > > (yes,
> > > > >    should not be possible), etc)
> > > > >    - Fix region holes, overlaps, and other errors in the region
> chain
> > > > >    - Fix failed split and merge transactions that have failed to
> roll
> > > > >    back due to some bug (related to previous)
> > > > >    - Enumerate store files to determine file level corruption and
> > > > >    sideline corrupt files
> > > > >    - Fix hfile link problems (dangling / broken)
> > > > >
> > > > > This is a list of the real problems I have had to fix in production
> > > > > at least once (in the past 10 years...).
> > > > >
> > > > > On Thu, Jan 17, 2019 at 8:19 PM 张铎(Duo Zhang)
> > > > > <pa...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > >> There are still lots of small new features which we want to
> > > > > >> integrate into branch-2, so I'm -1 on making releases directly
> > > > > >> from branch-2. Backporting everything at once before a release is
> > > > > >> a pain, I'd say; I've tried this many times recently, as we have
> > > > > >> to follow the community version... Let's make a branch-2.2 when
> > > > > >> we want to release 2.2.0, and maybe also retire branch-2.0?
> > > > > >>
> > > > > >> For the stable pointer, I think 2.1.x may be a good candidate?
> > > > > >> We know that AMv2 may still have some bugs, but we also all know
> > > > > >> that AMv1 on all the branch-1.x lines has lots of bugs too; that's
> > > > > >> why hbck is very important.
> > > > > >>
> > > > > >> And also +1 on making progress on HBCK2; we need to port the useful
> > > > > >> features of HBCK1 to HBCK2. No software can guarantee that it has
> > > > > >> no bugs, so FWIW we should have a way to fix broken clusters.
> > > > >>
> > > > > >> On Fri, Jan 18, 2019 at 11:47 AM Sean Busbey <bu...@apache.org> wrote:
> > > > >>
> > > > >> > There are a few related topics I'd like to discuss and I figured
> > > > >> > this subject line is the most likely to get a bit of attention.
> > > > >> > :)
> > > > >> >
> > > > >> > First, I'd like us all to get on the same page wrt the current
> > > > >> > state of branch-2. Personally, I don't think it can be released
> > > > >> > as-is with a 2.y version because folks can't rolling upgrade
> from
> > > > >> > 2.0 or 2.1 to it due to the current implementation of
> > > > >> > HBASE-20881. As Duo has mentioned a couple of times, folks have
> > > > >> > to ensure there are no region transitions around during the
> > > > >> > upgrade. I think that will be prohibitive for folks looking to
> > > upgrade. What do other folks think?
> > > > >> >
> > > > >> > Second, I think our recent discussions around the need for
> > > > >> > shifting to more minor releases for HBase 1.y also applies to
> the
> > > 2.y branches.
> > > > >> > branch-2 hasn't had a release since 2.1.0 came out in July 2018.
> > > > >> > That's a scary long amount of time. I think it contributes to us
> > > > >> > ending up with changes like the above since it's easy to think
> > > > >> > about the branch as something that has a lot of time before the
> > > > >> > next release.
> > > > >> >
> > > > >> > Personally, I'd like to see us skip making minor-release
> specific
> > > > >> > branches for a bit unless a CVE fix or something comes up.
> > > > >> > Ideally, that would mean we work towards a 2.2.0 release
> directly
> > > > >> > from branch-2 and then 2.2.1, etc. When we have a feature that's
> > > > >> > ready to backport from the master branch for a release we then
> > > > >> > update branch-2's version to be 2.3.0.
> > > > >> >
> > > > >> > Or maybe we try set a regular cadence to feature releases by
> > > > >> > having
> > > > >> > branch-2 release a new minor, two months of new maintenance
> > > > >> > releases, followed by a new minor. That would mean after the
> last
> > > > >> > of the maintenance releases we'd have a window of a few weeks
> > > > >> > where we can all decide which features in master are mature
> > > > >> > enough to backport for the new minor release.
> > > > >> >
> > > > >> > Lastly, what would it take for folks to feel confident moving
> the
> > > > >> > 'stable' pointer to a HBase 2.y? Is there a major gap still on
> > > > >> > assignment stability? Is it a more thorough look at performance?
> > > > >> > More time to ensure HBCK2 has good coverage of failure modes
> that
> > > need it?
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Andrew
> > > > >
> > > > > Words like orphans lost among the crosstalk, meaning torn from
> > > > > truth's decrepit hands
> > > > >    - A23, Crosstalk
> > > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > > Andrew
> > > >
> > > > Words like orphans lost among the crosstalk, meaning torn from
> truth's
> > > > decrepit hands
> > > >    - A23, Crosstalk
> > > >
> > >
> >
> --
> Best regards,
> Andrew
>
> Words like orphans lost among the crosstalk, meaning torn from truth's
> decrepit hands
>    - A23, Crosstalk
>

Re: [DISCUSS] Moving towards a branch-2 line that can get the 'stable' pointer.

Posted by Andrew Purtell <ap...@apache.org>.
Lars was testing the tip of branch-2 with Phoenix and said scans were 50%
slower than on branch-1. I’ll try to get him to provide more details. Anyway,
once HBCK2 is complete, issues like that will come out in the testing we’d
do as part of sanity checking a move of the pointer.

On Fri, Jan 18, 2019 at 4:02 PM Zach York <zy...@gmail.com>
wrote:

> I agree with the sentiment around HBCK2. I think these kind of recovery
> tools are essential before marking something stable.
>
> I also remember when we did testing around HBase 2.x/2.1 that we were
> getting perf degradations and couldn't seem to get performance to be as
> good as we were getting in the 1.x line.
>
> - Zach
>
> On Thu, Jan 17, 2019 at 11:06 PM Pankaj kr <pa...@huawei.com> wrote:
>
> > Yeah, HBCK2/ OfflineMetaRepair tools are really required to migrate old
> > version data to HBase-2. We have use cases where we are using these tools
> > to rebuild the meta for further region assignment.
> > Similar discussion is going on HBASE-21665, after fixing the NPE and
> > rebuilding the meta, master don't assign the regions as we skip the empty
> > regions while loading meta during master startup.
> >
> > A big +1 from my side on this...
> >
> > Regards,
> > Pankaj
> >
> > -----Original Message-----
> > From: 张铎(Duo Zhang) [mailto:palomino219@gmail.com]
> > Sent: 18 January 2019 11:55
> > To: HBase Dev List <de...@hbase.apache.org>
> > Subject: Re: [DISCUSS] Moving towards a branch-2 line that can get the
> > 'stable' pointer.
> >
> > So the first priority is to make progress on HBCK2? If we all agree,
> let's
> > start to work.
> >
> > > On Fri, Jan 18, 2019 at 12:31 PM Andrew Purtell <ap...@apache.org> wrote:
> >
> > > Sorry, let me add... Check all the boxes on that list and I'm +1 for
> > > moving the stable pointer (modulo some time to pound on the candidate
> > > to really put it through its paces, like two weeks of chaos...)
> > >
> > > On Thu, Jan 17, 2019 at 8:28 PM Andrew Purtell <ap...@apache.org>
> > > wrote:
> > >
> > > > I do not believe we should move the stable pointer to any 2.x until
> > > > HBCK2 is feature complete. We can discuss what that milestone should
> > look like.
> > > > At a minimum, I think we need:
> > > >
> > > >    - Rebuild meta from region metadata in the filesystem, aka offline
> > > >    meta rebuild.
> > > >    - Fix assignment errors (undeployed regions, double assignments
> > (yes,
> > > >    should not be possible), etc)
> > > >    - Fix region holes, overlaps, and other errors in the region chain
> > > >    - Fix failed split and merge transactions that have failed to roll
> > > >    back due to some bug (related to previous)
> > > >    - Enumerate store files to determine file level corruption and
> > > >    sideline corrupt files
> > > >    - Fix hfile link problems (dangling / broken)
> > > >
> > > > This is a list of the real problems I have had to fix in production
> > > > at least once (in the past 10 years...).
> > > >
> > > > On Thu, Jan 17, 2019 at 8:19 PM 张铎(Duo Zhang)
> > > > <pa...@gmail.com>
> > > > wrote:
> > > >
> > > > >> There are still lots of small new features which we want to
> > > > >> integrate into branch-2, so I'm -1 on making releases directly
> > > > >> from branch-2. Backporting everything at once before a release is
> > > > >> a pain, I'd say; I've tried this many times recently, as we have
> > > > >> to follow the community version... Let's make a branch-2.2 when
> > > > >> we want to release 2.2.0, and maybe also retire branch-2.0?
> > > > >>
> > > > >> For the stable pointer, I think 2.1.x may be a good candidate?
> > > > >> We know that AMv2 may still have some bugs, but we also all know
> > > > >> that AMv1 on all the branch-1.x lines has lots of bugs too; that's
> > > > >> why hbck is very important.
> > > > >>
> > > > >> And also +1 on making progress on HBCK2; we need to port the useful
> > > > >> features of HBCK1 to HBCK2. No software can guarantee that it has
> > > > >> no bugs, so FWIW we should have a way to fix broken clusters.
> > > >>
> > > > >> On Fri, Jan 18, 2019 at 11:47 AM Sean Busbey <bu...@apache.org> wrote:
> > > >>
> > > >> > There are a few related topics I'd like to discuss and I figured
> > > >> > this subject line is the most likely to get a bit of attention.
> > > >> > :)
> > > >> >
> > > >> > First, I'd like us all to get on the same page wrt the current
> > > >> > state of branch-2. Personally, I don't think it can be released
> > > >> > as-is with a 2.y version because folks can't rolling upgrade from
> > > >> > 2.0 or 2.1 to it due to the current implementation of
> > > >> > HBASE-20881. As Duo has mentioned a couple of times, folks have
> > > >> > to ensure there are no region transitions around during the
> > > >> > upgrade. I think that will be prohibitive for folks looking to
> > upgrade. What do other folks think?
> > > >> >
> > > >> > Second, I think our recent discussions around the need for
> > > >> > shifting to more minor releases for HBase 1.y also applies to the
> > 2.y branches.
> > > >> > branch-2 hasn't had a release since 2.1.0 came out in July 2018.
> > > >> > That's a scary long amount of time. I think it contributes to us
> > > >> > ending up with changes like the above since it's easy to think
> > > >> > about the branch as something that has a lot of time before the
> > > >> > next release.
> > > >> >
> > > >> > Personally, I'd like to see us skip making minor-release specific
> > > >> > branches for a bit unless a CVE fix or something comes up.
> > > >> > Ideally, that would mean we work towards a 2.2.0 release directly
> > > >> > from branch-2 and then 2.2.1, etc. When we have a feature that's
> > > >> > ready to backport from the master branch for a release we then
> > > >> > update branch-2's version to be 2.3.0.
> > > >> >
> > > >> > Or maybe we try set a regular cadence to feature releases by
> > > >> > having
> > > >> > branch-2 release a new minor, two months of new maintenance
> > > >> > releases, followed by a new minor. That would mean after the last
> > > >> > of the maintenance releases we'd have a window of a few weeks
> > > >> > where we can all decide which features in master are mature
> > > >> > enough to backport for the new minor release.
> > > >> >
> > > >> > Lastly, what would it take for folks to feel confident moving the
> > > >> > 'stable' pointer to a HBase 2.y? Is there a major gap still on
> > > >> > assignment stability? Is it a more thorough look at performance?
> > > >> > More time to ensure HBCK2 has good coverage of failure modes that
> > need it?
> > > >> >
> > > >>
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > > Andrew
> > > >
> > > > Words like orphans lost among the crosstalk, meaning torn from
> > > > truth's decrepit hands
> > > >    - A23, Crosstalk
> > > >
> > >
> > >
> > > --
> > > Best regards,
> > > Andrew
> > >
> > > Words like orphans lost among the crosstalk, meaning torn from truth's
> > > decrepit hands
> > >    - A23, Crosstalk
> > >
> >
>
-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk

Re: [DISCUSS] Moving towards a branch-2 line that can get the 'stable' pointer.

Posted by Zach York <zy...@gmail.com>.
I agree with the sentiment around HBCK2. I think these kinds of recovery
tools are essential before marking something stable.

I also remember that when we tested HBase 2.x/2.1 we saw perf
degradations and couldn't get performance to be as good as what we were
getting in the 1.x line.

- Zach

On Thu, Jan 17, 2019 at 11:06 PM Pankaj kr <pa...@huawei.com> wrote:

> Yeah, HBCK2/ OfflineMetaRepair tools are really required to migrate old
> version data to HBase-2. We have use cases where we are using these tools
> to rebuild the meta for further region assignment.
> Similar discussion is going on HBASE-21665, after fixing the NPE and
> rebuilding the meta, master don't assign the regions as we skip the empty
> regions while loading meta during master startup.
>
> A big +1 from my side on this...
>
> Regards,
> Pankaj
>
> -----Original Message-----
> From: 张铎(Duo Zhang) [mailto:palomino219@gmail.com]
> Sent: 18 January 2019 11:55
> To: HBase Dev List <de...@hbase.apache.org>
> Subject: Re: [DISCUSS] Moving towards a branch-2 line that can get the
> 'stable' pointer.
>
> So the first priority is to make progress on HBCK2? If we all agree, let's
> start to work.
>
> > On Fri, Jan 18, 2019 at 12:31 PM Andrew Purtell <ap...@apache.org> wrote:
>
> > Sorry, let me add... Check all the boxes on that list and I'm +1 for
> > moving the stable pointer (modulo some time to pound on the candidate
> > to really put it through its paces, like two weeks of chaos...)
> >
> > On Thu, Jan 17, 2019 at 8:28 PM Andrew Purtell <ap...@apache.org>
> > wrote:
> >
> > > I do not believe we should move the stable pointer to any 2.x until
> > > HBCK2 is feature complete. We can discuss what that milestone should
> look like.
> > > At a minimum, I think we need:
> > >
> > >    - Rebuild meta from region metadata in the filesystem, aka offline
> > >    meta rebuild.
> > >    - Fix assignment errors (undeployed regions, double assignments
> (yes,
> > >    should not be possible), etc)
> > >    - Fix region holes, overlaps, and other errors in the region chain
> > >    - Fix failed split and merge transactions that have failed to roll
> > >    back due to some bug (related to previous)
> > >    - Enumerate store files to determine file level corruption and
> > >    sideline corrupt files
> > >    - Fix hfile link problems (dangling / broken)
> > >
> > > This is a list of the real problems I have had to fix in production
> > > at least once (in the past 10 years...).
> > >
> > > On Thu, Jan 17, 2019 at 8:19 PM 张铎(Duo Zhang)
> > > <pa...@gmail.com>
> > > wrote:
> > >
> > > >> There are still lots of small new features which we want to
> > > >> integrate into branch-2, so I'm -1 on making releases directly
> > > >> from branch-2. Backporting everything at once before a release is
> > > >> a pain, I'd say; I've tried this many times recently, as we have
> > > >> to follow the community version... Let's make a branch-2.2 when
> > > >> we want to release 2.2.0, and maybe also retire branch-2.0?
> > > >>
> > > >> For the stable pointer, I think 2.1.x may be a good candidate?
> > > >> We know that AMv2 may still have some bugs, but we also all know
> > > >> that AMv1 on all the branch-1.x lines has lots of bugs too; that's
> > > >> why hbck is very important.
> > > >>
> > > >> And also +1 on making progress on HBCK2; we need to port the useful
> > > >> features of HBCK1 to HBCK2. No software can guarantee that it has
> > > >> no bugs, so FWIW we should have a way to fix broken clusters.
> > >>
> > > >> On Fri, Jan 18, 2019 at 11:47 AM Sean Busbey <bu...@apache.org> wrote:
> > >>
> > >> > There are a few related topics I'd like to discuss and I figured
> > >> > this subject line is the most likely to get a bit of attention.
> > >> > :)
> > >> >
> > >> > First, I'd like us all to get on the same page wrt the current
> > >> > state of branch-2. Personally, I don't think it can be released
> > >> > as-is with a 2.y version because folks can't rolling upgrade from
> > >> > 2.0 or 2.1 to it due to the current implementation of
> > >> > HBASE-20881. As Duo has mentioned a couple of times, folks have
> > >> > to ensure there are no region transitions around during the
> > >> > upgrade. I think that will be prohibitive for folks looking to
> upgrade. What do other folks think?
> > >> >
> > >> > Second, I think our recent discussions around the need for
> > >> > shifting to more minor releases for HBase 1.y also applies to the
> 2.y branches.
> > >> > branch-2 hasn't had a release since 2.1.0 came out in July 2018.
> > >> > That's a scary long amount of time. I think it contributes to us
> > >> > ending up with changes like the above since it's easy to think
> > >> > about the branch as something that has a lot of time before the
> > >> > next release.
> > >> >
> > >> > Personally, I'd like to see us skip making minor-release specific
> > >> > branches for a bit unless a CVE fix or something comes up.
> > >> > Ideally, that would mean we work towards a 2.2.0 release directly
> > >> > from branch-2 and then 2.2.1, etc. When we have a feature that's
> > >> > ready to backport from the master branch for a release we then
> > >> > update branch-2's version to be 2.3.0.
> > >> >
> > >> > Or maybe we try set a regular cadence to feature releases by
> > >> > having branch-2 release a new minor, two months of new maintenance
> > >> > releases, followed by a new minor. That would mean after the last
> > >> > of the maintenance releases we'd have a window of a few weeks
> > >> > where we can all decide which features in master are mature
> > >> > enough to backport for the new minor release.
> > >> >
> > >> > Lastly, what would it take for folks to feel confident moving the
> > >> > 'stable' pointer to a HBase 2.y? Is there a major gap still on
> > >> > assignment stability? Is it a more thorough look at performance?
> > >> > More time to ensure HBCK2 has good coverage of failure modes that
> > >> > need it?
> > >> >
> > >>
> > >
> > >
> > > --
> > > Best regards,
> > > Andrew
> > >
> > > Words like orphans lost among the crosstalk, meaning torn from
> > > truth's decrepit hands
> > >    - A23, Crosstalk
> > >
> >
> >
> > --
> > Best regards,
> > Andrew
> >
> > Words like orphans lost among the crosstalk, meaning torn from truth's
> > decrepit hands
> >    - A23, Crosstalk
> >
>

RE: [DISCUSS] Moving towards a branch-2 line that can get the 'stable' pointer.

Posted by Pankaj kr <pa...@huawei.com>.
Yeah, the HBCK2/OfflineMetaRepair tools are really required to migrate old-version data to HBase 2. We have use cases where we use these tools to rebuild meta for further region assignment.
A similar discussion is going on in HBASE-21665: after fixing the NPE and rebuilding meta, the master doesn't assign the regions, because we skip the empty regions while loading meta during master startup.

A big +1 from my side on this... 

Regards,
Pankaj

-----Original Message-----
From: 张铎(Duo Zhang) [mailto:palomino219@gmail.com] 
Sent: 18 January 2019 11:55
To: HBase Dev List <de...@hbase.apache.org>
Subject: Re: [DISCUSS] Moving towards a branch-2 line that can get the 'stable' pointer.

So the first priority is to make progress on HBCK2? If we all agree, let's start to work.

Andrew Purtell <ap...@apache.org> wrote on Fri, Jan 18, 2019 at 12:31 PM:

> Sorry, let me add... Check all the boxes on that list and I'm +1 for 
> moving the stable pointer (modulo some time to pound on the candidate 
> to really put it through its paces, like two weeks of chaos...)

Re: [DISCUSS] Moving towards a branch-2 line that can get the 'stable' pointer.

Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.
So the first priority is to make progress on HBCK2? If we all agree, let's
start to work.

Andrew Purtell <ap...@apache.org> wrote on Fri, Jan 18, 2019 at 12:31 PM:

> Sorry, let me add... Check all the boxes on that list and I'm +1 for moving
> the stable pointer (modulo some time to pound on the candidate to really
> put it through its paces, like two weeks of chaos...)

Re: [DISCUSS] Moving towards a branch-2 line that can get the 'stable' pointer.

Posted by Andrew Purtell <ap...@apache.org>.
Sorry, let me add... Check all the boxes on that list and I'm +1 for moving
the stable pointer (modulo some time to pound on the candidate to really
put it through its paces, like two weeks of chaos...)

On Thu, Jan 17, 2019 at 8:28 PM Andrew Purtell <ap...@apache.org> wrote:

> I do not believe we should move the stable pointer to any 2.x until HBCK2
> is feature complete. We can discuss what that milestone should look like.
> At a minimum, I think we need:
>
>    - Rebuild meta from region metadata in the filesystem, aka offline
>    meta rebuild.
>    - Fix assignment errors (undeployed regions, double assignments (yes,
>    should not be possible), etc)
>    - Fix region holes, overlaps, and other errors in the region chain
>    - Fix failed split and merge transactions that have failed to roll
>    back due to some bug (related to previous)
>    - Enumerate store files to determine file level corruption and
>    sideline corrupt files
>    - Fix hfile link problems (dangling / broken)
>
> This is a list of the real problems I have had to fix in production at
> least once (in the past 10 years...).


-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk

Re: [DISCUSS] Moving towards a branch-2 line that can get the 'stable' pointer.

Posted by Andrew Purtell <ap...@apache.org>.
I do not believe we should move the stable pointer to any 2.x until HBCK2
is feature complete. We can discuss what that milestone should look like.
At a minimum, I think we need:

   - Rebuild meta from region metadata in the filesystem, aka offline meta
   rebuild.
   - Fix assignment errors (undeployed regions, double assignments (yes,
   should not be possible), etc)
   - Fix region holes, overlaps, and other errors in the region chain
   - Fix failed split and merge transactions that have failed to roll back
   due to some bug (related to previous)
   - Enumerate store files to determine file level corruption and sideline
   corrupt files
   - Fix hfile link problems (dangling / broken)

This is a list of the real problems I have had to fix in production at
least once (in the past 10 years...).
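
To make the "region holes, overlaps" item concrete, here is a minimal sketch of what checking a region chain could look like. It is only an illustration, not HBCK2 code: the function name `find_chain_problems` and the `(start_key, end_key)` tuple layout are hypothetical, and it assumes the usual convention that an empty start key means "start of table" and an empty end key means "end of table".

```python
def find_chain_problems(regions):
    """Report holes and overlaps in a list of (start_key, end_key) byte ranges.

    Hypothetical helper for illustration only. Each region covers
    [start_key, end_key); an empty end_key means "up to the end of the table".
    """
    problems = []
    covered_to = b""    # keys below this are covered by the regions seen so far
    open_ended = False  # True once some region extends to the end of the table

    # Sorting by start key lets us walk the chain left to right.
    for start, end in sorted(regions):
        if open_ended:
            # An earlier region already runs to the end of the table.
            problems.append(("overlap", start, end))
        elif start > covered_to:
            # Nothing covers the keys between covered_to and this start key.
            problems.append(("hole", covered_to, start))
        elif start < covered_to:
            # This region begins inside a range that is already covered.
            overlap_end = covered_to if end == b"" else min(end, covered_to)
            problems.append(("overlap", start, overlap_end))

        if end == b"":
            open_ended = True
        elif not open_ended and end > covered_to:
            covered_to = end

    if not open_ended:
        # The chain never reaches the end of the table.
        problems.append(("hole", covered_to, b""))
    return problems


if __name__ == "__main__":
    chain = [(b"", b"g"), (b"g", b"m"), (b"p", b"t"), (b"s", b"")]
    for kind, lo, hi in find_chain_problems(chain):
        print(kind, lo, hi)
```

A real repair tool would also have to decide how to fix each finding (create a filler region, merge the overlapping ones, and so on); the sketch only covers detection.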

On Thu, Jan 17, 2019 at 8:19 PM 张铎(Duo Zhang) <pa...@gmail.com> wrote:

> There are still lots of small new features which we want to integrate into
> branch-2, so I'm -1 on making releases directly from branch-2. Backporting
> everything at once before a release is a pain, I'd say; I've tried this many
> times recently, as we have to keep up with the community version... Let's
> make a branch-2.2 when we want to release 2.2.0, and maybe also retire
> branch-2.0?
>
> For the stable pointer, I think 2.1.x may be a good candidate? We
> know that we may still have some bugs in AMv2, but we all
> know that AMv1 in all the branch-1.x lines also has lots of bugs; that's why
> hbck is very important.
>
> And also +1 on making progress on HBCK2; we need to port the useful features
> of HBCK1 to HBCK2. No software can guarantee that it has no bugs,
> so FWIW we should have a way to fix broken clusters.


-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk

Re: [DISCUSS] Moving towards a branch-2 line that can get the 'stable' pointer.

Posted by "张铎 (Duo Zhang)" <pa...@gmail.com>.
There are still lots of small new features which we want to integrate into
branch-2, so I'm -1 on making releases directly from branch-2. Backporting
everything at once before a release is a pain, I'd say; I've tried this many
times recently, as we have to keep up with the community version... Let's
make a branch-2.2 when we want to release 2.2.0, and maybe also retire
branch-2.0?

For the stable pointer, I think 2.1.x may be a good candidate? We
know that we may still have some bugs in AMv2, but we all
know that AMv1 in all the branch-1.x lines also has lots of bugs; that's why
hbck is very important.

And also +1 on making progress on HBCK2; we need to port the useful features
of HBCK1 to HBCK2. No software can guarantee that it has no bugs,
so FWIW we should have a way to fix broken clusters.

Sean Busbey <bu...@apache.org> wrote on Fri, Jan 18, 2019 at 11:47 AM:

> There are a few related topics I'd like to discuss and I figured this
> subject line is the most likely to get a bit of attention. :)
>
> First, I'd like us all to get on the same page wrt the current state
> of branch-2. Personally, I don't think it can be released as-is with a
> 2.y version because folks can't rolling upgrade from 2.0 or 2.1 to it
> due to the current implementation of HBASE-20881. As Duo has mentioned
> a couple of times, folks have to ensure there are no region
> transitions around during the upgrade. I think that will be
> prohibitive for folks looking to upgrade. What do other folks think?
>
> Second, I think our recent discussions around the need for shifting to
> more minor releases for HBase 1.y also applies to the 2.y branches.
> branch-2 hasn't had a release since 2.1.0 came out in July 2018.
> That's a scary long amount of time. I think it contributes to us
> ending up with changes like the above since it's easy to think about
> the branch as something that has a lot of time before the next
> release.
>
> Personally, I'd like to see us skip making minor-release specific
> branches for a bit unless a CVE fix or something comes up. Ideally,
> that would mean we work towards a 2.2.0 release directly from branch-2
> and then 2.2.1, etc. When we have a feature that's ready to backport
> from the master branch for a release we then update branch-2's version
> to be 2.3.0.
>
> Or maybe we try set a regular cadence to feature releases by having
> branch-2 release a new minor, two months of new maintenance releases,
> followed by a new minor. That would mean after the last of the
> maintenance releases we'd have a window of a few weeks where we can
> all decide which features in master are mature enough to backport for
> the new minor release.
>
> Lastly, what would it take for folks to feel confident moving the
> 'stable' pointer to a HBase 2.y? Is there a major gap still on
> assignment stability? Is it a more thorough look at performance? More
> time to ensure HBCK2 has good coverage of failure modes that need it?
>