Posted to dev@hbase.apache.org by Reid Chan <re...@outlook.com> on 2020/10/15 08:28:25 UTC

Re: Release 1.7.0 (for real this time)

About HBASE-25083 and online slowlog, need any help?



--------------------------

Best regards,
R.C



________________________________________
From: Andrew Purtell <ap...@apache.org>
Sent: 28 September 2020 23:02
To: dev
Subject: Re: Release 1.7.0 (for real this time)

Thank you so much Reid.

I will take a look again this week. Bharath landed the master registry
work. Viraj's backport of the online slowlog is also about to be approved
and merged (by me, I think). Now that these are in, it's time for another
pass of unit test checking; then we can proceed with spinning the RC.

TestClientOperationInterrupt can be temporarily disabled with a JIRA filed
to re-enable if it's going to take too much time to fix right away.

Before spinning the RC I like to generate a report with the compatibility
checker. There will be some changes flagged in the report this time. We
typically allow changes that are binary compatible per the Java guidelines:
https://docs.oracle.com/javase/specs/jls/se8/html/jls-13.html . Our own
compatibility guidelines are written to be flexible. For each exception we
can consider the changes case by case and decide if the changes as-is are
fine, or if additional backwards compatibility work is needed, or if a
change must be partially or completely reverted.

Once the compatibility situation is settled it would be time to spin the RC
bits using make_rc.sh. If you want to do that, fine by me, or I can do it.



On Mon, Sep 28, 2020 at 4:10 AM Reid Chan <re...@outlook.com> wrote:

> TestRestartCluster (IO error on shutdown) is fixed in HBASE-25030
> TestReplicationDisableInactivePeer (testDisableInactivePeer) is fixed in
> HBASE-25031
> TestFromClientSide (testCheckAndDeleteWithCompareOp) &
> TestFromClientSideWithCoprocessor (testCheckAndDeleteWithCompareOp) are
> fixed in HBASE-25025. But after HBASE-25025,
> TestFromClientSide#testCacheOnWriteEvictOnClose surfaces and fails 100% of
> the time; I will take a look at it.
> TestClientOperationInterrupt (testInterrupt50Percent): filed HBASE-25024,
> but this test is hard to fix if we have to make sure 50% of the threads get
> interrupted.
> TestReplicationSmallTests (testHBase14905) does fail occasionally;
> pending.
>
> TestClusterPortAssignment
> TestHCM
> TestExecutorService
> TestZKLessAMOnCluster
> TestRSGroupsKillRS
> These look good in my local tests. Let me try a few more times.
>
> All tests were run on both a CentOS server and my local laptop (Mac), to
> make sure each one is fixed or indeed flaky.
>
>
>
>
> --------------------------
>
> Best regards,
> R.C
>
>
>
> ________________________________________
> From: Andrew Purtell <ap...@apache.org>
> Sent: 09 September 2020 00:20
> To: dev
> Subject: Re: Release 1.7.0 (for real this time)
>
> Speaking of known flakes, if you have a moment could you please file JIRAs
> and mark the fix version at least as 1.7.0, for visibility?
>
> Last week I looped the test suite 100 times and found these, so you can
> ignore them; I'll take care of filing JIRAs for them:
>
>    - TestFromClientSide (testCheckAndDeleteWithCompareOp)
>    - TestFromClientSideWithCoprocessor (testCheckAndDeleteWithCompareOp)
>    - TestClientOperationInterrupt (testInterrupt50Percent)
>    - TestReplicationSmallTests (testHBase14905)
>    - TestReplicationDisableInactivePeer (testDisableInactivePeer)
>    - TestRestartCluster (IO error on shutdown)
>
> From
>
> https://ci-hadoop.apache.org/job/HBase/job/HBase-Find-Flaky-Tests/job/branch-1/
> there are these too:
>
>    - TestClusterPortAssignment
>    - TestHCM
>    - TestExecutorService
>    - TestZKLessAMOnCluster
>    - TestRSGroupsKillRS
>
>
> On Tue, Sep 8, 2020 at 8:50 AM Bharath Vissapragada <bh...@apache.org>
> wrote:
>
> > Master registry backport work <https://github.com/apache/hbase/pull/2280>
> > for branch-1 is ready for review if anyone wants to take a look. Changes
> > are a bit more involved than branch-2/master due to code divergence,
> > Java-7 compatibility etc. All the tests seem to be passing (except a
> > couple of flakes which are known issues in other branches). Given the
> > size of the PR, I'm happy to break it into smaller pieces if needed.
> >
> > On Thu, Aug 20, 2020 at 9:43 AM Andrew Purtell <ap...@apache.org>
> > wrote:
> >
> > > Updates:
> > >
> > > Reid and I will start prerelease work, like unit test hygiene and
> > > prequalification with ITBLL. This will probably take a week or two.
> > >
> > > Viraj is currently working on a backport of the named queue facility
> > > and online slow log and balancer decision log based on it. It might
> > > have time to get in. When it's ready for potential commit we can decide
> > > how much prequalification work would be invalidated by more changes to
> > > branch-1 at that time.
> > >
> > > Bharath mentioned to me he's currently working on a backport of the
> > > master registry for configuration. This might also have time to get in.
> > > When it's ready for potential commit we can decide how much
> > > prequalification work would be invalidated by more changes to branch-1
> > > at that time.
> > >
> > > On Fri, Aug 14, 2020 at 10:50 AM Andrew Purtell <ap...@apache.org>
> > > wrote:
> > >
> > > > Next week work on release 1.7.0 will begin.
> > > >
> > > > It doesn't look like much beyond ad hoc backporting and operationally
> > > > focused bugfixes has been happening in branch-1 for a while, which is
> > > > good, and there's no reason not to continue this activity while RC
> > > > work is in progress.
> > > >
> > > > If you do have any branch-1 targeted work pending, please consider
> > > > committing it in the next week or so. When we have most pending work
> > > > flushed, any test stabilization effort will be more likely to succeed.
> > > >
> > > > Thanks for your attention and consideration.
> >
> >

Re: Release 1.7.0 (for real this time)

Posted by Andrew Purtell <ap...@apache.org>.
I am not sure of the status of the online slowlog. Have all the backports
been committed? If not, then yes, help would be appreciated to finish the PR
reviews etc.

HBASE-25083 is a simple thing we can do at the last minute, no worries
there.


On Thu, Oct 15, 2020 at 1:28 AM Reid Chan <re...@outlook.com> wrote:

>
> About HBASE-25083 and online slowlog, need any help?


-- 
Best regards,
Andrew

Words like orphans lost among the crosstalk, meaning torn from truth's
decrepit hands
   - A23, Crosstalk