You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@phoenix.apache.org by "larsh@apache.org" <la...@apache.org> on 2021/05/26 02:19:12 UTC

Time for 5.1.2?

There are 17 resolved issues against it.
PHOENIX-6436 is needed for further integration with Trino (formerly Presto).

It is time for another point release?

-- Lars

Re: Time for 5.1.2?

Posted by Andrew Purtell <an...@gmail.com>.
The design is a product of its time and is due for an update. It would be better if the admin APIs triggered and monitored a procedure that established a barrier at the RS cache update, so a grant or revoke becomes a synchronous operation and the client knows when the call returns the propagation of the updates are complete. However this is open source and we are reliant on volunteers to do the work, and if it isn’t critical to someone or easy to do it is unlikely to happen soon.


> On May 31, 2021, at 10:02 AM, Viraj Jasani <vj...@apache.org> wrote:
> 
> 
>> 
>> This stuff was designed in years before the Procedure framework was
> written. What we had to work with is ZK watchers.
> 
> I see, this aligns with what I got to know today about the workflow. Yes,
> w.r.t ZK watchers, there is no change b/ HBase 1 and 2. Thanks Andrew for
> the clarification!
> 
>> The propagation of cache updates could be made synchronous with the
> update itself if the grant and revoke admin apis were refactored as
> Procedures. The Procedure would have a sub-procedure that establishes a
> barrier around the cache refresh at all live regionservers, presumably.
> There is a JIRA for that work filed somewhere, an old one.
> 
> Yes, I saw the Jira as well today, HBASE-21602. Good to see significant no
> of sub-tasks are already completed.
> But overall I understand that since this is more of an operator initiated
> action, I don't think there is a problem with existing design. The only
> thing I wanted to make sure, HBase 1 and 2 don't have any behavioural
> change in the workflow (other than HBase 1 using coproc endpoint directly
> on RS vs HBase 2 using Admin APIs).
> 
>> On Mon, May 31, 2021 at 10:07 PM Andrew Purtell <an...@gmail.com>
>> wrote:
>> 
>> In the HBase security unit tests we wait for the zk mediated cache reload
>> to complete before continuing the test. If you don’t do that your test will
>> be flaky. Check that you are doing this. If you are not, then your tests
>> will be flaky. In the real world, the operator issues the grant or revoke
>> command and we have the human time scale of ~seconds (at least) for the
>> change to propagate. This is not a bug. This stuff was designed in years
>> before the Procedure framework was written. What we had to work with is ZK
>> watchers. Those are processed asynchronously with respect to the admin API.
>> Because a unit test runs much faster than a human administrative act, you
>> have to wait for more than just the admin API to complete. The cache in
>> each regionserver’s access controller has a last update timestamp and you
>> should wait for them to all update. Look at the HBase unit tests to see how
>> that is done.
>> 
>> The propagation of cache updates could be made synchronous with the update
>> itself if the grant and revoke admin apis were refactored as Procedures.
>> The Procedure would have a sub-procedure that establishes a barrier around
>> the cache refresh at all live regionservers, presumably. There is a JIRA
>> for that work filed somewhere, an old one. It’s not been important enough
>> for someone to pick up and perform. Don’t expect this to change.
>> 
>> 
>>>> On May 31, 2021, at 6:52 AM, Viraj Jasani <vj...@apache.org> wrote:
>>> 
>>> For Permission APIs grant/revoke, I just checked the diff b/ HBase 1 and
>>> HBase 2.
>>> Phoenix MetaDataClient uses AccessControlClient to grant/revoke
>> permission
>>> to a user. HBase 1 AccessControlClient invokes coproc
>>> endpoint AccessController, which puts records in hbase:acl table on
>>> RegionServer hosting acl's single region. HBase 2 AccessControlClient on
>>> the other hand invokes Admin APIs to let master take care of adding
>> records
>>> in hbase:acl table (HBASE-21739). This is the main diff b/ both versions,
>>> however, both workflows seem synchronous.
>>> For both HBase 1 and 2, post-put hook updates ZNodes with updated entries
>>> in acl table, which internally triggers a refresh of table cache in
>>> AuthManager, and this table cache could be used to verify Auth of user on
>>> table/namespace in subsequent request (based on our test case: remove
>>> access, followed by verification of grant permission). If table cache has
>>> race condition, it should also be applicable to HBase 1 as well I
>> believe,
>>> yet to compare both at length.
>>> 
>>> I understand, Phoenix is not doing anything different w.r.t GRANT/REVOKE
>>> statements so some sync issue might have been introduced in HBase 2
>>> unknowingly. I will try to dig more sometime next week and file HBase
>> Jira
>>> if I could find some evidence of race conditions or could reproduce with
>>> HBase tests directly.
>>> 
>>>> On Mon, May 31, 2021 at 11:40 AM Istvan Toth <st...@apache.org> wrote:
>>>> 
>>>> I agree that we are ready for the RC, and that deflaking the tests is
>>>> great.
>>>> 
>>>> However, I think that Permissions test failures are exposing a real
>> problem
>>>> (probably in HBase),
>>>> and we've only adopted the tests for the buggy behaviour of HBase, and
>>>> should track down / fix
>>>> the root cause.
>>>> 
>>>> On Sun, May 30, 2021 at 4:46 PM Viraj Jasani <vj...@apache.org>
>> wrote:
>>>> 
>>>>>> IMHO, a flapper is worse than a failing test because it undermines the
>>>>> confidence in a test run.
>>>>> 
>>>>> I concur. In fact, in the last few builds, I have seen a few flappers
>>>> (e.g
>>>>> testUpsertSelectWithMultiByteCharsAutoCommit, testAsyncRebuildAll)
>> whose
>>>>> resolution was attempted just a few months back and now they are
>> showing
>>>> up
>>>>> again (with much lower frequency than before), so yes we need some
>>>>> attention again.
>>>>> 
>>>>> After fixing consistent failure with testPherfMain (PHOENIX-6482) and
>>>> high
>>>>> frequency flappers with BasePermissionsIT and AuditLoggingIT
>>>>> (PHOENIX-6483), CI builds are in better shape now.
>>>>> In the recent build, I can see no test failures with 3/4 HBase
>> profiles.
>>>> I
>>>>> will file a Jira for JaCoCo report generation failure, which keeps
>>>> showing
>>>>> up often (flapper with JaCoCo issue, not related to test failure).
>>>>> 
>>>>> Given that CI builds are in better shape, planning to start an RC soon.
>>>>> Thanks
>>>>> 
>>>>> 
>>>>> On Sat, May 29, 2021 at 9:45 PM larsh@apache.org <la...@apache.org>
>>>> wrote:
>>>>> 
>>>>>> Nice!
>>>>>> 
>>>>>> I noticed we had some successful 5.1 runs too.
>>>>>> Fixing the tests would need some attention again. Flappers should
>>>> either
>>>>>> be fixed or disabled.
>>>>>> 
>>>>>> IMHO, a flapper is worse than a failing test because it undermines the
>>>>>> confidence in a test run.
>>>>>> If the tests always pass you'll notice a failure right away.
>>>>>> But when most of the runs produce some kind of failure it is human
>>>> nature
>>>>>> to start ignoring them, and then new, real failures pile up.
>>>>>> 
>>>>>> 
>>>>>> -- Lars
>>>>>> 
>>>>>> 
>>>>>> On Friday, May 28, 2021, 11:36:53 PM PDT, Viraj Jasani <
>>>>> vjasani@apache.org>
>>>>>> wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Due to incorrect/missing fix versions, found some discrepancies b/
>>>> master
>>>>>> and 5.1 in the Pherf module.
>>>>>> Backported recent improvements to 5.1 including test fix:
>>>>>> PHOENIX-6417, PHOENIX-6118, PHOENIX-6430, PHOENIX-6429, PHOENIX-6431,
>>>>>> PHOENIX-6432
>>>>>> 
>>>>>> On Sat, May 29, 2021 at 12:20 AM Viraj Jasani <vj...@apache.org>
>>>>> wrote:
>>>>>> 
>>>>>>> I looked at CI build today and there are a couple of flakies in
>>>>>> core/pherf.
>>>>>>> 
>>>>>>> 3 major flaky tests: PermissionNSEnabledIT, PermissionsCacheIT,
>>>>>>> AuditLoggingIT (reported with HBase 2.2/2.3/2.4). Permission tests
>>>> have
>>>>>>> been flaky for quite some time. IIRC, there was some attempt to fix
>>>>> them
>>>>>> as
>>>>>>> well but still they appear with AccessDeniedException at times. Not
>>>>> much
>>>>>>> sure about AuditLoggingIT though.
>>>>>>> 
>>>>>>> Pherf test fails with HBase 2.1.
>>>>>>> 
>>>>>>> 
>>>>>>> On Fri, 28 May 2021 at 10:44 PM, larsh@apache.org <la...@apache.org>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> 5.1 CI builds seems to fail consistently - at least for the past two
>>>>>>>> weeks.
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>> https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-mulitbranch/job/5.1/
>>>>>>>> 
>>>>>>>> Might be something to look into.
>>>>>>>> I have an hour today between meetings, I'll see what I can do. :)
>>>>>>>> 
>>>>>>>> -- Lars
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Wednesday, May 26, 2021, 10:54:13 PM PDT, Viraj Jasani <
>>>>>>>> vjasani@apache.org> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Thanks Lars. I had an offline chat with Istvan yesterday and I will
>>>>>>>> prepare
>>>>>>>> RCs over the weekend.
>>>>>>>> 
>>>>>>>> In the meantime, I will again take a look at Jira fixVersion and git
>>>>>>>> commits compatibility and also try to help with any backport (if
>>>> still
>>>>>>>> pending).
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Wed, 26 May 2021 at 11:53 PM, larsh@apache.org <larsh@apache.org
>>>>> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> I cherry-picked three changes into branch 5.1.
>>>>>>>>> Assuming Jira is up-to-date, there are no 4.16.x issues that are
>>>> not
>>>>>>>> also
>>>>>>>>> in 5.1.x.
>>>>>>>>> 
>>>>>>>>> There are 12 issues left that are in 4.17.0 and 5.2.0, but not in
>>>>>> 5.1.x.
>>>>>>>>> But they are presumably not in 4.16.x because they violate
>>>> backward
>>>>>>>>> compatibility, and - if so - should not be in 5.1.2.
>>>>>>>>> 
>>>>>>>>> -- Lars
>>>>>>>>> 
>>>>>>>>> On Wednesday, May 26, 2021, 10:15:31 AM PDT, larsh@apache.org <
>>>>>>>>> larsh@apache.org> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Awesome. Thanks Istvan.
>>>>>>>>> 
>>>>>>>>> I'll do a pass through the issues too.
>>>>>>>>> 
>>>>>>>>> -- Lars
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Tuesday, May 25, 2021, 8:45:00 PM PDT, Istvan Toth <
>>>>>> stoty@apache.org
>>>>>>>>> 
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> I am not aware of any blockers in 5.1.
>>>>>>>>> If there is no objection, and no other volunteers for the 5.1.2 RM
>>>>>>>> role, I
>>>>>>>>> intend to start the process on Monday.
>>>>>>>>> 
>>>>>>>>> I will also look for missing fixes before release, but if anyone
>>>> is
>>>>>>>> aware
>>>>>>>>> of any fix that is needed in 5.1.2 , but
>>>>>>>>> hasn't been backported, then please try to get it resolved by
>>>>> Monday.
>>>>>>>>> 
>>>>>>>>> regards
>>>>>>>>> Istvan
>>>>>>>>> 
>>>>>>>>> On Wed, May 26, 2021 at 4:19 AM larsh@apache.org <
>>>> larsh@apache.org>
>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> There are 17 resolved issues against it.
>>>>>>>>>> PHOENIX-6436 is needed for further integration with Trino
>>>>> (formerly
>>>>>>>>>> Presto).
>>>>>>>>>> 
>>>>>>>>>> It is time for another point release?
>>>>>>>>>> 
>>>>>>>>>> -- Lars
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>> 

Re: Time for 5.1.2?

Posted by Viraj Jasani <vj...@apache.org>.
> This stuff was designed in years before the Procedure framework was
written. What we had to work with is ZK watchers.

I see, this aligns with what I got to know today about the workflow. Yes,
w.r.t ZK watchers, there is no change b/ HBase 1 and 2. Thanks Andrew for
the clarification!

> The propagation of cache updates could be made synchronous with the
update itself if the grant and revoke admin apis were refactored as
Procedures. The Procedure would have a sub-procedure that establishes a
barrier around the cache refresh at all live regionservers, presumably.
There is a JIRA for that work filed somewhere, an old one.

Yes, I saw the Jira as well today, HBASE-21602. Good to see significant no
of sub-tasks are already completed.
But overall I understand that since this is more of an operator initiated
action, I don't think there is a problem with existing design. The only
thing I wanted to make sure, HBase 1 and 2 don't have any behavioural
change in the workflow (other than HBase 1 using coproc endpoint directly
on RS vs HBase 2 using Admin APIs).

On Mon, May 31, 2021 at 10:07 PM Andrew Purtell <an...@gmail.com>
wrote:

> In the HBase security unit tests we wait for the zk mediated cache reload
> to complete before continuing the test. If you don’t do that your test will
> be flaky. Check that you are doing this. If you are not, then your tests
> will be flaky. In the real world, the operator issues the grant or revoke
> command and we have the human time scale of ~seconds (at least) for the
> change to propagate. This is not a bug. This stuff was designed in years
> before the Procedure framework was written. What we had to work with is ZK
> watchers. Those are processed asynchronously with respect to the admin API.
> Because a unit test runs much faster than a human administrative act, you
> have to wait for more than just the admin API to complete. The cache in
> each regionserver’s access controller has a last update timestamp and you
> should wait for them to all update. Look at the HBase unit tests to see how
> that is done.
>
> The propagation of cache updates could be made synchronous with the update
> itself if the grant and revoke admin apis were refactored as Procedures.
> The Procedure would have a sub-procedure that establishes a barrier around
> the cache refresh at all live regionservers, presumably. There is a JIRA
> for that work filed somewhere, an old one. It’s not been important enough
> for someone to pick up and perform. Don’t expect this to change.
>
>
> > On May 31, 2021, at 6:52 AM, Viraj Jasani <vj...@apache.org> wrote:
> >
> > For Permission APIs grant/revoke, I just checked the diff b/ HBase 1 and
> > HBase 2.
> > Phoenix MetaDataClient uses AccessControlClient to grant/revoke
> permission
> > to a user. HBase 1 AccessControlClient invokes coproc
> > endpoint AccessController, which puts records in hbase:acl table on
> > RegionServer hosting acl's single region. HBase 2 AccessControlClient on
> > the other hand invokes Admin APIs to let master take care of adding
> records
> > in hbase:acl table (HBASE-21739). This is the main diff b/ both versions,
> > however, both workflows seem synchronous.
> > For both HBase 1 and 2, post-put hook updates ZNodes with updated entries
> > in acl table, which internally triggers a refresh of table cache in
> > AuthManager, and this table cache could be used to verify Auth of user on
> > table/namespace in subsequent request (based on our test case: remove
> > access, followed by verification of grant permission). If table cache has
> > race condition, it should also be applicable to HBase 1 as well I
> believe,
> > yet to compare both at length.
> >
> > I understand, Phoenix is not doing anything different w.r.t GRANT/REVOKE
> > statements so some sync issue might have been introduced in HBase 2
> > unknowingly. I will try to dig more sometime next week and file HBase
> Jira
> > if I could find some evidence of race conditions or could reproduce with
> > HBase tests directly.
> >
> >> On Mon, May 31, 2021 at 11:40 AM Istvan Toth <st...@apache.org> wrote:
> >>
> >> I agree that we are ready for the RC, and that deflaking the tests is
> >> great.
> >>
> >> However, I think that Permissions test failures are exposing a real
> problem
> >> (probably in HBase),
> >> and we've only adopted the tests for the buggy behaviour of HBase, and
> >> should track down / fix
> >> the root cause.
> >>
> >> On Sun, May 30, 2021 at 4:46 PM Viraj Jasani <vj...@apache.org>
> wrote:
> >>
> >>>> IMHO, a flapper is worse than a failing test because it undermines the
> >>> confidence in a test run.
> >>>
> >>> I concur. In fact, in the last few builds, I have seen a few flappers
> >> (e.g
> >>> testUpsertSelectWithMultiByteCharsAutoCommit, testAsyncRebuildAll)
> whose
> >>> resolution was attempted just a few months back and now they are
> showing
> >> up
> >>> again (with much lower frequency than before), so yes we need some
> >>> attention again.
> >>>
> >>> After fixing consistent failure with testPherfMain (PHOENIX-6482) and
> >> high
> >>> frequency flappers with BasePermissionsIT and AuditLoggingIT
> >>> (PHOENIX-6483), CI builds are in better shape now.
> >>> In the recent build, I can see no test failures with 3/4 HBase
> profiles.
> >> I
> >>> will file a Jira for JaCoCo report generation failure, which keeps
> >> showing
> >>> up often (flapper with JaCoCo issue, not related to test failure).
> >>>
> >>> Given that CI builds are in better shape, planning to start an RC soon.
> >>> Thanks
> >>>
> >>>
> >>> On Sat, May 29, 2021 at 9:45 PM larsh@apache.org <la...@apache.org>
> >> wrote:
> >>>
> >>>> Nice!
> >>>>
> >>>> I noticed we had some successful 5.1 runs too.
> >>>> Fixing the tests would need some attention again. Flappers should
> >> either
> >>>> be fixed or disabled.
> >>>>
> >>>> IMHO, a flapper is worse than a failing test because it undermines the
> >>>> confidence in a test run.
> >>>> If the tests always pass you'll notice a failure right away.
> >>>> But when most of the runs produce some kind of failure it is human
> >> nature
> >>>> to start ignoring them, and then new, real failures pile up.
> >>>>
> >>>>
> >>>> -- Lars
> >>>>
> >>>>
> >>>> On Friday, May 28, 2021, 11:36:53 PM PDT, Viraj Jasani <
> >>> vjasani@apache.org>
> >>>> wrote:
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> Due to incorrect/missing fix versions, found some discrepancies b/
> >> master
> >>>> and 5.1 in the Pherf module.
> >>>> Backported recent improvements to 5.1 including test fix:
> >>>> PHOENIX-6417, PHOENIX-6118, PHOENIX-6430, PHOENIX-6429, PHOENIX-6431,
> >>>> PHOENIX-6432
> >>>>
> >>>> On Sat, May 29, 2021 at 12:20 AM Viraj Jasani <vj...@apache.org>
> >>> wrote:
> >>>>
> >>>>> I looked at CI build today and there are a couple of flakies in
> >>>> core/pherf.
> >>>>>
> >>>>> 3 major flaky tests: PermissionNSEnabledIT, PermissionsCacheIT,
> >>>>> AuditLoggingIT (reported with HBase 2.2/2.3/2.4). Permission tests
> >> have
> >>>>> been flaky for quite some time. IIRC, there was some attempt to fix
> >>> them
> >>>> as
> >>>>> well but still they appear with AccessDeniedException at times. Not
> >>> much
> >>>>> sure about AuditLoggingIT though.
> >>>>>
> >>>>> Pherf test fails with HBase 2.1.
> >>>>>
> >>>>>
> >>>>> On Fri, 28 May 2021 at 10:44 PM, larsh@apache.org <la...@apache.org>
> >>>>> wrote:
> >>>>>
> >>>>>> 5.1 CI builds seems to fail consistently - at least for the past two
> >>>>>> weeks.
> >>>>>>
> >>>>>>
> >>>>
> >>>
> >>
> https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-mulitbranch/job/5.1/
> >>>>>>
> >>>>>> Might be something to look into.
> >>>>>> I have an hour today between meetings, I'll see what I can do. :)
> >>>>>>
> >>>>>> -- Lars
> >>>>>>
> >>>>>>
> >>>>>> On Wednesday, May 26, 2021, 10:54:13 PM PDT, Viraj Jasani <
> >>>>>> vjasani@apache.org> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Thanks Lars. I had an offline chat with Istvan yesterday and I will
> >>>>>> prepare
> >>>>>> RCs over the weekend.
> >>>>>>
> >>>>>> In the meantime, I will again take a look at Jira fixVersion and git
> >>>>>> commits compatibility and also try to help with any backport (if
> >> still
> >>>>>> pending).
> >>>>>>
> >>>>>>
> >>>>>> On Wed, 26 May 2021 at 11:53 PM, larsh@apache.org <larsh@apache.org
> >>>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> I cherry-picked three changes into branch 5.1.
> >>>>>>> Assuming Jira is up-to-date, there are no 4.16.x issues that are
> >> not
> >>>>>> also
> >>>>>>> in 5.1.x.
> >>>>>>>
> >>>>>>> There are 12 issues left that are in 4.17.0 and 5.2.0, but not in
> >>>> 5.1.x.
> >>>>>>> But they are presumably not in 4.16.x because they violate
> >> backward
> >>>>>>> compatibility, and - if so - should not be in 5.1.2.
> >>>>>>>
> >>>>>>> -- Lars
> >>>>>>>
> >>>>>>> On Wednesday, May 26, 2021, 10:15:31 AM PDT, larsh@apache.org <
> >>>>>>> larsh@apache.org> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> Awesome. Thanks Istvan.
> >>>>>>>
> >>>>>>> I'll do a pass through the issues too.
> >>>>>>>
> >>>>>>> -- Lars
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tuesday, May 25, 2021, 8:45:00 PM PDT, Istvan Toth <
> >>>> stoty@apache.org
> >>>>>>>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> I am not aware of any blockers in 5.1.
> >>>>>>> If there is no objection, and no other volunteers for the 5.1.2 RM
> >>>>>> role, I
> >>>>>>> intend to start the process on Monday.
> >>>>>>>
> >>>>>>> I will also look for missing fixes before release, but if anyone
> >> is
> >>>>>> aware
> >>>>>>> of any fix that is needed in 5.1.2 , but
> >>>>>>> hasn't been backported, then please try to get it resolved by
> >>> Monday.
> >>>>>>>
> >>>>>>> regards
> >>>>>>> Istvan
> >>>>>>>
> >>>>>>> On Wed, May 26, 2021 at 4:19 AM larsh@apache.org <
> >> larsh@apache.org>
> >>>>>> wrote:
> >>>>>>>
> >>>>>>>> There are 17 resolved issues against it.
> >>>>>>>> PHOENIX-6436 is needed for further integration with Trino
> >>> (formerly
> >>>>>>>> Presto).
> >>>>>>>>
> >>>>>>>> It is time for another point release?
> >>>>>>>>
> >>>>>>>> -- Lars
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
>

Re: Time for 5.1.2?

Posted by Andrew Purtell <an...@gmail.com>.
In the HBase security unit tests we wait for the zk mediated cache reload to complete before continuing the test. If you don’t do that your test will be flaky. Check that you are doing this. If you are not, then your tests will be flaky. In the real world, the operator issues the grant or revoke command and we have the human time scale of ~seconds (at least) for the change to propagate. This is not a bug. This stuff was designed in years before the Procedure framework was written. What we had to work with is ZK watchers. Those are processed asynchronously with respect to the admin API. Because a unit test runs much faster than a human administrative act, you have to wait for more than just the admin API to complete. The cache in each regionserver’s access controller has a last update timestamp and you should wait for them to all update. Look at the HBase unit tests to see how that is done. 

The propagation of cache updates could be made synchronous with the update itself if the grant and revoke admin apis were refactored as Procedures. The Procedure would have a sub-procedure that establishes a barrier around the cache refresh at all live regionservers, presumably. There is a JIRA for that work filed somewhere, an old one. It’s not been important enough for someone to pick up and perform. Don’t expect this to change.


> On May 31, 2021, at 6:52 AM, Viraj Jasani <vj...@apache.org> wrote:
> 
> For Permission APIs grant/revoke, I just checked the diff b/ HBase 1 and
> HBase 2.
> Phoenix MetaDataClient uses AccessControlClient to grant/revoke permission
> to a user. HBase 1 AccessControlClient invokes coproc
> endpoint AccessController, which puts records in hbase:acl table on
> RegionServer hosting acl's single region. HBase 2 AccessControlClient on
> the other hand invokes Admin APIs to let master take care of adding records
> in hbase:acl table (HBASE-21739). This is the main diff b/ both versions,
> however, both workflows seem synchronous.
> For both HBase 1 and 2, post-put hook updates ZNodes with updated entries
> in acl table, which internally triggers a refresh of table cache in
> AuthManager, and this table cache could be used to verify Auth of user on
> table/namespace in subsequent request (based on our test case: remove
> access, followed by verification of grant permission). If table cache has
> race condition, it should also be applicable to HBase 1 as well I believe,
> yet to compare both at length.
> 
> I understand, Phoenix is not doing anything different w.r.t GRANT/REVOKE
> statements so some sync issue might have been introduced in HBase 2
> unknowingly. I will try to dig more sometime next week and file HBase Jira
> if I could find some evidence of race conditions or could reproduce with
> HBase tests directly.
> 
>> On Mon, May 31, 2021 at 11:40 AM Istvan Toth <st...@apache.org> wrote:
>> 
>> I agree that we are ready for the RC, and that deflaking the tests is
>> great.
>> 
>> However, I think that Permissions test failures are exposing a real problem
>> (probably in HBase),
>> and we've only adopted the tests for the buggy behaviour of HBase, and
>> should track down / fix
>> the root cause.
>> 
>> On Sun, May 30, 2021 at 4:46 PM Viraj Jasani <vj...@apache.org> wrote:
>> 
>>>> IMHO, a flapper is worse than a failing test because it undermines the
>>> confidence in a test run.
>>> 
>>> I concur. In fact, in the last few builds, I have seen a few flappers
>> (e.g
>>> testUpsertSelectWithMultiByteCharsAutoCommit, testAsyncRebuildAll) whose
>>> resolution was attempted just a few months back and now they are showing
>> up
>>> again (with much lower frequency than before), so yes we need some
>>> attention again.
>>> 
>>> After fixing consistent failure with testPherfMain (PHOENIX-6482) and
>> high
>>> frequency flappers with BasePermissionsIT and AuditLoggingIT
>>> (PHOENIX-6483), CI builds are in better shape now.
>>> In the recent build, I can see no test failures with 3/4 HBase profiles.
>> I
>>> will file a Jira for JaCoCo report generation failure, which keeps
>> showing
>>> up often (flapper with JaCoCo issue, not related to test failure).
>>> 
>>> Given that CI builds are in better shape, planning to start an RC soon.
>>> Thanks
>>> 
>>> 
>>> On Sat, May 29, 2021 at 9:45 PM larsh@apache.org <la...@apache.org>
>> wrote:
>>> 
>>>> Nice!
>>>> 
>>>> I noticed we had some successful 5.1 runs too.
>>>> Fixing the tests would need some attention again. Flappers should
>> either
>>>> be fixed or disabled.
>>>> 
>>>> IMHO, a flapper is worse than a failing test because it undermines the
>>>> confidence in a test run.
>>>> If the tests always pass you'll notice a failure right away.
>>>> But when most of the runs produce some kind of failure it is human
>> nature
>>>> to start ignoring them, and then new, real failures pile up.
>>>> 
>>>> 
>>>> -- Lars
>>>> 
>>>> 
>>>> On Friday, May 28, 2021, 11:36:53 PM PDT, Viraj Jasani <
>>> vjasani@apache.org>
>>>> wrote:
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> Due to incorrect/missing fix versions, found some discrepancies b/
>> master
>>>> and 5.1 in the Pherf module.
>>>> Backported recent improvements to 5.1 including test fix:
>>>> PHOENIX-6417, PHOENIX-6118, PHOENIX-6430, PHOENIX-6429, PHOENIX-6431,
>>>> PHOENIX-6432
>>>> 
>>>> On Sat, May 29, 2021 at 12:20 AM Viraj Jasani <vj...@apache.org>
>>> wrote:
>>>> 
>>>>> I looked at CI build today and there are a couple of flakies in
>>>> core/pherf.
>>>>> 
>>>>> 3 major flaky tests: PermissionNSEnabledIT, PermissionsCacheIT,
>>>>> AuditLoggingIT (reported with HBase 2.2/2.3/2.4). Permission tests
>> have
>>>>> been flaky for quite some time. IIRC, there was some attempt to fix
>>> them
>>>> as
>>>>> well but still they appear with AccessDeniedException at times. Not
>>> much
>>>>> sure about AuditLoggingIT though.
>>>>> 
>>>>> Pherf test fails with HBase 2.1.
>>>>> 
>>>>> 
>>>>> On Fri, 28 May 2021 at 10:44 PM, larsh@apache.org <la...@apache.org>
>>>>> wrote:
>>>>> 
>>>>>> 5.1 CI builds seems to fail consistently - at least for the past two
>>>>>> weeks.
>>>>>> 
>>>>>> 
>>>> 
>>> 
>> https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-mulitbranch/job/5.1/
>>>>>> 
>>>>>> Might be something to look into.
>>>>>> I have an hour today between meetings, I'll see what I can do. :)
>>>>>> 
>>>>>> -- Lars
>>>>>> 
>>>>>> 
>>>>>> On Wednesday, May 26, 2021, 10:54:13 PM PDT, Viraj Jasani <
>>>>>> vjasani@apache.org> wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Thanks Lars. I had an offline chat with Istvan yesterday and I will
>>>>>> prepare
>>>>>> RCs over the weekend.
>>>>>> 
>>>>>> In the meantime, I will again take a look at Jira fixVersion and git
>>>>>> commits compatibility and also try to help with any backport (if
>> still
>>>>>> pending).
>>>>>> 
>>>>>> 
>>>>>> On Wed, 26 May 2021 at 11:53 PM, larsh@apache.org <larsh@apache.org
>>> 
>>>>>> wrote:
>>>>>> 
>>>>>>> I cherry-picked three changes into branch 5.1.
>>>>>>> Assuming Jira is up-to-date, there are no 4.16.x issues that are
>> not
>>>>>> also
>>>>>>> in 5.1.x.
>>>>>>> 
>>>>>>> There are 12 issues left that are in 4.17.0 and 5.2.0, but not in
>>>> 5.1.x.
>>>>>>> But they are presumably not in 4.16.x because they violate
>> backward
>>>>>>> compatibility, and - if so - should not be in 5.1.2.
>>>>>>> 
>>>>>>> -- Lars
>>>>>>> 
>>>>>>> On Wednesday, May 26, 2021, 10:15:31 AM PDT, larsh@apache.org <
>>>>>>> larsh@apache.org> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Awesome. Thanks Istvan.
>>>>>>> 
>>>>>>> I'll do a pass through the issues too.
>>>>>>> 
>>>>>>> -- Lars
>>>>>>> 
>>>>>>> 
>>>>>>> On Tuesday, May 25, 2021, 8:45:00 PM PDT, Istvan Toth <
>>>> stoty@apache.org
>>>>>>> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> I am not aware of any blockers in 5.1.
>>>>>>> If there is no objection, and no other volunteers for the 5.1.2 RM
>>>>>> role, I
>>>>>>> intend to start the process on Monday.
>>>>>>> 
>>>>>>> I will also look for missing fixes before release, but if anyone
>> is
>>>>>> aware
>>>>>>> of any fix that is needed in 5.1.2 , but
>>>>>>> hasn't been backported, then please try to get it resolved by
>>> Monday.
>>>>>>> 
>>>>>>> regards
>>>>>>> Istvan
>>>>>>> 
>>>>>>> On Wed, May 26, 2021 at 4:19 AM larsh@apache.org <
>> larsh@apache.org>
>>>>>> wrote:
>>>>>>> 
>>>>>>>> There are 17 resolved issues against it.
>>>>>>>> PHOENIX-6436 is needed for further integration with Trino
>>> (formerly
>>>>>>>> Presto).
>>>>>>>> 
>>>>>>>> It is time for another point release?
>>>>>>>> 
>>>>>>>> -- Lars
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 

Re: Time for 5.1.2?

Posted by Viraj Jasani <vj...@apache.org>.
For Permission APIs grant/revoke, I just checked the diff b/ HBase 1 and
HBase 2.
Phoenix MetaDataClient uses AccessControlClient to grant/revoke permission
to a user. HBase 1 AccessControlClient invokes coproc
endpoint AccessController, which puts records in hbase:acl table on
RegionServer hosting acl's single region. HBase 2 AccessControlClient on
the other hand invokes Admin APIs to let master take care of adding records
in hbase:acl table (HBASE-21739). This is the main diff b/ both versions,
however, both workflows seem synchronous.
For both HBase 1 and 2, post-put hook updates ZNodes with updated entries
in acl table, which internally triggers a refresh of table cache in
AuthManager, and this table cache could be used to verify Auth of user on
table/namespace in subsequent request (based on our test case: remove
access, followed by verification of grant permission). If table cache has
race condition, it should also be applicable to HBase 1 as well I believe,
yet to compare both at length.

I understand, Phoenix is not doing anything different w.r.t GRANT/REVOKE
statements so some sync issue might have been introduced in HBase 2
unknowingly. I will try to dig more sometime next week and file HBase Jira
if I could find some evidence of race conditions or could reproduce with
HBase tests directly.

On Mon, May 31, 2021 at 11:40 AM Istvan Toth <st...@apache.org> wrote:

> I agree that we are ready for the RC, and that deflaking the tests is
> great.
>
> However, I think that Permissions test failures are exposing a real problem
> (probably in HBase),
> and we've only adopted the tests for the buggy behaviour of HBase, and
> should track down / fix
> the root cause.
>
> On Sun, May 30, 2021 at 4:46 PM Viraj Jasani <vj...@apache.org> wrote:
>
> > > IMHO, a flapper is worse than a failing test because it undermines the
> > confidence in a test run.
> >
> > I concur. In fact, in the last few builds, I have seen a few flappers
> (e.g
> > testUpsertSelectWithMultiByteCharsAutoCommit, testAsyncRebuildAll) whose
> > resolution was attempted just a few months back and now they are showing
> up
> > again (with much lower frequency than before), so yes we need some
> > attention again.
> >
> > After fixing consistent failure with testPherfMain (PHOENIX-6482) and
> high
> > frequency flappers with BasePermissionsIT and AuditLoggingIT
> > (PHOENIX-6483), CI builds are in better shape now.
> > In the recent build, I can see no test failures with 3/4 HBase profiles.
> I
> > will file a Jira for JaCoCo report generation failure, which keeps
> showing
> > up often (flapper with JaCoCo issue, not related to test failure).
> >
> > Given that CI builds are in better shape, planning to start an RC soon.
> > Thanks
> >
> >
> > On Sat, May 29, 2021 at 9:45 PM larsh@apache.org <la...@apache.org>
> wrote:
> >
> > > Nice!
> > >
> > > I noticed we had some successful 5.1 runs too.
> > > Fixing the tests would need some attention again. Flappers should
> either
> > > be fixed or disabled.
> > >
> > > IMHO, a flapper is worse than a failing test because it undermines the
> > > confidence in a test run.
> > > If the tests always pass you'll notice a failure right away.
> > > But when most of the runs produce some kind of failure it is human
> nature
> > > to start ignoring them, and then new, real failures pile up.
> > >
> > >
> > > -- Lars
> > >
> > >
> > > On Friday, May 28, 2021, 11:36:53 PM PDT, Viraj Jasani <
> > vjasani@apache.org>
> > > wrote:
> > >
> > >
> > >
> > >
> > >
> > > Due to incorrect/missing fix versions, found some discrepancies b/
> master
> > > and 5.1 in the Pherf module.
> > > Backported recent improvements to 5.1 including test fix:
> > > PHOENIX-6417, PHOENIX-6118, PHOENIX-6430, PHOENIX-6429, PHOENIX-6431,
> > > PHOENIX-6432
> > >
> > > On Sat, May 29, 2021 at 12:20 AM Viraj Jasani <vj...@apache.org>
> > wrote:
> > >
> > > > I looked at CI build today and there are a couple of flakies in
> > > core/pherf.
> > > >
> > > > 3 major flaky tests: PermissionNSEnabledIT, PermissionsCacheIT,
> > > > AuditLoggingIT (reported with HBase 2.2/2.3/2.4). Permission tests
> have
> > > > been flaky for quite some time. IIRC, there was some attempt to fix
> > them
> > > as
> > > > well but still they appear with AccessDeniedException at times. Not
> > much
> > > > sure about AuditLoggingIT though.
> > > >
> > > > Pherf test fails with HBase 2.1.
> > > >
> > > >
> > > > On Fri, 28 May 2021 at 10:44 PM, larsh@apache.org <la...@apache.org>
> > > > wrote:
> > > >
> > > >> 5.1 CI builds seems to fail consistently - at least for the past two
> > > >> weeks.
> > > >>
> > > >>
> > >
> >
> https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-mulitbranch/job/5.1/
> > > >>
> > > >> Might be something to look into.
> > > >> I have an hour today between meetings, I'll see what I can do. :)
> > > >>
> > > >> -- Lars
> > > >>
> > > >>
> > > >> On Wednesday, May 26, 2021, 10:54:13 PM PDT, Viraj Jasani <
> > > >> vjasani@apache.org> wrote:
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> Thanks Lars. I had an offline chat with Istvan yesterday and I will
> > > >> prepare
> > > >> RCs over the weekend.
> > > >>
> > > >> In the meantime, I will again take a look at Jira fixVersion and git
> > > >> commits compatibility and also try to help with any backport (if
> still
> > > >> pending).
> > > >>
> > > >>
> > > >> On Wed, 26 May 2021 at 11:53 PM, larsh@apache.org <larsh@apache.org
> >
> > > >> wrote:
> > > >>
> > > >> > I cherry-picked three changes into branch 5.1.
> > > >> > Assuming Jira is up-to-date, there are no 4.16.x issues that are
> not
> > > >> also
> > > >> > in 5.1.x.
> > > >> >
> > > >> > There are 12 issues left that are in 4.17.0 and 5.2.0, but not in
> > > 5.1.x.
> > > >> > But they are presumably not in 4.16.x because they violate
> backward
> > > >> > compatibility, and - if so - should not be in 5.1.2.
> > > >> >
> > > >> > -- Lars
> > > >> >
> > > >> > On Wednesday, May 26, 2021, 10:15:31 AM PDT, larsh@apache.org <
> > > >> > larsh@apache.org> wrote:
> > > >> >
> > > >> >
> > > >> >
> > > >> >
> > > >> >
> > > >> > Awesome. Thanks Istvan.
> > > >> >
> > > >> > I'll do a pass through the issues too.
> > > >> >
> > > >> > -- Lars
> > > >> >
> > > >> >
> > > >> > On Tuesday, May 25, 2021, 8:45:00 PM PDT, Istvan Toth <
> > > stoty@apache.org
> > > >> >
> > > >> > wrote:
> > > >> >
> > > >> >
> > > >> >
> > > >> >
> > > >> >
> > > >> > I am not aware of any blockers in 5.1.
> > > >> > If there is no objection, and no other volunteers for the 5.1.2 RM
> > > >> role, I
> > > >> > intend to start the process on Monday.
> > > >> >
> > > >> > I will also look for missing fixes before release, but if anyone
> is
> > > >> aware
> > > >> > of any fix that is needed in 5.1.2 , but
> > > >> > hasn't been backported, then please try to get it resolved by
> > Monday.
> > > >> >
> > > >> > regards
> > > >> > Istvan
> > > >> >
> > > >> > On Wed, May 26, 2021 at 4:19 AM larsh@apache.org <
> larsh@apache.org>
> > > >> wrote:
> > > >> >
> > > >> > > There are 17 resolved issues against it.
> > > >> > > PHOENIX-6436 is needed for further integration with Trino
> > (formerly
> > > >> > > Presto).
> > > >> > >
> > > >> > > It is time for another point release?
> > > >> > >
> > > >> > > -- Lars
> > > >> > >
> > > >> >
> > > >>
> > > >
> > >
> >
>

Re: Time for 5.1.2?

Posted by Istvan Toth <st...@apache.org>.
I agree that we are ready for the RC, and that deflaking the tests is great.

However, I think that Permissions test failures are exposing a real problem
(probably in HBase),
and we've only adopted the tests for the buggy behaviour of HBase, and
should track down / fix
the root cause.

On Sun, May 30, 2021 at 4:46 PM Viraj Jasani <vj...@apache.org> wrote:

> > IMHO, a flapper is worse than a failing test because it undermines the
> confidence in a test run.
>
> I concur. In fact, in the last few builds, I have seen a few flappers (e.g
> testUpsertSelectWithMultiByteCharsAutoCommit, testAsyncRebuildAll) whose
> resolution was attempted just a few months back and now they are showing up
> again (with much lower frequency than before), so yes we need some
> attention again.
>
> After fixing consistent failure with testPherfMain (PHOENIX-6482) and high
> frequency flappers with BasePermissionsIT and AuditLoggingIT
> (PHOENIX-6483), CI builds are in better shape now.
> In the recent build, I can see no test failures with 3/4 HBase profiles. I
> will file a Jira for JaCoCo report generation failure, which keeps showing
> up often (flapper with JaCoCo issue, not related to test failure).
>
> Given that CI builds are in better shape, planning to start an RC soon.
> Thanks
>
>
> On Sat, May 29, 2021 at 9:45 PM larsh@apache.org <la...@apache.org> wrote:
>
> > Nice!
> >
> > I noticed we had some successful 5.1 runs too.
> > Fixing the tests would need some attention again. Flappers should either
> > be fixed or disabled.
> >
> > IMHO, a flapper is worse than a failing test because it undermines the
> > confidence in a test run.
> > If the tests always pass you'll notice a failure right away.
> > But when most of the runs produce some kind of failure it is human nature
> > to start ignoring them, and then new, real failures pile up.
> >
> >
> > -- Lars
> >
> >
> > On Friday, May 28, 2021, 11:36:53 PM PDT, Viraj Jasani <
> vjasani@apache.org>
> > wrote:
> >
> >
> >
> >
> >
> > Due to incorrect/missing fix versions, found some discrepancies b/ master
> > and 5.1 in the Pherf module.
> > Backported recent improvements to 5.1 including test fix:
> > PHOENIX-6417, PHOENIX-6118, PHOENIX-6430, PHOENIX-6429, PHOENIX-6431,
> > PHOENIX-6432
> >
> > On Sat, May 29, 2021 at 12:20 AM Viraj Jasani <vj...@apache.org>
> wrote:
> >
> > > I looked at CI build today and there are a couple of flakies in
> > core/pherf.
> > >
> > > 3 major flaky tests: PermissionNSEnabledIT, PermissionsCacheIT,
> > > AuditLoggingIT (reported with HBase 2.2/2.3/2.4). Permission tests have
> > > been flaky for quite some time. IIRC, there was some attempt to fix
> them
> > as
> > > well but still they appear with AccessDeniedException at times. Not
> much
> > > sure about AuditLoggingIT though.
> > >
> > > Pherf test fails with HBase 2.1.
> > >
> > >
> > > On Fri, 28 May 2021 at 10:44 PM, larsh@apache.org <la...@apache.org>
> > > wrote:
> > >
> > >> 5.1 CI builds seems to fail consistently - at least for the past two
> > >> weeks.
> > >>
> > >>
> >
> https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-mulitbranch/job/5.1/
> > >>
> > >> Might be something to look into.
> > >> I have an hour today between meetings, I'll see what I can do. :)
> > >>
> > >> -- Lars
> > >>
> > >>
> > >> On Wednesday, May 26, 2021, 10:54:13 PM PDT, Viraj Jasani <
> > >> vjasani@apache.org> wrote:
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> Thanks Lars. I had an offline chat with Istvan yesterday and I will
> > >> prepare
> > >> RCs over the weekend.
> > >>
> > >> In the meantime, I will again take a look at Jira fixVersion and git
> > >> commits compatibility and also try to help with any backport (if still
> > >> pending).
> > >>
> > >>
> > >> On Wed, 26 May 2021 at 11:53 PM, larsh@apache.org <la...@apache.org>
> > >> wrote:
> > >>
> > >> > I cherry-picked three changes into branch 5.1.
> > >> > Assuming Jira is up-to-date, there are no 4.16.x issues that are not
> > >> also
> > >> > in 5.1.x.
> > >> >
> > >> > There are 12 issues left that are in 4.17.0 and 5.2.0, but not in
> > 5.1.x.
> > >> > But they are presumably not in 4.16.x because they violate backward
> > >> > compatibility, and - if so - should not be in 5.1.2.
> > >> >
> > >> > -- Lars
> > >> >
> > >> > On Wednesday, May 26, 2021, 10:15:31 AM PDT, larsh@apache.org <
> > >> > larsh@apache.org> wrote:
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > Awesome. Thanks Istvan.
> > >> >
> > >> > I'll do a pass through the issues too.
> > >> >
> > >> > -- Lars
> > >> >
> > >> >
> > >> > On Tuesday, May 25, 2021, 8:45:00 PM PDT, Istvan Toth <
> > stoty@apache.org
> > >> >
> > >> > wrote:
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > I am not aware of any blockers in 5.1.
> > >> > If there is no objection, and no other volunteers for the 5.1.2 RM
> > >> role, I
> > >> > intend to start the process on Monday.
> > >> >
> > >> > I will also look for missing fixes before release, but if anyone is
> > >> aware
> > >> > of any fix that is needed in 5.1.2 , but
> > >> > hasn't been backported, then please try to get it resolved by
> Monday.
> > >> >
> > >> > regards
> > >> > Istvan
> > >> >
> > >> > On Wed, May 26, 2021 at 4:19 AM larsh@apache.org <la...@apache.org>
> > >> wrote:
> > >> >
> > >> > > There are 17 resolved issues against it.
> > >> > > PHOENIX-6436 is needed for further integration with Trino
> (formerly
> > >> > > Presto).
> > >> > >
> > >> > > It is time for another point release?
> > >> > >
> > >> > > -- Lars
> > >> > >
> > >> >
> > >>
> > >
> >
>

Re: Time for 5.1.2?

Posted by Viraj Jasani <vj...@apache.org>.
> IMHO, a flapper is worse than a failing test because it undermines the
confidence in a test run.

I concur. In fact, in the last few builds, I have seen a few flappers (e.g
testUpsertSelectWithMultiByteCharsAutoCommit, testAsyncRebuildAll) whose
resolution was attempted just a few months back and now they are showing up
again (with much lower frequency than before), so yes we need some
attention again.

After fixing consistent failure with testPherfMain (PHOENIX-6482) and high
frequency flappers with BasePermissionsIT and AuditLoggingIT
(PHOENIX-6483), CI builds are in better shape now.
In the recent build, I can see no test failures with 3/4 HBase profiles. I
will file a Jira for JaCoCo report generation failure, which keeps showing
up often (flapper with JaCoCo issue, not related to test failure).

Given that CI builds are in better shape, planning to start an RC soon.
Thanks


On Sat, May 29, 2021 at 9:45 PM larsh@apache.org <la...@apache.org> wrote:

> Nice!
>
> I noticed we had some successful 5.1 runs too.
> Fixing the tests would need some attention again. Flappers should either
> be fixed or disabled.
>
> IMHO, a flapper is worse than a failing test because it undermines the
> confidence in a test run.
> If the tests always pass you'll notice a failure right away.
> But when most of the runs produce some kind of failure it is human nature
> to start ignoring them, and then new, real failures pile up.
>
>
> -- Lars
>
>
> On Friday, May 28, 2021, 11:36:53 PM PDT, Viraj Jasani <vj...@apache.org>
> wrote:
>
>
>
>
>
> Due to incorrect/missing fix versions, found some discrepancies b/ master
> and 5.1 in the Pherf module.
> Backported recent improvements to 5.1 including test fix:
> PHOENIX-6417, PHOENIX-6118, PHOENIX-6430, PHOENIX-6429, PHOENIX-6431,
> PHOENIX-6432
>
> On Sat, May 29, 2021 at 12:20 AM Viraj Jasani <vj...@apache.org> wrote:
>
> > I looked at CI build today and there are a couple of flakies in
> core/pherf.
> >
> > 3 major flaky tests: PermissionNSEnabledIT, PermissionsCacheIT,
> > AuditLoggingIT (reported with HBase 2.2/2.3/2.4). Permission tests have
> > been flaky for quite some time. IIRC, there was some attempt to fix them
> as
> > well but still they appear with AccessDeniedException at times. Not much
> > sure about AuditLoggingIT though.
> >
> > Pherf test fails with HBase 2.1.
> >
> >
> > On Fri, 28 May 2021 at 10:44 PM, larsh@apache.org <la...@apache.org>
> > wrote:
> >
> >> 5.1 CI builds seems to fail consistently - at least for the past two
> >> weeks.
> >>
> >>
> https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-mulitbranch/job/5.1/
> >>
> >> Might be something to look into.
> >> I have an hour today between meetings, I'll see what I can do. :)
> >>
> >> -- Lars
> >>
> >>
> >> On Wednesday, May 26, 2021, 10:54:13 PM PDT, Viraj Jasani <
> >> vjasani@apache.org> wrote:
> >>
> >>
> >>
> >>
> >>
> >> Thanks Lars. I had an offline chat with Istvan yesterday and I will
> >> prepare
> >> RCs over the weekend.
> >>
> >> In the meantime, I will again take a look at Jira fixVersion and git
> >> commits compatibility and also try to help with any backport (if still
> >> pending).
> >>
> >>
> >> On Wed, 26 May 2021 at 11:53 PM, larsh@apache.org <la...@apache.org>
> >> wrote:
> >>
> >> > I cherry-picked three changes into branch 5.1.
> >> > Assuming Jira is up-to-date, there are no 4.16.x issues that are not
> >> also
> >> > in 5.1.x.
> >> >
> >> > There are 12 issues left that are in 4.17.0 and 5.2.0, but not in
> 5.1.x.
> >> > But they are presumably not in 4.16.x because they violate backward
> >> > compatibility, and - if so - should not be in 5.1.2.
> >> >
> >> > -- Lars
> >> >
> >> > On Wednesday, May 26, 2021, 10:15:31 AM PDT, larsh@apache.org <
> >> > larsh@apache.org> wrote:
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > Awesome. Thanks Istvan.
> >> >
> >> > I'll do a pass through the issues too.
> >> >
> >> > -- Lars
> >> >
> >> >
> >> > On Tuesday, May 25, 2021, 8:45:00 PM PDT, Istvan Toth <
> stoty@apache.org
> >> >
> >> > wrote:
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > I am not aware of any blockers in 5.1.
> >> > If there is no objection, and no other volunteers for the 5.1.2 RM
> >> role, I
> >> > intend to start the process on Monday.
> >> >
> >> > I will also look for missing fixes before release, but if anyone is
> >> aware
> >> > of any fix that is needed in 5.1.2 , but
> >> > hasn't been backported, then please try to get it resolved by Monday.
> >> >
> >> > regards
> >> > Istvan
> >> >
> >> > On Wed, May 26, 2021 at 4:19 AM larsh@apache.org <la...@apache.org>
> >> wrote:
> >> >
> >> > > There are 17 resolved issues against it.
> >> > > PHOENIX-6436 is needed for further integration with Trino (formerly
> >> > > Presto).
> >> > >
> >> > > It is time for another point release?
> >> > >
> >> > > -- Lars
> >> > >
> >> >
> >>
> >
>

Re: Time for 5.1.2?

Posted by "larsh@apache.org" <la...@apache.org>.
Nice!

I noticed we had some successful 5.1 runs too.
Fixing the tests would need some attention again. Flappers should either be fixed or disabled.

IMHO, a flapper is worse than a failing test because it undermines the confidence in a test run.
If the tests always pass you'll notice a failure right away.
But when most of the runs produce some kind of failure it is human nature to start ignoring them, and then new, real failures pile up.


-- Lars


On Friday, May 28, 2021, 11:36:53 PM PDT, Viraj Jasani <vj...@apache.org> wrote: 





Due to incorrect/missing fix versions, found some discrepancies b/ master
and 5.1 in the Pherf module.
Backported recent improvements to 5.1 including test fix:
PHOENIX-6417, PHOENIX-6118, PHOENIX-6430, PHOENIX-6429, PHOENIX-6431,
PHOENIX-6432

On Sat, May 29, 2021 at 12:20 AM Viraj Jasani <vj...@apache.org> wrote:

> I looked at CI build today and there are a couple of flakies in core/pherf.
>
> 3 major flaky tests: PermissionNSEnabledIT, PermissionsCacheIT,
> AuditLoggingIT (reported with HBase 2.2/2.3/2.4). Permission tests have
> been flaky for quite some time. IIRC, there was some attempt to fix them as
> well but still they appear with AccessDeniedException at times. Not much
> sure about AuditLoggingIT though.
>
> Pherf test fails with HBase 2.1.
>
>
> On Fri, 28 May 2021 at 10:44 PM, larsh@apache.org <la...@apache.org>
> wrote:
>
>> 5.1 CI builds seems to fail consistently - at least for the past two
>> weeks.
>>
>> https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-mulitbranch/job/5.1/
>>
>> Might be something to look into.
>> I have an hour today between meetings, I'll see what I can do. :)
>>
>> -- Lars
>>
>>
>> On Wednesday, May 26, 2021, 10:54:13 PM PDT, Viraj Jasani <
>> vjasani@apache.org> wrote:
>>
>>
>>
>>
>>
>> Thanks Lars. I had an offline chat with Istvan yesterday and I will
>> prepare
>> RCs over the weekend.
>>
>> In the meantime, I will again take a look at Jira fixVersion and git
>> commits compatibility and also try to help with any backport (if still
>> pending).
>>
>>
>> On Wed, 26 May 2021 at 11:53 PM, larsh@apache.org <la...@apache.org>
>> wrote:
>>
>> > I cherry-picked three changes into branch 5.1.
>> > Assuming Jira is up-to-date, there are no 4.16.x issues that are not
>> also
>> > in 5.1.x.
>> >
>> > There are 12 issues left that are in 4.17.0 and 5.2.0, but not in 5.1.x.
>> > But they are presumably not in 4.16.x because they violate backward
>> > compatibility, and - if so - should not be in 5.1.2.
>> >
>> > -- Lars
>> >
>> > On Wednesday, May 26, 2021, 10:15:31 AM PDT, larsh@apache.org <
>> > larsh@apache.org> wrote:
>> >
>> >
>> >
>> >
>> >
>> > Awesome. Thanks Istvan.
>> >
>> > I'll do a pass through the issues too.
>> >
>> > -- Lars
>> >
>> >
>> > On Tuesday, May 25, 2021, 8:45:00 PM PDT, Istvan Toth <stoty@apache.org
>> >
>> > wrote:
>> >
>> >
>> >
>> >
>> >
>> > I am not aware of any blockers in 5.1.
>> > If there is no objection, and no other volunteers for the 5.1.2 RM
>> role, I
>> > intend to start the process on Monday.
>> >
>> > I will also look for missing fixes before release, but if anyone is
>> aware
>> > of any fix that is needed in 5.1.2 , but
>> > hasn't been backported, then please try to get it resolved by Monday.
>> >
>> > regards
>> > Istvan
>> >
>> > On Wed, May 26, 2021 at 4:19 AM larsh@apache.org <la...@apache.org>
>> wrote:
>> >
>> > > There are 17 resolved issues against it.
>> > > PHOENIX-6436 is needed for further integration with Trino (formerly
>> > > Presto).
>> > >
>> > > It is time for another point release?
>> > >
>> > > -- Lars
>> > >
>> >
>>
>

Re: Time for 5.1.2?

Posted by Viraj Jasani <vj...@apache.org>.
Due to incorrect/missing fix versions, found some discrepancies b/ master
and 5.1 in the Pherf module.
Backported recent improvements to 5.1 including test fix:
PHOENIX-6417, PHOENIX-6118, PHOENIX-6430, PHOENIX-6429, PHOENIX-6431,
PHOENIX-6432

On Sat, May 29, 2021 at 12:20 AM Viraj Jasani <vj...@apache.org> wrote:

> I looked at CI build today and there are a couple of flakies in core/pherf.
>
> 3 major flaky tests: PermissionNSEnabledIT, PermissionsCacheIT,
> AuditLoggingIT (reported with HBase 2.2/2.3/2.4). Permission tests have
> been flaky for quite some time. IIRC, there was some attempt to fix them as
> well but still they appear with AccessDeniedException at times. Not much
> sure about AuditLoggingIT though.
>
> Pherf test fails with HBase 2.1.
>
>
> On Fri, 28 May 2021 at 10:44 PM, larsh@apache.org <la...@apache.org>
> wrote:
>
>> 5.1 CI builds seems to fail consistently - at least for the past two
>> weeks.
>>
>> https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-mulitbranch/job/5.1/
>>
>> Might be something to look into.
>> I have an hour today between meetings, I'll see what I can do. :)
>>
>> -- Lars
>>
>>
>> On Wednesday, May 26, 2021, 10:54:13 PM PDT, Viraj Jasani <
>> vjasani@apache.org> wrote:
>>
>>
>>
>>
>>
>> Thanks Lars. I had an offline chat with Istvan yesterday and I will
>> prepare
>> RCs over the weekend.
>>
>> In the meantime, I will again take a look at Jira fixVersion and git
>> commits compatibility and also try to help with any backport (if still
>> pending).
>>
>>
>> On Wed, 26 May 2021 at 11:53 PM, larsh@apache.org <la...@apache.org>
>> wrote:
>>
>> > I cherry-picked three changes into branch 5.1.
>> > Assuming Jira is up-to-date, there are no 4.16.x issues that are not
>> also
>> > in 5.1.x.
>> >
>> > There are 12 issues left that are in 4.17.0 and 5.2.0, but not in 5.1.x.
>> > But they are presumably not in 4.16.x because they violate backward
>> > compatibility, and - if so - should not be in 5.1.2.
>> >
>> > -- Lars
>> >
>> > On Wednesday, May 26, 2021, 10:15:31 AM PDT, larsh@apache.org <
>> > larsh@apache.org> wrote:
>> >
>> >
>> >
>> >
>> >
>> > Awesome. Thanks Istvan.
>> >
>> > I'll do a pass through the issues too.
>> >
>> > -- Lars
>> >
>> >
>> > On Tuesday, May 25, 2021, 8:45:00 PM PDT, Istvan Toth <stoty@apache.org
>> >
>> > wrote:
>> >
>> >
>> >
>> >
>> >
>> > I am not aware of any blockers in 5.1.
>> > If there is no objection, and no other volunteers for the 5.1.2 RM
>> role, I
>> > intend to start the process on Monday.
>> >
>> > I will also look for missing fixes before release, but if anyone is
>> aware
>> > of any fix that is needed in 5.1.2 , but
>> > hasn't been backported, then please try to get it resolved by Monday.
>> >
>> > regards
>> > Istvan
>> >
>> > On Wed, May 26, 2021 at 4:19 AM larsh@apache.org <la...@apache.org>
>> wrote:
>> >
>> > > There are 17 resolved issues against it.
>> > > PHOENIX-6436 is needed for further integration with Trino (formerly
>> > > Presto).
>> > >
>> > > It is time for another point release?
>> > >
>> > > -- Lars
>> > >
>> >
>>
>

Re: Time for 5.1.2?

Posted by Viraj Jasani <vj...@apache.org>.
I looked at CI build today and there are a couple of flakies in core/pherf.

3 major flaky tests: PermissionNSEnabledIT, PermissionsCacheIT,
AuditLoggingIT (reported with HBase 2.2/2.3/2.4). Permission tests have
been flaky for quite some time. IIRC, there was some attempt to fix them as
well but still they appear with AccessDeniedException at times. Not much
sure about AuditLoggingIT though.

Pherf test fails with HBase 2.1.


On Fri, 28 May 2021 at 10:44 PM, larsh@apache.org <la...@apache.org> wrote:

> 5.1 CI builds seems to fail consistently - at least for the past two weeks.
>
> https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-mulitbranch/job/5.1/
>
> Might be something to look into.
> I have an hour today between meetings, I'll see what I can do. :)
>
> -- Lars
>
>
> On Wednesday, May 26, 2021, 10:54:13 PM PDT, Viraj Jasani <
> vjasani@apache.org> wrote:
>
>
>
>
>
> Thanks Lars. I had an offline chat with Istvan yesterday and I will prepare
> RCs over the weekend.
>
> In the meantime, I will again take a look at Jira fixVersion and git
> commits compatibility and also try to help with any backport (if still
> pending).
>
>
> On Wed, 26 May 2021 at 11:53 PM, larsh@apache.org <la...@apache.org>
> wrote:
>
> > I cherry-picked three changes into branch 5.1.
> > Assuming Jira is up-to-date, there are no 4.16.x issues that are not also
> > in 5.1.x.
> >
> > There are 12 issues left that are in 4.17.0 and 5.2.0, but not in 5.1.x.
> > But they are presumably not in 4.16.x because they violate backward
> > compatibility, and - if so - should not be in 5.1.2.
> >
> > -- Lars
> >
> > On Wednesday, May 26, 2021, 10:15:31 AM PDT, larsh@apache.org <
> > larsh@apache.org> wrote:
> >
> >
> >
> >
> >
> > Awesome. Thanks Istvan.
> >
> > I'll do a pass through the issues too.
> >
> > -- Lars
> >
> >
> > On Tuesday, May 25, 2021, 8:45:00 PM PDT, Istvan Toth <st...@apache.org>
> > wrote:
> >
> >
> >
> >
> >
> > I am not aware of any blockers in 5.1.
> > If there is no objection, and no other volunteers for the 5.1.2 RM role,
> I
> > intend to start the process on Monday.
> >
> > I will also look for missing fixes before release, but if anyone is aware
> > of any fix that is needed in 5.1.2 , but
> > hasn't been backported, then please try to get it resolved by Monday.
> >
> > regards
> > Istvan
> >
> > On Wed, May 26, 2021 at 4:19 AM larsh@apache.org <la...@apache.org>
> wrote:
> >
> > > There are 17 resolved issues against it.
> > > PHOENIX-6436 is needed for further integration with Trino (formerly
> > > Presto).
> > >
> > > It is time for another point release?
> > >
> > > -- Lars
> > >
> >
>

Re: Time for 5.1.2?

Posted by "larsh@apache.org" <la...@apache.org>.
5.1 CI builds seems to fail consistently - at least for the past two weeks.

https://ci-hadoop.apache.org/job/Phoenix/job/Phoenix-mulitbranch/job/5.1/

Might be something to look into.
I have an hour today between meetings, I'll see what I can do. :)

-- Lars


On Wednesday, May 26, 2021, 10:54:13 PM PDT, Viraj Jasani <vj...@apache.org> wrote: 





Thanks Lars. I had an offline chat with Istvan yesterday and I will prepare
RCs over the weekend.

In the meantime, I will again take a look at Jira fixVersion and git
commits compatibility and also try to help with any backport (if still
pending).


On Wed, 26 May 2021 at 11:53 PM, larsh@apache.org <la...@apache.org> wrote:

> I cherry-picked three changes into branch 5.1.
> Assuming Jira is up-to-date, there are no 4.16.x issues that are not also
> in 5.1.x.
>
> There are 12 issues left that are in 4.17.0 and 5.2.0, but not in 5.1.x.
> But they are presumably not in 4.16.x because they violate backward
> compatibility, and - if so - should not be in 5.1.2.
>
> -- Lars
>
> On Wednesday, May 26, 2021, 10:15:31 AM PDT, larsh@apache.org <
> larsh@apache.org> wrote:
>
>
>
>
>
> Awesome. Thanks Istvan.
>
> I'll do a pass through the issues too.
>
> -- Lars
>
>
> On Tuesday, May 25, 2021, 8:45:00 PM PDT, Istvan Toth <st...@apache.org>
> wrote:
>
>
>
>
>
> I am not aware of any blockers in 5.1.
> If there is no objection, and no other volunteers for the 5.1.2 RM role, I
> intend to start the process on Monday.
>
> I will also look for missing fixes before release, but if anyone is aware
> of any fix that is needed in 5.1.2 , but
> hasn't been backported, then please try to get it resolved by Monday.
>
> regards
> Istvan
>
> On Wed, May 26, 2021 at 4:19 AM larsh@apache.org <la...@apache.org> wrote:
>
> > There are 17 resolved issues against it.
> > PHOENIX-6436 is needed for further integration with Trino (formerly
> > Presto).
> >
> > It is time for another point release?
> >
> > -- Lars
> >
>

Re: Time for 5.1.2?

Posted by Viraj Jasani <vj...@apache.org>.
Thanks Lars. I had an offline chat with Istvan yesterday and I will prepare
RCs over the weekend.

In the meantime, I will again take a look at Jira fixVersion and git
commits compatibility and also try to help with any backport (if still
pending).


On Wed, 26 May 2021 at 11:53 PM, larsh@apache.org <la...@apache.org> wrote:

> I cherry-picked three changes into branch 5.1.
> Assuming Jira is up-to-date, there are no 4.16.x issues that are not also
> in 5.1.x.
>
> There are 12 issues left that are in 4.17.0 and 5.2.0, but not in 5.1.x.
> But they are presumably not in 4.16.x because they violate backward
> compatibility, and - if so - should not be in 5.1.2.
>
> -- Lars
>
> On Wednesday, May 26, 2021, 10:15:31 AM PDT, larsh@apache.org <
> larsh@apache.org> wrote:
>
>
>
>
>
> Awesome. Thanks Istvan.
>
> I'll do a pass through the issues too.
>
> -- Lars
>
>
> On Tuesday, May 25, 2021, 8:45:00 PM PDT, Istvan Toth <st...@apache.org>
> wrote:
>
>
>
>
>
> I am not aware of any blockers in 5.1.
> If there is no objection, and no other volunteers for the 5.1.2 RM role, I
> intend to start the process on Monday.
>
> I will also look for missing fixes before release, but if anyone is aware
> of any fix that is needed in 5.1.2 , but
> hasn't been backported, then please try to get it resolved by Monday.
>
> regards
> Istvan
>
> On Wed, May 26, 2021 at 4:19 AM larsh@apache.org <la...@apache.org> wrote:
>
> > There are 17 resolved issues against it.
> > PHOENIX-6436 is needed for further integration with Trino (formerly
> > Presto).
> >
> > It is time for another point release?
> >
> > -- Lars
> >
>

Re: Time for 5.1.2?

Posted by "larsh@apache.org" <la...@apache.org>.
I cherry-picked three changes into branch 5.1.
Assuming Jira is up-to-date, there are no 4.16.x issues that are not also in 5.1.x.

There are 12 issues left that are in 4.17.0 and 5.2.0, but not in 5.1.x. But they are presumably not in 4.16.x because they violate backward compatibility, and - if so - should not be in 5.1.2. 

-- Lars

On Wednesday, May 26, 2021, 10:15:31 AM PDT, larsh@apache.org <la...@apache.org> wrote: 





Awesome. Thanks Istvan.

I'll do a pass through the issues too.

-- Lars


On Tuesday, May 25, 2021, 8:45:00 PM PDT, Istvan Toth <st...@apache.org> wrote: 





I am not aware of any blockers in 5.1.
If there is no objection, and no other volunteers for the 5.1.2 RM role, I
intend to start the process on Monday.

I will also look for missing fixes before release, but if anyone is aware
of any fix that is needed in 5.1.2 , but
hasn't been backported, then please try to get it resolved by Monday.

regards
Istvan

On Wed, May 26, 2021 at 4:19 AM larsh@apache.org <la...@apache.org> wrote:

> There are 17 resolved issues against it.
> PHOENIX-6436 is needed for further integration with Trino (formerly
> Presto).
>
> It is time for another point release?
>
> -- Lars
>

Re: Time for 5.1.2?

Posted by "larsh@apache.org" <la...@apache.org>.
Awesome. Thanks Istvan.

I'll do a pass through the issues too.

-- Lars


On Tuesday, May 25, 2021, 8:45:00 PM PDT, Istvan Toth <st...@apache.org> wrote: 





I am not aware of any blockers in 5.1.
If there is no objection, and no other volunteers for the 5.1.2 RM role, I
intend to start the process on Monday.

I will also look for missing fixes before release, but if anyone is aware
of any fix that is needed in 5.1.2 , but
hasn't been backported, then please try to get it resolved by Monday.

regards
Istvan

On Wed, May 26, 2021 at 4:19 AM larsh@apache.org <la...@apache.org> wrote:

> There are 17 resolved issues against it.
> PHOENIX-6436 is needed for further integration with Trino (formerly
> Presto).
>
> It is time for another point release?
>
> -- Lars
>

Re: Time for 5.1.2?

Posted by Istvan Toth <st...@apache.org>.
I am not aware of any blockers in 5.1.
If there is no objection, and no other volunteers for the 5.1.2 RM role, I
intend to start the process on Monday.

I will also look for missing fixes before release, but if anyone is aware
of any fix that is needed in 5.1.2 , but
hasn't been backported, then please try to get it resolved by Monday.

regards
Istvan

On Wed, May 26, 2021 at 4:19 AM larsh@apache.org <la...@apache.org> wrote:

> There are 17 resolved issues against it.
> PHOENIX-6436 is needed for further integration with Trino (formerly
> Presto).
>
> It is time for another point release?
>
> -- Lars
>