You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@hive.apache.org by Zoltan Haindrich <ki...@rxd.hu> on 2020/05/17 20:51:38 UTC

Re: [DISCUSS] Replace ptest with hive-test-kube

Hello all!

The proposed system have become more stable lately - and I think I've solved a few sources of flakiness.
To be really usable I also wanted to add a way to dynamically enable/disable a set of tests (for example the replication tests take ~7 hours to execute from the total of 24 
hours - and they are also a bit unstable, so not running them when not neccesary would be beneficial in multiple ways) - but to do this the best would be to throw in 
junit5; unfortunately the current ptest installation uses maven 3.0.5 which doesn't like these kind of things - so instead of hacking a fix for that ....I've removed it 
from the dev branch for now.

I would like to propose to start an evaluation phase of the new test procedures(INFRA-20269)
The process would look something like this:
* someone opens a PR - the tests will be run on the changes
* on every active branches the tests will run from time to time
   * this will produce a bunch of test runs on the master branch as well ; which will show how well the tests behave on the master branch without any patches
* runs on branches (PRs or active development branches(eg:master)) will be rate limited to 5 builds/day
* at most ~4 builds at a time - to maximize resource usage
* turnaround time for a build is right now 2 hours - which I feel like a balanced choice between speed/response time

Possible future benefits:
* toggle features using github tags
* optional testgroups (metastore/replication) tests
* ability to run the metastore verification tests
* possibility to add smoke tests

To enable this I will have to finish the HIVE-22942 ticket - beyond the new Jenkinsfile which defines the full logic;
although I've sinked a lot of time into fixing all kind of flaky tests I would would like to disable around ~25 tests.

I also would like to propose a method to verify the stability of a single test: run it a 100 times in series at the same place where the precommit tests are running.
This will put the bar high enough that only totally stable tests could satisfy it (a 99% stable test has 36% chance to pass this without being caught :D)
After this will be in service it could be used to: validate that an existing test is unstable (before disabling it) - and then used again to prove that it got fixed during 
re-enabling it.

Please let me know what you think!

cheers,
Zoltan

On 4/29/20 4:28 PM, Zoltan Haindrich wrote:
> Hey All!
> 
> I was planning to replace the ptest stuff with something less complex for a while now - I see that we struggle a lot because of ptest is more complicated than it should be...
> It would be much better if it would be constructed from well made existing CI piece. - because of that I've started working on [1] a few months ago.
> 
> It has it's pros and cons...but it's not the same as the existing ptest stuff.
> I've collected some infos about how it compares against the existing one - but it became too long so I've moved it into a google docs document at [3].
> 
> It's not yet ready... I still have some remaining problems/concerns/etc
> * what do you think about changing to a github PR based workflow?
> * it will not support at all things like "isolation" - so we will have to make our tests work with eachother without bending the rules...
> * I've tried to overcommit the cpu resources which creates a more noisy environment for the actual tests - this squeezes out some new problems which should be fixed before 
> this could be enabled.
> * for every PR the first run is somewhat sub-optimal...there are some reasons for this - the actually used resources are the same; but the overall execution time is not 
> optimal; I could accept this as a compromise because right now I wait >24 hours for a precommit run.
> 
> It's deployed at [2] and anyone can start a testrun on it:
> * merge my HIVE-22942-ptest-alt branch from [4] into your branch
> * open a PR against my hive repo on github [5]
> 
> cheers,
> Zoltan
> 
> 
> [1] https://issues.apache.org/jira/browse/HIVE-22942
> [2] http://34.66.156.144:8080/job/hive-precommit
> [3] https://docs.google.com/document/d/1dhL5B-eBvYNKEsNV3kE6RrkV5w-LtDgw5CtHV5pdoX4/edit?usp=sharing
> [4] https://github.com/kgyrtkirk/hive/tree/HIVE-22942-ptest-alt
> [5] https://github.com/kgyrtkirk/hive/

Re: [DISCUSS] Replace ptest with hive-test-kube

Posted by Zoltan Haindrich <ki...@rxd.hu>.


Peter: I think that's great practice - but if its not safeguarded by something at all the time; eventually it might be overlooked.
for example some time in the past we had javadoc fixed on some modules - but all that work is in vain because someone have added javadoc errors again...

David: using spotless would most probably mean to rewrite almost all source file in the repo - most probably breaking some intentional manual formatting along the way
and it would certainly make it harder to cherry-pick patches between branches..

I just described what I think is best in this area...right now I see the following things:
* on the master the current dist build - produces a package which is unusable - throws exceptions like java.lang.NoSuchFieldError: operands
* the metastore upgrade changes are not checked
   * but with hive-test-kube it could be possible to run those docker tests
* a bunch of flaky tests - this is still a pain for everyone; contributors, committers
   * TestJdbc* is extended a few times - all of them fails from time to time; as a result almost all of them is ignored
   * there is some metastore schema related crap which comes back regularily
   * there are of course more...which have been already disabled
   * we don't yet have the flaky checker job...which could be a big help in improving on this
* javadoc:javadoc doesn't succeed
* we don't have findbugs
* we don't have checkstyle

we can start with the ASF headers...but I think there are other things here which are also important - and have there own benefits.

cheers,
Zoltan

On 6/5/20 1:12 PM, David Mollitor wrote:
> There are a couple Apache projects that use the "spotless" plugin to format
> the code so that it all adheres to the same style.  I'd be in favor of
> adding that to make that code automatically adhere to the check style
> requirements.
> 
> On Fri, Jun 5, 2020, 5:59 AM Peter Vary <pv...@cloudera.com.invalid> wrote:
> 
>> I personally checked the "soft" errors and fixed them if a second run was
>> needed in my patches (which was the case most often than not ,because of
>> the flaky tests anyway :()
>> Also often checked the output before my reviews as well, and asked
>> contributors to fix them. Running them manually for every patch would be an
>> extra chore I would rather avoid.
>>
>> So I agree that we should start enforcing the rules, but if there is an
>> easy way to run them, I would like to see yetus runs in the new system
>> until we get to the "promised land" :) and have enforced rules in place.
>>
>> That said, even if we move forward only with the enforced rules we
>> immediately can start with the following checks:
>> - Rat check - Apache licence header
>> - Whitespace check for all non-generated files
>>
>> Just my 2 cents
>>
>>> On Jun 5, 2020, at 11:18, Stamatis Zampetakis <za...@gmail.com> wrote:
>>>
>>> +1 for failing fast starting with findbugs and eventually covering the
>>> important points from checkstyle.
>>>
>>> Bes,
>>> Stamatis
>>>
>>> On Fri, Jun 5, 2020 at 9:35 AM Zoltan Haindrich <ki...@rxd.hu> wrote:
>>>
>>>>
>>>> Hey Mustafa!
>>>>
>>>> Those checks are not executed anymore in the new system. I always feeled
>>>> it a bit confusing to have a comment which reports about
>>>> checkstyle/finbugs/etc issues; while
>>>> getting a green test run was almost impossible due to the high number of
>>>> randomly falling tests.
>>>> So I don't think it's viable that someone will re-submit the patch with
>>>> style changes.
>>>>
>>>> I think the old approach is a "soft" way of enforcing code quality - in
>>>> which I personally don't believe: code quality should be enforced by
>>>> rules/quality gates/etc.
>>>>
>>>> So I would like to take a different approach...I think we should
>> definetly
>>>> re-introduce these checks - however without "tolerance" being built-in.
>>>> This will most likely
>>>> mean that we will have to soften the ruleset first; but then we may
>>>> gradually increase the bar to a higher level.
>>>>
>>>> The "without tolerance" will mean that this will be checked during (or
>>>> right after) the build phase - so if you make quality mistakes you will
>>>> just not get a test results
>>>> (it will also save resources).
>>>>
>>>> Yesterday Laszlo have opened a ticket about fixing findbugs issues - if
>> we
>>>> do fix those issues; but we never enforce to fail the build - someone
>> might
>>>> just add a few more.
>>>>
>>>> To increase code quality through out the project I think we could take a
>>>> bottom-up approach:
>>>> * first patch:
>>>>    * fix things in a low level module(like common or storage-api)
>>>>    * it should also add the neccessary maven&job changes to enforce
>> things
>>>> during precommit (up-to that module)
>>>> * followups:
>>>>    * raise the bar to higher level modules
>>>>
>>>> Obviously we can't do this for something like checkstyle which detects a
>>>> myriad of small issues:
>>>> * the ruleset should be shrinked to something which needs reasonable
>>>> amount of work to start enforcing
>>>> * later we can enable further rules/fix all of them in the project
>>>>
>>>> What do you think about this?
>>>>
>>>> cheers,
>>>> Zoltan
>>>>
>>>>
>>>>
>>>> On 6/5/20 2:47 AM, Mustafa IMAN wrote:
>>>>> Thank you Zoltan for all this work.
>>>>> I see many PRs are merged based on the new workflow already. The old
>>>>> workflow generates many reports like ASF license/findbugs/checkstyle
>>>> etc. I
>>>>> don't see these in the new Github PR workflow. I am concerned the
>>>> codebase
>>>>> is going to suffer from lack of these reports very quickly. Do these
>>>> checks
>>>>> still happen but are not visible?
>>>>>
>>>>> On Tue, Jun 2, 2020 at 4:41 AM Zoltan Haindrich <ki...@rxd.hu> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I would like to note that you may login to the jenkins instance - to
>>>>>> start/kill builds (or create new jobs).
>>>>>> I've configured github oauth - but since team membership can't be
>>>> queried
>>>>>> from the "apache organization" - it's harder to configure all "hive
>>>>>> committers".
>>>>>> However...I think I've made it available for most of us - if you can't
>>>>>> start builds/etc just let me know your github user and I'll add it.
>>>>>>
>>>>>> cheers,
>>>>>> Zoltan
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 5/29/20 2:32 PM, Zoltan Haindrich wrote:
>>>>>>> Hey all!
>>>>>>>
>>>>>>> The patch is now in master - so every new PR or a push on it will
>>>>>> trigger a new run.
>>>>>>>
>>>>>>> Please decide which one would you like to use - open a PR to see the
>>>> new
>>>>>> one work...or upload a patch file to the jira - but please don't do
>>>> both;
>>>>>> because in that case 2
>>>>>>> execution will happen.
>>>>>>>
>>>>>>> The job execution time(2-4 hours) of a single run is a bit higher
>> than
>>>>>> the usual on the ptest server - this is mostly to increase throughput.
>>>>>>>
>>>>>>> The patch also disabled a set of tests; I will send the full list of
>>>>>> skipped tests shortly.
>>>>>>>
>>>>>>> cheers,
>>>>>>> Zoltan
>>>>>>>
>>>>>>>
>>>>>>> On 5/27/20 1:50 PM, Zoltan Haindrich wrote:
>>>>>>>> Hello all!
>>>>>>>>
>>>>>>>> The new stuff is ready to be switched on-to. It needs to be merged
>>>> into
>>>>>> master - and after that anyone who opens a PR will get a run by the
>> new
>>>>>> HiveQA infra.
>>>>>>>> I propose to run the 2 systems side-by-side for some time - the
>>>> regular
>>>>>> master builds will start; and we will see how frequently that is
>>>> polluted
>>>>>> by flaky tests.
>>>>>>>>
>>>>>>>> Note that the current patch also disables around ~25 more tests to
>>>>>> increase stability - to get a better overview about the disabled
>> tests I
>>>>>> think the "direction of the
>>>>>>>> information flow" should be altered; what I mean by that is: instead
>>>> of
>>>>>> just throwing in a jira for "disable test x" and opening a new one
>> like
>>>>>> "fix test x"; only open
>>>>>>>> the latter and place the jira reference into the ignore message;
>>>>>> meanwhile also add a regular report about the actually disabled tests
>> -
>>>> so
>>>>>> people who do know about the
>>>>>>>> importance of a particular test can get involved.
>>>>>>>>
>>>>>>>> Note: the builds.apache.org instance will be shutdown somewhere in
>>>> the
>>>>>> future as well...but I think the new one is a good-enough alternative
>> to
>>>>>> not have to migrate the
>>>>>>>> Hive-precommit job over to https://ci-hadoop.apache.org/.
>>>>>>>>
>>>>>>>> http://34.66.156.144:8080/job/hive-precommit/job/PR-948/5/
>>>>>>>> https://issues.apache.org/jira/browse/HIVE-22942
>>>>>>>> https://github.com/apache/hive/pull/948/files
>>>>>>>>
>>>>>>>> cheers,
>>>>>>>> Zoltan
>>>>>>>>
>>>>>>>> On 5/18/20 1:42 PM, Zoltan Haindrich wrote:
>>>>>>>>> Hey!
>>>>>>>>>
>>>>>>>>> On 5/18/20 11:51 AM, Zoltan Chovan wrote:
>>>>>>>>>> Thank you for all of your efforts, this looks really promising.
>> With
>>>>>> moving
>>>>>>>>>> to github PRs, would that also mean that we move away from the
>>>>>> reviewboard
>>>>>>>>>> for code review?
>>>>>>>>> I didn't thinked about that. I think using github's review
>> interface
>>>>>> will remain optional, because both review systems has there own strong
>>>>>> points - I wouldn't force
>>>>>>>>> anyone to use one over the other. (For some patches reviewboard is
>>>>>> much better; because it's able to track content moves a bit better
>> than
>>>>>> github. - meanwhile github has
>>>>>>>>> a small feature that enables to mark files as reviewed)
>>>>>>>>> As a matter of fact we had sometimes patches on the jira's which
>>>> never
>>>>>> had neither an RB or a PR to review them - having a PR there at least
>>>> will
>>>>>> make it easier for
>>>>>>>>> reviewers to comment.
>>>>>>>>>
>>>>>>>>>> Also, what happens if a PR is updated? Will the tests run for both
>>>> or
>>>>>> just
>>>>>>>>>> for the latest version?
>>>>>>>>> It will trigger a new build - if there is already a build in
>> progress
>>>>>> that will prevent a new build from starting until it finishes...and
>>>> there
>>>>>> is also a 5 builds/day
>>>>>>>>> limit; which might induce some wait.
>>>>>>>>>
>>>>>>>>> cheers,
>>>>>>>>> Zoltan
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Zoltan
>>>>>>>>>>
>>>>>>>>>> On Sun, May 17, 2020 at 10:51 PM Zoltan Haindrich <ki...@rxd.hu>
>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello all!
>>>>>>>>>>>
>>>>>>>>>>> The proposed system have become more stable lately - and I think
>>>> I've
>>>>>>>>>>> solved a few sources of flakiness.
>>>>>>>>>>> To be really usable I also wanted to add a way to dynamically
>>>>>>>>>>> enable/disable a set of tests (for example the replication tests
>>>>>> take ~7
>>>>>>>>>>> hours to execute from the total of 24
>>>>>>>>>>> hours - and they are also a bit unstable, so not running them
>> when
>>>>>> not
>>>>>>>>>>> neccesary would be beneficial in multiple ways) - but to do this
>>>> the
>>>>>> best
>>>>>>>>>>> would be to throw in
>>>>>>>>>>> junit5; unfortunately the current ptest installation uses maven
>>>> 3.0.5
>>>>>>>>>>> which doesn't like these kind of things - so instead of hacking a
>>>>>> fix for
>>>>>>>>>>> that ....I've removed it
>>>>>>>>>>> from the dev branch for now.
>>>>>>>>>>>
>>>>>>>>>>> I would like to propose to start an evaluation phase of the new
>>>> test
>>>>>>>>>>> procedures(INFRA-20269)
>>>>>>>>>>> The process would look something like this:
>>>>>>>>>>> * someone opens a PR - the tests will be run on the changes
>>>>>>>>>>> * on every active branches the tests will run from time to time
>>>>>>>>>>>      * this will produce a bunch of test runs on the master branch
>>>> as
>>>>>> well ;
>>>>>>>>>>> which will show how well the tests behave on the master branch
>>>>>> without any
>>>>>>>>>>> patches
>>>>>>>>>>> * runs on branches (PRs or active development
>> branches(eg:master))
>>>>>> will be
>>>>>>>>>>> rate limited to 5 builds/day
>>>>>>>>>>> * at most ~4 builds at a time - to maximize resource usage
>>>>>>>>>>> * turnaround time for a build is right now 2 hours - which I feel
>>>>>> like a
>>>>>>>>>>> balanced choice between speed/response time
>>>>>>>>>>>
>>>>>>>>>>> Possible future benefits:
>>>>>>>>>>> * toggle features using github tags
>>>>>>>>>>> * optional testgroups (metastore/replication) tests
>>>>>>>>>>> * ability to run the metastore verification tests
>>>>>>>>>>> * possibility to add smoke tests
>>>>>>>>>>>
>>>>>>>>>>> To enable this I will have to finish the HIVE-22942 ticket -
>> beyond
>>>>>> the
>>>>>>>>>>> new Jenkinsfile which defines the full logic;
>>>>>>>>>>> although I've sinked a lot of time into fixing all kind of flaky
>>>>>> tests I
>>>>>>>>>>> would would like to disable around ~25 tests.
>>>>>>>>>>>
>>>>>>>>>>> I also would like to propose a method to verify the stability of
>> a
>>>>>> single
>>>>>>>>>>> test: run it a 100 times in series at the same place where the
>>>>>> precommit
>>>>>>>>>>> tests are running.
>>>>>>>>>>> This will put the bar high enough that only totally stable tests
>>>>>> could
>>>>>>>>>>> satisfy it (a 99% stable test has 36% chance to pass this without
>>>>>> being
>>>>>>>>>>> caught :D)
>>>>>>>>>>> After this will be in service it could be used to: validate that
>> an
>>>>>>>>>>> existing test is unstable (before disabling it) - and then used
>>>>>> again to
>>>>>>>>>>> prove that it got fixed during
>>>>>>>>>>> re-enabling it.
>>>>>>>>>>>
>>>>>>>>>>> Please let me know what you think!
>>>>>>>>>>>
>>>>>>>>>>> cheers,
>>>>>>>>>>> Zoltan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 4/29/20 4:28 PM, Zoltan Haindrich wrote:
>>>>>>>>>>>> Hey All!
>>>>>>>>>>>>
>>>>>>>>>>>> I was planning to replace the ptest stuff with something less
>>>>>> complex
>>>>>>>>>>> for a while now - I see that we struggle a lot because of ptest
>> is
>>>>>> more
>>>>>>>>>>> complicated than it should be...
>>>>>>>>>>>> It would be much better if it would be constructed from well
>> made
>>>>>>>>>>> existing CI piece. - because of that I've started working on [1]
>> a
>>>>>> few
>>>>>>>>>>> months ago.
>>>>>>>>>>>>
>>>>>>>>>>>> It has it's pros and cons...but it's not the same as the
>> existing
>>>>>> ptest
>>>>>>>>>>> stuff.
>>>>>>>>>>>> I've collected some infos about how it compares against the
>>>>>> existing one
>>>>>>>>>>> - but it became too long so I've moved it into a google docs
>>>>>> document at
>>>>>>>>>>> [3].
>>>>>>>>>>>>
>>>>>>>>>>>> It's not yet ready... I still have some remaining
>>>>>> problems/concerns/etc
>>>>>>>>>>>> * what do you think about changing to a github PR based
>> workflow?
>>>>>>>>>>>> * it will not support at all things like "isolation" - so we
>> will
>>>>>> have
>>>>>>>>>>> to make our tests work with eachother without bending the
>> rules...
>>>>>>>>>>>> * I've tried to overcommit the cpu resources which creates a
>> more
>>>>>> noisy
>>>>>>>>>>> environment for the actual tests - this squeezes out some new
>>>>>> problems
>>>>>>>>>>> which should be fixed before
>>>>>>>>>>>> this could be enabled.
>>>>>>>>>>>> * for every PR the first run is somewhat sub-optimal...there are
>>>>>> some
>>>>>>>>>>> reasons for this - the actually used resources are the same; but
>>>> the
>>>>>>>>>>> overall execution time is not
>>>>>>>>>>>> optimal; I could accept this as a compromise because right now I
>>>>>> wait
>>>>>>>>>>>> 24 hours for a precommit run.
>>>>>>>>>>>>
>>>>>>>>>>>> It's deployed at [2] and anyone can start a testrun on it:
>>>>>>>>>>>> * merge my HIVE-22942-ptest-alt branch from [4] into your branch
>>>>>>>>>>>> * open a PR against my hive repo on github [5]
>>>>>>>>>>>>
>>>>>>>>>>>> cheers,
>>>>>>>>>>>> Zoltan
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/HIVE-22942
>>>>>>>>>>>> [2] http://34.66.156.144:8080/job/hive-precommit
>>>>>>>>>>>> [3]
>>>>>>>>>>>
>>>>>>
>>>>
>> https://docs.google.com/document/d/1dhL5B-eBvYNKEsNV3kE6RrkV5w-LtDgw5CtHV5pdoX4/edit?usp=sharing
>>>>>>>>>>>> [4] https://github.com/kgyrtkirk/hive/tree/HIVE-22942-ptest-alt
>>>>>>>>>>>> [5] https://github.com/kgyrtkirk/hive/
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>
>>>>>
>>>>
>>
>>
>

Re: [DISCUSS] Replace ptest with hive-test-kube

Posted by David Mollitor <da...@gmail.com>.

There are a couple Apache projects that use the "spotless" plugin to format
the code so that it all adheres to the same style.  I'd be in favor of
adding that to make that code automatically adhere to the check style
requirements.

On Fri, Jun 5, 2020, 5:59 AM Peter Vary <pv...@cloudera.com.invalid> wrote:

> I personally checked the "soft" errors and fixed them if a second run was
> needed in my patches (which was the case most often than not ,because of
> the flaky tests anyway :()
> Also often checked the output before my reviews as well, and asked
> contributors to fix them. Running them manually for every patch would be an
> extra chore I would rather avoid.
>
> So I agree that we should start enforcing the rules, but if there is an
> easy way to run them, I would like to see yetus runs in the new system
> until we get to the "promised land" :) and have enforced rules in place.
>
> That said, even if we move forward only with the enforced rules we
> immediately can start with the following checks:
> - Rat check - Apache licence header
> - Whitespace check for all non-generated files
>
> Just my 2 cents
>
> > On Jun 5, 2020, at 11:18, Stamatis Zampetakis <za...@gmail.com> wrote:
> >
> > +1 for failing fast starting with findbugs and eventually covering the
> > important points from checkstyle.
> >
> > Bes,
> > Stamatis
> >
> > On Fri, Jun 5, 2020 at 9:35 AM Zoltan Haindrich <ki...@rxd.hu> wrote:
> >
> >>
> >> Hey Mustafa!
> >>
> >> Those checks are not executed anymore in the new system. I always feeled
> >> it a bit confusing to have a comment which reports about
> >> checkstyle/finbugs/etc issues; while
> >> getting a green test run was almost impossible due to the high number of
> >> randomly falling tests.
> >> So I don't think it's viable that someone will re-submit the patch with
> >> style changes.
> >>
> >> I think the old approach is a "soft" way of enforcing code quality - in
> >> which I personally don't believe: code quality should be enforced by
> >> rules/quality gates/etc.
> >>
> >> So I would like to take a different approach...I think we should
> definetly
> >> re-introduce these checks - however without "tolerance" being built-in.
> >> This will most likely
> >> mean that we will have to soften the ruleset first; but then we may
> >> gradually increase the bar to a higher level.
> >>
> >> The "without tolerance" will mean that this will be checked during (or
> >> right after) the build phase - so if you make quality mistakes you will
> >> just not get a test results
> >> (it will also save resources).
> >>
> >> Yesterday Laszlo have opened a ticket about fixing findbugs issues - if
> we
> >> do fix those issues; but we never enforce to fail the build - someone
> might
> >> just add a few more.
> >>
> >> To increase code quality through out the project I think we could take a
> >> bottom-up approach:
> >> * first patch:
> >>   * fix things in a low level module(like common or storage-api)
> >>   * it should also add the neccessary maven&job changes to enforce
> things
> >> during precommit (up-to that module)
> >> * followups:
> >>   * raise the bar to higher level modules
> >>
> >> Obviously we can't do this for something like checkstyle which detects a
> >> myriad of small issues:
> >> * the ruleset should be shrinked to something which needs reasonable
> >> amount of work to start enforcing
> >> * later we can enable further rules/fix all of them in the project
> >>
> >> What do you think about this?
> >>
> >> cheers,
> >> Zoltan
> >>
> >>
> >>
> >> On 6/5/20 2:47 AM, Mustafa IMAN wrote:
> >>> Thank you Zoltan for all this work.
> >>> I see many PRs are merged based on the new workflow already. The old
> >>> workflow generates many reports like ASF license/findbugs/checkstyle
> >> etc. I
> >>> don't see these in the new Github PR workflow. I am concerned the
> >> codebase
> >>> is going to suffer from lack of these reports very quickly. Do these
> >> checks
> >>> still happen but are not visible?
> >>>
> >>> On Tue, Jun 2, 2020 at 4:41 AM Zoltan Haindrich <ki...@rxd.hu> wrote:
> >>>
> >>>> Hello,
> >>>>
> >>>> I would like to note that you may login to the jenkins instance - to
> >>>> start/kill builds (or create new jobs).
> >>>> I've configured github oauth - but since team membership can't be
> >> queried
> >>>> from the "apache organization" - it's harder to configure all "hive
> >>>> committers".
> >>>> However...I think I've made it available for most of us - if you can't
> >>>> start builds/etc just let me know your github user and I'll add it.
> >>>>
> >>>> cheers,
> >>>> Zoltan
> >>>>
> >>>>
> >>>>
> >>>> On 5/29/20 2:32 PM, Zoltan Haindrich wrote:
> >>>>> Hey all!
> >>>>>
> >>>>> The patch is now in master - so every new PR or a push on it will
> >>>> trigger a new run.
> >>>>>
> >>>>> Please decide which one would you like to use - open a PR to see the
> >> new
> >>>> one work...or upload a patch file to the jira - but please don't do
> >> both;
> >>>> because in that case 2
> >>>>> execution will happen.
> >>>>>
> >>>>> The job execution time(2-4 hours) of a single run is a bit higher
> than
> >>>> the usual on the ptest server - this is mostly to increase throughput.
> >>>>>
> >>>>> The patch also disabled a set of tests; I will send the full list of
> >>>> skipped tests shortly.
> >>>>>
> >>>>> cheers,
> >>>>> Zoltan
> >>>>>
> >>>>>
> >>>>> On 5/27/20 1:50 PM, Zoltan Haindrich wrote:
> >>>>>> Hello all!
> >>>>>>
> >>>>>> The new stuff is ready to be switched on-to. It needs to be merged
> >> into
> >>>> master - and after that anyone who opens a PR will get a run by the
> new
> >>>> HiveQA infra.
> >>>>>> I propose to run the 2 systems side-by-side for some time - the
> >> regular
> >>>> master builds will start; and we will see how frequently that is
> >> polluted
> >>>> by flaky tests.
> >>>>>>
> >>>>>> Note that the current patch also disables around ~25 more tests to
> >>>> increase stability - to get a better overview about the disabled
> tests I
> >>>> think the "direction of the
> >>>>>> information flow" should be altered; what I mean by that is: instead
> >> of
> >>>> just throwing in a jira for "disable test x" and opening a new one
> like
> >>>> "fix test x"; only open
> >>>>>> the latter and place the jira reference into the ignore message;
> >>>> meanwhile also add a regular report about the actually disabled tests
> -
> >> so
> >>>> people who do know about the
> >>>>>> importance of a particular test can get involved.
> >>>>>>
> >>>>>> Note: the builds.apache.org instance will be shutdown somewhere in
> >> the
> >>>> future as well...but I think the new one is a good-enough alternative
> to
> >>>> not have to migrate the
> >>>>>> Hive-precommit job over to https://ci-hadoop.apache.org/.
> >>>>>>
> >>>>>> http://34.66.156.144:8080/job/hive-precommit/job/PR-948/5/
> >>>>>> https://issues.apache.org/jira/browse/HIVE-22942
> >>>>>> https://github.com/apache/hive/pull/948/files
> >>>>>>
> >>>>>> cheers,
> >>>>>> Zoltan
> >>>>>>
> >>>>>> On 5/18/20 1:42 PM, Zoltan Haindrich wrote:
> >>>>>>> Hey!
> >>>>>>>
> >>>>>>> On 5/18/20 11:51 AM, Zoltan Chovan wrote:
> >>>>>>>> Thank you for all of your efforts, this looks really promising.
> With
> >>>> moving
> >>>>>>>> to github PRs, would that also mean that we move away from the
> >>>> reviewboard
> >>>>>>>> for code review?
> >>>>>>> I didn't thinked about that. I think using github's review
> interface
> >>>> will remain optional, because both review systems has there own strong
> >>>> points - I wouldn't force
> >>>>>>> anyone to use one over the other. (For some patches reviewboard is
> >>>> much better; because it's able to track content moves a bit better
> than
> >>>> github. - meanwhile github has
> >>>>>>> a small feature that enables to mark files as reviewed)
> >>>>>>> As a matter of fact we had sometimes patches on the jira's which
> >> never
> >>>> had neither an RB or a PR to review them - having a PR there at least
> >> will
> >>>> make it easier for
> >>>>>>> reviewers to comment.
> >>>>>>>
> >>>>>>>> Also, what happens if a PR is updated? Will the tests run for both
> >> or
> >>>> just
> >>>>>>>> for the latest version?
> >>>>>>> It will trigger a new build - if there is already a build in
> progress
> >>>> that will prevent a new build from starting until it finishes...and
> >> there
> >>>> is also a 5 builds/day
> >>>>>>> limit; which might induce some wait.
> >>>>>>>
> >>>>>>> cheers,
> >>>>>>> Zoltan
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Zoltan
> >>>>>>>>
> >>>>>>>> On Sun, May 17, 2020 at 10:51 PM Zoltan Haindrich <ki...@rxd.hu>
> >>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hello all!
> >>>>>>>>>
> >>>>>>>>> The proposed system have become more stable lately - and I think
> >> I've
> >>>>>>>>> solved a few sources of flakiness.
> >>>>>>>>> To be really usable I also wanted to add a way to dynamically
> >>>>>>>>> enable/disable a set of tests (for example the replication tests
> >>>> take ~7
> >>>>>>>>> hours to execute from the total of 24
> >>>>>>>>> hours - and they are also a bit unstable, so not running them
> when
> >>>> not
> >>>>>>>>> neccesary would be beneficial in multiple ways) - but to do this
> >> the
> >>>> best
> >>>>>>>>> would be to throw in
> >>>>>>>>> junit5; unfortunately the current ptest installation uses maven
> >> 3.0.5
> >>>>>>>>> which doesn't like these kind of things - so instead of hacking a
> >>>> fix for
> >>>>>>>>> that ....I've removed it
> >>>>>>>>> from the dev branch for now.
> >>>>>>>>>
> >>>>>>>>> I would like to propose to start an evaluation phase of the new
> >> test
> >>>>>>>>> procedures(INFRA-20269)
> >>>>>>>>> The process would look something like this:
> >>>>>>>>> * someone opens a PR - the tests will be run on the changes
> >>>>>>>>> * on every active branches the tests will run from time to time
> >>>>>>>>>     * this will produce a bunch of test runs on the master branch
> >> as
> >>>> well ;
> >>>>>>>>> which will show how well the tests behave on the master branch
> >>>> without any
> >>>>>>>>> patches
> >>>>>>>>> * runs on branches (PRs or active development
> branches(eg:master))
> >>>> will be
> >>>>>>>>> rate limited to 5 builds/day
> >>>>>>>>> * at most ~4 builds at a time - to maximize resource usage
> >>>>>>>>> * turnaround time for a build is right now 2 hours - which I feel
> >>>> like a
> >>>>>>>>> balanced choice between speed/response time
> >>>>>>>>>
> >>>>>>>>> Possible future benefits:
> >>>>>>>>> * toggle features using github tags
> >>>>>>>>> * optional testgroups (metastore/replication) tests
> >>>>>>>>> * ability to run the metastore verification tests
> >>>>>>>>> * possibility to add smoke tests
> >>>>>>>>>
> >>>>>>>>> To enable this I will have to finish the HIVE-22942 ticket -
> beyond
> >>>> the
> >>>>>>>>> new Jenkinsfile which defines the full logic;
> >>>>>>>>> although I've sinked a lot of time into fixing all kind of flaky
> >>>> tests I
> >>>>>>>>> would would like to disable around ~25 tests.
> >>>>>>>>>
> >>>>>>>>> I also would like to propose a method to verify the stability of
> a
> >>>> single
> >>>>>>>>> test: run it a 100 times in series at the same place where the
> >>>> precommit
> >>>>>>>>> tests are running.
> >>>>>>>>> This will put the bar high enough that only totally stable tests
> >>>> could
> >>>>>>>>> satisfy it (a 99% stable test has 36% chance to pass this without
> >>>> being
> >>>>>>>>> caught :D)
> >>>>>>>>> After this will be in service it could be used to: validate that
> an
> >>>>>>>>> existing test is unstable (before disabling it) - and then used
> >>>> again to
> >>>>>>>>> prove that it got fixed during
> >>>>>>>>> re-enabling it.
> >>>>>>>>>
> >>>>>>>>> Please let me know what you think!
> >>>>>>>>>
> >>>>>>>>> cheers,
> >>>>>>>>> Zoltan
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 4/29/20 4:28 PM, Zoltan Haindrich wrote:
> >>>>>>>>>> Hey All!
> >>>>>>>>>>
> >>>>>>>>>> I was planning to replace the ptest stuff with something less
> >>>> complex
> >>>>>>>>> for a while now - I see that we struggle a lot because of ptest
> is
> >>>> more
> >>>>>>>>> complicated than it should be...
> >>>>>>>>>> It would be much better if it would be constructed from well
> made
> >>>>>>>>> existing CI piece. - because of that I've started working on [1]
> a
> >>>> few
> >>>>>>>>> months ago.
> >>>>>>>>>>
> >>>>>>>>>> It has it's pros and cons...but it's not the same as the
> existing
> >>>> ptest
> >>>>>>>>> stuff.
> >>>>>>>>>> I've collected some infos about how it compares against the
> >>>> existing one
> >>>>>>>>> - but it became too long so I've moved it into a google docs
> >>>> document at
> >>>>>>>>> [3].
> >>>>>>>>>>
> >>>>>>>>>> It's not yet ready... I still have some remaining
> >>>> problems/concerns/etc
> >>>>>>>>>> * what do you think about changing to a github PR based
> workflow?
> >>>>>>>>>> * it will not support at all things like "isolation" - so we
> will
> >>>> have
> >>>>>>>>> to make our tests work with eachother without bending the
> rules...
> >>>>>>>>>> * I've tried to overcommit the cpu resources which creates a
> more
> >>>> noisy
> >>>>>>>>> environment for the actual tests - this squeezes out some new
> >>>> problems
> >>>>>>>>> which should be fixed before
> >>>>>>>>>> this could be enabled.
> >>>>>>>>>> * for every PR the first run is somewhat sub-optimal...there are
> >>>> some
> >>>>>>>>> reasons for this - the actually used resources are the same; but
> >> the
> >>>>>>>>> overall execution time is not
> >>>>>>>>>> optimal; I could accept this as a compromise because right now I
> >>>> wait
> >>>>>>>>>> 24 hours for a precommit run.
> >>>>>>>>>>
> >>>>>>>>>> It's deployed at [2] and anyone can start a testrun on it:
> >>>>>>>>>> * merge my HIVE-22942-ptest-alt branch from [4] into your branch
> >>>>>>>>>> * open a PR against my hive repo on github [5]
> >>>>>>>>>>
> >>>>>>>>>> cheers,
> >>>>>>>>>> Zoltan
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> [1] https://issues.apache.org/jira/browse/HIVE-22942
> >>>>>>>>>> [2] http://34.66.156.144:8080/job/hive-precommit
> >>>>>>>>>> [3]
> >>>>>>>>>
> >>>>
> >>
> https://docs.google.com/document/d/1dhL5B-eBvYNKEsNV3kE6RrkV5w-LtDgw5CtHV5pdoX4/edit?usp=sharing
> >>>>>>>>>> [4] https://github.com/kgyrtkirk/hive/tree/HIVE-22942-ptest-alt
> >>>>>>>>>> [5] https://github.com/kgyrtkirk/hive/
> >>>>>>>>>
> >>>>>>>>
> >>>>
> >>>
> >>
>
>

Re: [DISCUSS] Replace ptest with hive-test-kube

Posted by Peter Vary <pv...@cloudera.com.INVALID>.

I personally checked the "soft" errors and fixed them if a second run was needed in my patches (which was the case most often than not ,because of the flaky tests anyway :()
Also often checked the output before my reviews as well, and asked contributors to fix them. Running them manually for every patch would be an extra chore I would rather avoid.

So I agree that we should start enforcing the rules, but if there is an easy way to run them, I would like to see yetus runs in the new system until we get to the "promised land" :) and have enforced rules in place.

That said, even if we move forward only with the enforced rules we immediately can start with the following checks:
- Rat check - Apache licence header
- Whitespace check for all non-generated files

Just my 2 cents

> On Jun 5, 2020, at 11:18, Stamatis Zampetakis <za...@gmail.com> wrote:
> 
> +1 for failing fast starting with findbugs and eventually covering the
> important points from checkstyle.
> 
> Bes,
> Stamatis
> 
> On Fri, Jun 5, 2020 at 9:35 AM Zoltan Haindrich <ki...@rxd.hu> wrote:
> 
>> 
>> Hey Mustafa!
>> 
>> Those checks are not executed anymore in the new system. I always feeled
>> it a bit confusing to have a comment which reports about
>> checkstyle/finbugs/etc issues; while
>> getting a green test run was almost impossible due to the high number of
>> randomly falling tests.
>> So I don't think it's viable that someone will re-submit the patch with
>> style changes.
>> 
>> I think the old approach is a "soft" way of enforcing code quality - in
>> which I personally don't believe: code quality should be enforced by
>> rules/quality gates/etc.
>> 
>> So I would like to take a different approach...I think we should definetly
>> re-introduce these checks - however without "tolerance" being built-in.
>> This will most likely
>> mean that we will have to soften the ruleset first; but then we may
>> gradually increase the bar to a higher level.
>> 
>> The "without tolerance" will mean that this will be checked during (or
>> right after) the build phase - so if you make quality mistakes you will
>> just not get a test results
>> (it will also save resources).
>> 
>> Yesterday Laszlo have opened a ticket about fixing findbugs issues - if we
>> do fix those issues; but we never enforce to fail the build - someone might
>> just add a few more.
>> 
>> To increase code quality through out the project I think we could take a
>> bottom-up approach:
>> * first patch:
>>   * fix things in a low level module(like common or storage-api)
>>   * it should also add the neccessary maven&job changes to enforce things
>> during precommit (up-to that module)
>> * followups:
>>   * raise the bar to higher level modules
>> 
>> Obviously we can't do this for something like checkstyle which detects a
>> myriad of small issues:
>> * the ruleset should be shrinked to something which needs reasonable
>> amount of work to start enforcing
>> * later we can enable further rules/fix all of them in the project
>> 
>> What do you think about this?
>> 
>> cheers,
>> Zoltan
>> 
>> 
>> 
>> On 6/5/20 2:47 AM, Mustafa IMAN wrote:
>>> Thank you Zoltan for all this work.
>>> I see many PRs are merged based on the new workflow already. The old
>>> workflow generates many reports like ASF license/findbugs/checkstyle
>> etc. I
>>> don't see these in the new Github PR workflow. I am concerned the
>> codebase
>>> is going to suffer from lack of these reports very quickly. Do these
>> checks
>>> still happen but are not visible?
>>> 
>>> On Tue, Jun 2, 2020 at 4:41 AM Zoltan Haindrich <ki...@rxd.hu> wrote:
>>> 
>>>> Hello,
>>>> 
>>>> I would like to note that you may login to the jenkins instance - to
>>>> start/kill builds (or create new jobs).
>>>> I've configured github oauth - but since team membership can't be
>> queried
>>>> from the "apache organization" - it's harder to configure all "hive
>>>> committers".
>>>> However...I think I've made it available for most of us - if you can't
>>>> start builds/etc just let me know your github user and I'll add it.
>>>> 
>>>> cheers,
>>>> Zoltan
>>>> 
>>>> 
>>>> 
>>>> On 5/29/20 2:32 PM, Zoltan Haindrich wrote:
>>>>> Hey all!
>>>>> 
>>>>> The patch is now in master - so every new PR or a push on it will
>>>> trigger a new run.
>>>>> 
>>>>> Please decide which one would you like to use - open a PR to see the
>> new
>>>> one work...or upload a patch file to the jira - but please don't do
>> both;
>>>> because in that case 2
>>>>> execution will happen.
>>>>> 
>>>>> The job execution time(2-4 hours) of a single run is a bit higher than
>>>> the usual on the ptest server - this is mostly to increase throughput.
>>>>> 
>>>>> The patch also disabled a set of tests; I will send the full list of
>>>> skipped tests shortly.
>>>>> 
>>>>> cheers,
>>>>> Zoltan
>>>>> 
>>>>> 
>>>>> On 5/27/20 1:50 PM, Zoltan Haindrich wrote:
>>>>>> Hello all!
>>>>>> 
>>>>>> The new stuff is ready to be switched on-to. It needs to be merged
>> into
>>>> master - and after that anyone who opens a PR will get a run by the new
>>>> HiveQA infra.
>>>>>> I propose to run the 2 systems side-by-side for some time - the
>> regular
>>>> master builds will start; and we will see how frequently that is
>> polluted
>>>> by flaky tests.
>>>>>> 
>>>>>> Note that the current patch also disables around ~25 more tests to
>>>> increase stability - to get a better overview about the disabled tests I
>>>> think the "direction of the
>>>>>> information flow" should be altered; what I mean by that is: instead
>> of
>>>> just throwing in a jira for "disable test x" and opening a new one like
>>>> "fix test x"; only open
>>>>>> the latter and place the jira reference into the ignore message;
>>>> meanwhile also add a regular report about the actually disabled tests -
>> so
>>>> people who do know about the
>>>>>> importance of a particular test can get involved.
>>>>>> 
>>>>>> Note: the builds.apache.org instance will be shutdown somewhere in
>> the
>>>> future as well...but I think the new one is a good-enough alternative to
>>>> not have to migrate the
>>>>>> Hive-precommit job over to https://ci-hadoop.apache.org/.
>>>>>> 
>>>>>> http://34.66.156.144:8080/job/hive-precommit/job/PR-948/5/
>>>>>> https://issues.apache.org/jira/browse/HIVE-22942
>>>>>> https://github.com/apache/hive/pull/948/files
>>>>>> 
>>>>>> cheers,
>>>>>> Zoltan
>>>>>> 
>>>>>> On 5/18/20 1:42 PM, Zoltan Haindrich wrote:
>>>>>>> Hey!
>>>>>>> 
>>>>>>> On 5/18/20 11:51 AM, Zoltan Chovan wrote:
>>>>>>>> Thank you for all of your efforts, this looks really promising. With
>>>> moving
>>>>>>>> to github PRs, would that also mean that we move away from the
>>>> reviewboard
>>>>>>>> for code review?
>>>>>>> I didn't thinked about that. I think using github's review interface
>>>> will remain optional, because both review systems has there own strong
>>>> points - I wouldn't force
>>>>>>> anyone to use one over the other. (For some patches reviewboard is
>>>> much better; because it's able to track content moves a bit better than
>>>> github. - meanwhile github has
>>>>>>> a small feature that enables to mark files as reviewed)
>>>>>>> As a matter of fact we had sometimes patches on the jira's which
>> never
>>>> had neither an RB or a PR to review them - having a PR there at least
>> will
>>>> make it easier for
>>>>>>> reviewers to comment.
>>>>>>> 
>>>>>>>> Also, what happens if a PR is updated? Will the tests run for both
>> or
>>>> just
>>>>>>>> for the latest version?
>>>>>>> It will trigger a new build - if there is already a build in progress
>>>> that will prevent a new build from starting until it finishes...and
>> there
>>>> is also a 5 builds/day
>>>>>>> limit; which might induce some wait.
>>>>>>> 
>>>>>>> cheers,
>>>>>>> Zoltan
>>>>>>> 
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> Zoltan
>>>>>>>> 
>>>>>>>> On Sun, May 17, 2020 at 10:51 PM Zoltan Haindrich <ki...@rxd.hu>
>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hello all!
>>>>>>>>> 
>>>>>>>>> The proposed system have become more stable lately - and I think
>> I've
>>>>>>>>> solved a few sources of flakiness.
>>>>>>>>> To be really usable I also wanted to add a way to dynamically
>>>>>>>>> enable/disable a set of tests (for example the replication tests
>>>> take ~7
>>>>>>>>> hours to execute from the total of 24
>>>>>>>>> hours - and they are also a bit unstable, so not running them when
>>>> not
>>>>>>>>> neccesary would be beneficial in multiple ways) - but to do this
>> the
>>>> best
>>>>>>>>> would be to throw in
>>>>>>>>> junit5; unfortunately the current ptest installation uses maven
>> 3.0.5
>>>>>>>>> which doesn't like these kind of things - so instead of hacking a
>>>> fix for
>>>>>>>>> that ....I've removed it
>>>>>>>>> from the dev branch for now.
>>>>>>>>> 
>>>>>>>>> I would like to propose to start an evaluation phase of the new
>> test
>>>>>>>>> procedures(INFRA-20269)
>>>>>>>>> The process would look something like this:
>>>>>>>>> * someone opens a PR - the tests will be run on the changes
>>>>>>>>> * on every active branches the tests will run from time to time
>>>>>>>>>     * this will produce a bunch of test runs on the master branch
>> as
>>>> well ;
>>>>>>>>> which will show how well the tests behave on the master branch
>>>> without any
>>>>>>>>> patches
>>>>>>>>> * runs on branches (PRs or active development branches(eg:master))
>>>> will be
>>>>>>>>> rate limited to 5 builds/day
>>>>>>>>> * at most ~4 builds at a time - to maximize resource usage
>>>>>>>>> * turnaround time for a build is right now 2 hours - which I feel
>>>> like a
>>>>>>>>> balanced choice between speed/response time
>>>>>>>>> 
>>>>>>>>> Possible future benefits:
>>>>>>>>> * toggle features using github tags
>>>>>>>>> * optional testgroups (metastore/replication) tests
>>>>>>>>> * ability to run the metastore verification tests
>>>>>>>>> * possibility to add smoke tests
>>>>>>>>> 
>>>>>>>>> To enable this I will have to finish the HIVE-22942 ticket - beyond
>>>> the
>>>>>>>>> new Jenkinsfile which defines the full logic;
>>>>>>>>> although I've sinked a lot of time into fixing all kind of flaky
>>>> tests I
>>>>>>>>> would would like to disable around ~25 tests.
>>>>>>>>> 
>>>>>>>>> I also would like to propose a method to verify the stability of a
>>>> single
>>>>>>>>> test: run it a 100 times in series at the same place where the
>>>> precommit
>>>>>>>>> tests are running.
>>>>>>>>> This will put the bar high enough that only totally stable tests
>>>> could
>>>>>>>>> satisfy it (a 99% stable test has 36% chance to pass this without
>>>> being
>>>>>>>>> caught :D)
>>>>>>>>> After this will be in service it could be used to: validate that an
>>>>>>>>> existing test is unstable (before disabling it) - and then used
>>>> again to
>>>>>>>>> prove that it got fixed during
>>>>>>>>> re-enabling it.
>>>>>>>>> 
>>>>>>>>> Please let me know what you think!
>>>>>>>>> 
>>>>>>>>> cheers,
>>>>>>>>> Zoltan
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 4/29/20 4:28 PM, Zoltan Haindrich wrote:
>>>>>>>>>> Hey All!
>>>>>>>>>> 
>>>>>>>>>> I was planning to replace the ptest stuff with something less
>>>> complex
>>>>>>>>> for a while now - I see that we struggle a lot because of ptest is
>>>> more
>>>>>>>>> complicated than it should be...
>>>>>>>>>> It would be much better if it would be constructed from well made
>>>>>>>>> existing CI piece. - because of that I've started working on [1] a
>>>> few
>>>>>>>>> months ago.
>>>>>>>>>> 
>>>>>>>>>> It has it's pros and cons...but it's not the same as the existing
>>>> ptest
>>>>>>>>> stuff.
>>>>>>>>>> I've collected some infos about how it compares against the
>>>> existing one
>>>>>>>>> - but it became too long so I've moved it into a google docs
>>>> document at
>>>>>>>>> [3].
>>>>>>>>>> 
>>>>>>>>>> It's not yet ready... I still have some remaining
>>>> problems/concerns/etc
>>>>>>>>>> * what do you think about changing to a github PR based workflow?
>>>>>>>>>> * it will not support at all things like "isolation" - so we will
>>>> have
>>>>>>>>> to make our tests work with eachother without bending the rules...
>>>>>>>>>> * I've tried to overcommit the cpu resources which creates a more
>>>> noisy
>>>>>>>>> environment for the actual tests - this squeezes out some new
>>>> problems
>>>>>>>>> which should be fixed before
>>>>>>>>>> this could be enabled.
>>>>>>>>>> * for every PR the first run is somewhat sub-optimal...there are
>>>> some
>>>>>>>>> reasons for this - the actually used resources are the same; but
>> the
>>>>>>>>> overall execution time is not
>>>>>>>>>> optimal; I could accept this as a compromise because right now I
>>>> wait
>>>>>>>>>> 24 hours for a precommit run.
>>>>>>>>>> 
>>>>>>>>>> It's deployed at [2] and anyone can start a testrun on it:
>>>>>>>>>> * merge my HIVE-22942-ptest-alt branch from [4] into your branch
>>>>>>>>>> * open a PR against my hive repo on github [5]
>>>>>>>>>> 
>>>>>>>>>> cheers,
>>>>>>>>>> Zoltan
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> [1] https://issues.apache.org/jira/browse/HIVE-22942
>>>>>>>>>> [2] http://34.66.156.144:8080/job/hive-precommit
>>>>>>>>>> [3]
>>>>>>>>> 
>>>> 
>> https://docs.google.com/document/d/1dhL5B-eBvYNKEsNV3kE6RrkV5w-LtDgw5CtHV5pdoX4/edit?usp=sharing
>>>>>>>>>> [4] https://github.com/kgyrtkirk/hive/tree/HIVE-22942-ptest-alt
>>>>>>>>>> [5] https://github.com/kgyrtkirk/hive/
>>>>>>>>> 
>>>>>>>> 
>>>> 
>>> 
>>

Re: [DISCUSS] Replace ptest with hive-test-kube

Posted by Stamatis Zampetakis <za...@gmail.com>.

+1 for failing fast starting with findbugs and eventually covering the
important points from checkstyle.

Bes,
Stamatis

On Fri, Jun 5, 2020 at 9:35 AM Zoltan Haindrich <ki...@rxd.hu> wrote:

>
> Hey Mustafa!
>
> Those checks are not executed anymore in the new system. I always feeled
> it a bit confusing to have a comment which reports about
> checkstyle/finbugs/etc issues; while
> getting a green test run was almost impossible due to the high number of
> randomly falling tests.
> So I don't think it's viable that someone will re-submit the patch with
> style changes.
>
> I think the old approach is a "soft" way of enforcing code quality - in
> which I personally don't believe: code quality should be enforced by
> rules/quality gates/etc.
>
> So I would like to take a different approach...I think we should definetly
> re-introduce these checks - however without "tolerance" being built-in.
> This will most likely
> mean that we will have to soften the ruleset first; but then we may
> gradually increase the bar to a higher level.
>
> The "without tolerance" will mean that this will be checked during (or
> right after) the build phase - so if you make quality mistakes you will
> just not get a test results
> (it will also save resources).
>
> Yesterday Laszlo have opened a ticket about fixing findbugs issues - if we
> do fix those issues; but we never enforce to fail the build - someone might
> just add a few more.
>
> To increase code quality through out the project I think we could take a
> bottom-up approach:
> * first patch:
>    * fix things in a low level module(like common or storage-api)
>    * it should also add the neccessary maven&job changes to enforce things
> during precommit (up-to that module)
> * followups:
>    * raise the bar to higher level modules
>
> Obviously we can't do this for something like checkstyle which detects a
> myriad of small issues:
> * the ruleset should be shrinked to something which needs reasonable
> amount of work to start enforcing
> * later we can enable further rules/fix all of them in the project
>
> What do you think about this?
>
> cheers,
> Zoltan
>
>
>
> On 6/5/20 2:47 AM, Mustafa IMAN wrote:
> > Thank you Zoltan for all this work.
> > I see many PRs are merged based on the new workflow already. The old
> > workflow generates many reports like ASF license/findbugs/checkstyle
> etc. I
> > don't see these in the new Github PR workflow. I am concerned the
> codebase
> > is going to suffer from lack of these reports very quickly. Do these
> checks
> > still happen but are not visible?
> >
> > On Tue, Jun 2, 2020 at 4:41 AM Zoltan Haindrich <ki...@rxd.hu> wrote:
> >
> >> Hello,
> >>
> >> I would like to note that you may login to the jenkins instance - to
> >> start/kill builds (or create new jobs).
> >> I've configured github oauth - but since team membership can't be
> queried
> >> from the "apache organization" - it's harder to configure all "hive
> >> committers".
> >> However...I think I've made it available for most of us - if you can't
> >> start builds/etc just let me know your github user and I'll add it.
> >>
> >> cheers,
> >> Zoltan
> >>
> >>
> >>
> >> On 5/29/20 2:32 PM, Zoltan Haindrich wrote:
> >>> Hey all!
> >>>
> >>> The patch is now in master - so every new PR or a push on it will
> >> trigger a new run.
> >>>
> >>> Please decide which one would you like to use - open a PR to see the
> new
> >> one work...or upload a patch file to the jira - but please don't do
> both;
> >> because in that case 2
> >>> execution will happen.
> >>>
> >>> The job execution time(2-4 hours) of a single run is a bit higher than
> >> the usual on the ptest server - this is mostly to increase throughput.
> >>>
> >>> The patch also disabled a set of tests; I will send the full list of
> >> skipped tests shortly.
> >>>
> >>> cheers,
> >>> Zoltan
> >>>
> >>>
> >>> On 5/27/20 1:50 PM, Zoltan Haindrich wrote:
> >>>> Hello all!
> >>>>
> >>>> The new stuff is ready to be switched on-to. It needs to be merged
> into
> >> master - and after that anyone who opens a PR will get a run by the new
> >> HiveQA infra.
> >>>> I propose to run the 2 systems side-by-side for some time - the
> regular
> >> master builds will start; and we will see how frequently that is
> polluted
> >> by flaky tests.
> >>>>
> >>>> Note that the current patch also disables around ~25 more tests to
> >> increase stability - to get a better overview about the disabled tests I
> >> think the "direction of the
> >>>> information flow" should be altered; what I mean by that is: instead
> of
> >> just throwing in a jira for "disable test x" and opening a new one like
> >> "fix test x"; only open
> >>>> the latter and place the jira reference into the ignore message;
> >> meanwhile also add a regular report about the actually disabled tests -
> so
> >> people who do know about the
> >>>> importance of a particular test can get involved.
> >>>>
> >>>> Note: the builds.apache.org instance will be shutdown somewhere in
> the
> >> future as well...but I think the new one is a good-enough alternative to
> >> not have to migrate the
> >>>> Hive-precommit job over to https://ci-hadoop.apache.org/.
> >>>>
> >>>> http://34.66.156.144:8080/job/hive-precommit/job/PR-948/5/
> >>>> https://issues.apache.org/jira/browse/HIVE-22942
> >>>> https://github.com/apache/hive/pull/948/files
> >>>>
> >>>> cheers,
> >>>> Zoltan
> >>>>
> >>>> On 5/18/20 1:42 PM, Zoltan Haindrich wrote:
> >>>>> Hey!
> >>>>>
> >>>>> On 5/18/20 11:51 AM, Zoltan Chovan wrote:
> >>>>>> Thank you for all of your efforts, this looks really promising. With
> >> moving
> >>>>>> to github PRs, would that also mean that we move away from the
> >> reviewboard
> >>>>>> for code review?
> >>>>> I didn't thinked about that. I think using github's review interface
> >> will remain optional, because both review systems has there own strong
> >> points - I wouldn't force
> >>>>> anyone to use one over the other. (For some patches reviewboard is
> >> much better; because it's able to track content moves a bit better than
> >> github. - meanwhile github has
> >>>>> a small feature that enables to mark files as reviewed)
> >>>>> As a matter of fact we had sometimes patches on the jira's which
> never
> >> had neither an RB or a PR to review them - having a PR there at least
> will
> >> make it easier for
> >>>>> reviewers to comment.
> >>>>>
> >>>>>> Also, what happens if a PR is updated? Will the tests run for both
> or
> >> just
> >>>>>> for the latest version?
> >>>>> It will trigger a new build - if there is already a build in progress
> >> that will prevent a new build from starting until it finishes...and
> there
> >> is also a 5 builds/day
> >>>>> limit; which might induce some wait.
> >>>>>
> >>>>> cheers,
> >>>>> Zoltan
> >>>>>
> >>>>>>
> >>>>>> Regards,
> >>>>>> Zoltan
> >>>>>>
> >>>>>> On Sun, May 17, 2020 at 10:51 PM Zoltan Haindrich <ki...@rxd.hu>
> >> wrote:
> >>>>>>
> >>>>>>> Hello all!
> >>>>>>>
> >>>>>>> The proposed system have become more stable lately - and I think
> I've
> >>>>>>> solved a few sources of flakiness.
> >>>>>>> To be really usable I also wanted to add a way to dynamically
> >>>>>>> enable/disable a set of tests (for example the replication tests
> >> take ~7
> >>>>>>> hours to execute from the total of 24
> >>>>>>> hours - and they are also a bit unstable, so not running them when
> >> not
> >>>>>>> neccesary would be beneficial in multiple ways) - but to do this
> the
> >> best
> >>>>>>> would be to throw in
> >>>>>>> junit5; unfortunately the current ptest installation uses maven
> 3.0.5
> >>>>>>> which doesn't like these kind of things - so instead of hacking a
> >> fix for
> >>>>>>> that ....I've removed it
> >>>>>>> from the dev branch for now.
> >>>>>>>
> >>>>>>> I would like to propose to start an evaluation phase of the new
> test
> >>>>>>> procedures(INFRA-20269)
> >>>>>>> The process would look something like this:
> >>>>>>> * someone opens a PR - the tests will be run on the changes
> >>>>>>> * on every active branches the tests will run from time to time
> >>>>>>>      * this will produce a bunch of test runs on the master branch
> as
> >> well ;
> >>>>>>> which will show how well the tests behave on the master branch
> >> without any
> >>>>>>> patches
> >>>>>>> * runs on branches (PRs or active development branches(eg:master))
> >> will be
> >>>>>>> rate limited to 5 builds/day
> >>>>>>> * at most ~4 builds at a time - to maximize resource usage
> >>>>>>> * turnaround time for a build is right now 2 hours - which I feel
> >> like a
> >>>>>>> balanced choice between speed/response time
> >>>>>>>
> >>>>>>> Possible future benefits:
> >>>>>>> * toggle features using github tags
> >>>>>>> * optional testgroups (metastore/replication) tests
> >>>>>>> * ability to run the metastore verification tests
> >>>>>>> * possibility to add smoke tests
> >>>>>>>
> >>>>>>> To enable this I will have to finish the HIVE-22942 ticket - beyond
> >> the
> >>>>>>> new Jenkinsfile which defines the full logic;
> >>>>>>> although I've sinked a lot of time into fixing all kind of flaky
> >> tests I
> >>>>>>> would would like to disable around ~25 tests.
> >>>>>>>
> >>>>>>> I also would like to propose a method to verify the stability of a
> >> single
> >>>>>>> test: run it a 100 times in series at the same place where the
> >> precommit
> >>>>>>> tests are running.
> >>>>>>> This will put the bar high enough that only totally stable tests
> >> could
> >>>>>>> satisfy it (a 99% stable test has 36% chance to pass this without
> >> being
> >>>>>>> caught :D)
> >>>>>>> After this will be in service it could be used to: validate that an
> >>>>>>> existing test is unstable (before disabling it) - and then used
> >> again to
> >>>>>>> prove that it got fixed during
> >>>>>>> re-enabling it.
> >>>>>>>
> >>>>>>> Please let me know what you think!
> >>>>>>>
> >>>>>>> cheers,
> >>>>>>> Zoltan
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 4/29/20 4:28 PM, Zoltan Haindrich wrote:
> >>>>>>>> Hey All!
> >>>>>>>>
> >>>>>>>> I was planning to replace the ptest stuff with something less
> >> complex
> >>>>>>> for a while now - I see that we struggle a lot because of ptest is
> >> more
> >>>>>>> complicated than it should be...
> >>>>>>>> It would be much better if it would be constructed from well made
> >>>>>>> existing CI piece. - because of that I've started working on [1] a
> >> few
> >>>>>>> months ago.
> >>>>>>>>
> >>>>>>>> It has it's pros and cons...but it's not the same as the existing
> >> ptest
> >>>>>>> stuff.
> >>>>>>>> I've collected some infos about how it compares against the
> >> existing one
> >>>>>>> - but it became too long so I've moved it into a google docs
> >> document at
> >>>>>>> [3].
> >>>>>>>>
> >>>>>>>> It's not yet ready... I still have some remaining
> >> problems/concerns/etc
> >>>>>>>> * what do you think about changing to a github PR based workflow?
> >>>>>>>> * it will not support at all things like "isolation" - so we will
> >> have
> >>>>>>> to make our tests work with eachother without bending the rules...
> >>>>>>>> * I've tried to overcommit the cpu resources which creates a more
> >> noisy
> >>>>>>> environment for the actual tests - this squeezes out some new
> >> problems
> >>>>>>> which should be fixed before
> >>>>>>>> this could be enabled.
> >>>>>>>> * for every PR the first run is somewhat sub-optimal...there are
> >> some
> >>>>>>> reasons for this - the actually used resources are the same; but
> the
> >>>>>>> overall execution time is not
> >>>>>>>> optimal; I could accept this as a compromise because right now I
> >> wait
> >>>>>>>> 24 hours for a precommit run.
> >>>>>>>>
> >>>>>>>> It's deployed at [2] and anyone can start a testrun on it:
> >>>>>>>> * merge my HIVE-22942-ptest-alt branch from [4] into your branch
> >>>>>>>> * open a PR against my hive repo on github [5]
> >>>>>>>>
> >>>>>>>> cheers,
> >>>>>>>> Zoltan
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> [1] https://issues.apache.org/jira/browse/HIVE-22942
> >>>>>>>> [2] http://34.66.156.144:8080/job/hive-precommit
> >>>>>>>> [3]
> >>>>>>>
> >>
> https://docs.google.com/document/d/1dhL5B-eBvYNKEsNV3kE6RrkV5w-LtDgw5CtHV5pdoX4/edit?usp=sharing
> >>>>>>>> [4] https://github.com/kgyrtkirk/hive/tree/HIVE-22942-ptest-alt
> >>>>>>>> [5] https://github.com/kgyrtkirk/hive/
> >>>>>>>
> >>>>>>
> >>
> >
>

Re: [DISCUSS] Replace ptest with hive-test-kube

Posted by Zoltan Haindrich <ki...@rxd.hu>.

Hey Mustafa!

Those checks are not executed anymore in the new system. I always feeled it a bit confusing to have a comment which reports about checkstyle/finbugs/etc issues; while 
getting a green test run was almost impossible due to the high number of randomly falling tests.
So I don't think it's viable that someone will re-submit the patch with style changes.

I think the old approach is a "soft" way of enforcing code quality - in which I personally don't believe: code quality should be enforced by rules/quality gates/etc.

So I would like to take a different approach...I think we should definetly re-introduce these checks - however without "tolerance" being built-in. This will most likely 
mean that we will have to soften the ruleset first; but then we may gradually increase the bar to a higher level.

The "without tolerance" will mean that this will be checked during (or right after) the build phase - so if you make quality mistakes you will just not get a test results 
(it will also save resources).

Yesterday Laszlo have opened a ticket about fixing findbugs issues - if we do fix those issues; but we never enforce to fail the build - someone might just add a few more.

To increase code quality through out the project I think we could take a bottom-up approach:
* first patch:
   * fix things in a low level module(like common or storage-api)
   * it should also add the neccessary maven&job changes to enforce things during precommit (up-to that module)
* followups:
   * raise the bar to higher level modules

Obviously we can't do this for something like checkstyle which detects a myriad of small issues:
* the ruleset should be shrinked to something which needs reasonable amount of work to start enforcing
* later we can enable further rules/fix all of them in the project

What do you think about this?

cheers,
Zoltan



On 6/5/20 2:47 AM, Mustafa IMAN wrote:
> Thank you Zoltan for all this work.
> I see many PRs are merged based on the new workflow already. The old
> workflow generates many reports like ASF license/findbugs/checkstyle etc. I
> don't see these in the new Github PR workflow. I am concerned the codebase
> is going to suffer from lack of these reports very quickly. Do these checks
> still happen but are not visible?
> 
> On Tue, Jun 2, 2020 at 4:41 AM Zoltan Haindrich <ki...@rxd.hu> wrote:
> 
>> Hello,
>>
>> I would like to note that you may login to the jenkins instance - to
>> start/kill builds (or create new jobs).
>> I've configured github oauth - but since team membership can't be queried
>> from the "apache organization" - it's harder to configure all "hive
>> committers".
>> However...I think I've made it available for most of us - if you can't
>> start builds/etc just let me know your github user and I'll add it.
>>
>> cheers,
>> Zoltan
>>
>>
>>
>> On 5/29/20 2:32 PM, Zoltan Haindrich wrote:
>>> Hey all!
>>>
>>> The patch is now in master - so every new PR or a push on it will
>> trigger a new run.
>>>
>>> Please decide which one would you like to use - open a PR to see the new
>> one work...or upload a patch file to the jira - but please don't do both;
>> because in that case 2
>>> execution will happen.
>>>
>>> The job execution time(2-4 hours) of a single run is a bit higher than
>> the usual on the ptest server - this is mostly to increase throughput.
>>>
>>> The patch also disabled a set of tests; I will send the full list of
>> skipped tests shortly.
>>>
>>> cheers,
>>> Zoltan
>>>
>>>
>>> On 5/27/20 1:50 PM, Zoltan Haindrich wrote:
>>>> Hello all!
>>>>
>>>> The new stuff is ready to be switched on-to. It needs to be merged into
>> master - and after that anyone who opens a PR will get a run by the new
>> HiveQA infra.
>>>> I propose to run the 2 systems side-by-side for some time - the regular
>> master builds will start; and we will see how frequently that is polluted
>> by flaky tests.
>>>>
>>>> Note that the current patch also disables around ~25 more tests to
>> increase stability - to get a better overview about the disabled tests I
>> think the "direction of the
>>>> information flow" should be altered; what I mean by that is: instead of
>> just throwing in a jira for "disable test x" and opening a new one like
>> "fix test x"; only open
>>>> the latter and place the jira reference into the ignore message;
>> meanwhile also add a regular report about the actually disabled tests - so
>> people who do know about the
>>>> importance of a particular test can get involved.
>>>>
>>>> Note: the builds.apache.org instance will be shutdown somewhere in the
>> future as well...but I think the new one is a good-enough alternative to
>> not have to migrate the
>>>> Hive-precommit job over to https://ci-hadoop.apache.org/.
>>>>
>>>> http://34.66.156.144:8080/job/hive-precommit/job/PR-948/5/
>>>> https://issues.apache.org/jira/browse/HIVE-22942
>>>> https://github.com/apache/hive/pull/948/files
>>>>
>>>> cheers,
>>>> Zoltan
>>>>
>>>> On 5/18/20 1:42 PM, Zoltan Haindrich wrote:
>>>>> Hey!
>>>>>
>>>>> On 5/18/20 11:51 AM, Zoltan Chovan wrote:
>>>>>> Thank you for all of your efforts, this looks really promising. With
>> moving
>>>>>> to github PRs, would that also mean that we move away from the
>> reviewboard
>>>>>> for code review?
>>>>> I didn't thinked about that. I think using github's review interface
>> will remain optional, because both review systems has there own strong
>> points - I wouldn't force
>>>>> anyone to use one over the other. (For some patches reviewboard is
>> much better; because it's able to track content moves a bit better than
>> github. - meanwhile github has
>>>>> a small feature that enables to mark files as reviewed)
>>>>> As a matter of fact we had sometimes patches on the jira's which never
>> had neither an RB or a PR to review them - having a PR there at least will
>> make it easier for
>>>>> reviewers to comment.
>>>>>
>>>>>> Also, what happens if a PR is updated? Will the tests run for both or
>> just
>>>>>> for the latest version?
>>>>> It will trigger a new build - if there is already a build in progress
>> that will prevent a new build from starting until it finishes...and there
>> is also a 5 builds/day
>>>>> limit; which might induce some wait.
>>>>>
>>>>> cheers,
>>>>> Zoltan
>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Zoltan
>>>>>>
>>>>>> On Sun, May 17, 2020 at 10:51 PM Zoltan Haindrich <ki...@rxd.hu>
>> wrote:
>>>>>>
>>>>>>> Hello all!
>>>>>>>
>>>>>>> The proposed system have become more stable lately - and I think I've
>>>>>>> solved a few sources of flakiness.
>>>>>>> To be really usable I also wanted to add a way to dynamically
>>>>>>> enable/disable a set of tests (for example the replication tests
>> take ~7
>>>>>>> hours to execute from the total of 24
>>>>>>> hours - and they are also a bit unstable, so not running them when
>> not
>>>>>>> neccesary would be beneficial in multiple ways) - but to do this the
>> best
>>>>>>> would be to throw in
>>>>>>> junit5; unfortunately the current ptest installation uses maven 3.0.5
>>>>>>> which doesn't like these kind of things - so instead of hacking a
>> fix for
>>>>>>> that ....I've removed it
>>>>>>> from the dev branch for now.
>>>>>>>
>>>>>>> I would like to propose to start an evaluation phase of the new test
>>>>>>> procedures(INFRA-20269)
>>>>>>> The process would look something like this:
>>>>>>> * someone opens a PR - the tests will be run on the changes
>>>>>>> * on every active branches the tests will run from time to time
>>>>>>>      * this will produce a bunch of test runs on the master branch as
>> well ;
>>>>>>> which will show how well the tests behave on the master branch
>> without any
>>>>>>> patches
>>>>>>> * runs on branches (PRs or active development branches(eg:master))
>> will be
>>>>>>> rate limited to 5 builds/day
>>>>>>> * at most ~4 builds at a time - to maximize resource usage
>>>>>>> * turnaround time for a build is right now 2 hours - which I feel
>> like a
>>>>>>> balanced choice between speed/response time
>>>>>>>
>>>>>>> Possible future benefits:
>>>>>>> * toggle features using github tags
>>>>>>> * optional testgroups (metastore/replication) tests
>>>>>>> * ability to run the metastore verification tests
>>>>>>> * possibility to add smoke tests
>>>>>>>
>>>>>>> To enable this I will have to finish the HIVE-22942 ticket - beyond
>> the
>>>>>>> new Jenkinsfile which defines the full logic;
>>>>>>> although I've sinked a lot of time into fixing all kind of flaky
>> tests I
>>>>>>> would would like to disable around ~25 tests.
>>>>>>>
>>>>>>> I also would like to propose a method to verify the stability of a
>> single
>>>>>>> test: run it a 100 times in series at the same place where the
>> precommit
>>>>>>> tests are running.
>>>>>>> This will put the bar high enough that only totally stable tests
>> could
>>>>>>> satisfy it (a 99% stable test has 36% chance to pass this without
>> being
>>>>>>> caught :D)
>>>>>>> After this will be in service it could be used to: validate that an
>>>>>>> existing test is unstable (before disabling it) - and then used
>> again to
>>>>>>> prove that it got fixed during
>>>>>>> re-enabling it.
>>>>>>>
>>>>>>> Please let me know what you think!
>>>>>>>
>>>>>>> cheers,
>>>>>>> Zoltan
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 4/29/20 4:28 PM, Zoltan Haindrich wrote:
>>>>>>>> Hey All!
>>>>>>>>
>>>>>>>> I was planning to replace the ptest stuff with something less
>> complex
>>>>>>> for a while now - I see that we struggle a lot because of ptest is
>> more
>>>>>>> complicated than it should be...
>>>>>>>> It would be much better if it would be constructed from well made
>>>>>>> existing CI piece. - because of that I've started working on [1] a
>> few
>>>>>>> months ago.
>>>>>>>>
>>>>>>>> It has it's pros and cons...but it's not the same as the existing
>> ptest
>>>>>>> stuff.
>>>>>>>> I've collected some infos about how it compares against the
>> existing one
>>>>>>> - but it became too long so I've moved it into a google docs
>> document at
>>>>>>> [3].
>>>>>>>>
>>>>>>>> It's not yet ready... I still have some remaining
>> problems/concerns/etc
>>>>>>>> * what do you think about changing to a github PR based workflow?
>>>>>>>> * it will not support at all things like "isolation" - so we will
>> have
>>>>>>> to make our tests work with eachother without bending the rules...
>>>>>>>> * I've tried to overcommit the cpu resources which creates a more
>> noisy
>>>>>>> environment for the actual tests - this squeezes out some new
>> problems
>>>>>>> which should be fixed before
>>>>>>>> this could be enabled.
>>>>>>>> * for every PR the first run is somewhat sub-optimal...there are
>> some
>>>>>>> reasons for this - the actually used resources are the same; but the
>>>>>>> overall execution time is not
>>>>>>>> optimal; I could accept this as a compromise because right now I
>> wait
>>>>>>>> 24 hours for a precommit run.
>>>>>>>>
>>>>>>>> It's deployed at [2] and anyone can start a testrun on it:
>>>>>>>> * merge my HIVE-22942-ptest-alt branch from [4] into your branch
>>>>>>>> * open a PR against my hive repo on github [5]
>>>>>>>>
>>>>>>>> cheers,
>>>>>>>> Zoltan
>>>>>>>>
>>>>>>>>
>>>>>>>> [1] https://issues.apache.org/jira/browse/HIVE-22942
>>>>>>>> [2] http://34.66.156.144:8080/job/hive-precommit
>>>>>>>> [3]
>>>>>>>
>> https://docs.google.com/document/d/1dhL5B-eBvYNKEsNV3kE6RrkV5w-LtDgw5CtHV5pdoX4/edit?usp=sharing
>>>>>>>> [4] https://github.com/kgyrtkirk/hive/tree/HIVE-22942-ptest-alt
>>>>>>>> [5] https://github.com/kgyrtkirk/hive/
>>>>>>>
>>>>>>
>>
>

Re: [DISCUSS] Replace ptest with hive-test-kube

Posted by Mustafa IMAN <mu...@cloudera.com.INVALID>.

Thank you Zoltan for all this work.
I see many PRs are merged based on the new workflow already. The old
workflow generates many reports like ASF license/findbugs/checkstyle etc. I
don't see these in the new Github PR workflow. I am concerned the codebase
is going to suffer from lack of these reports very quickly. Do these checks
still happen but are not visible?

On Tue, Jun 2, 2020 at 4:41 AM Zoltan Haindrich <ki...@rxd.hu> wrote:

> Hello,
>
> I would like to note that you may login to the jenkins instance - to
> start/kill builds (or create new jobs).
> I've configured github oauth - but since team membership can't be queried
> from the "apache organization" - it's harder to configure all "hive
> committers".
> However...I think I've made it available for most of us - if you can't
> start builds/etc just let me know your github user and I'll add it.
>
> cheers,
> Zoltan
>
>
>
> On 5/29/20 2:32 PM, Zoltan Haindrich wrote:
> > Hey all!
> >
> > The patch is now in master - so every new PR or a push on it will
> trigger a new run.
> >
> > Please decide which one would you like to use - open a PR to see the new
> one work...or upload a patch file to the jira - but please don't do both;
> because in that case 2
> > execution will happen.
> >
> > The job execution time(2-4 hours) of a single run is a bit higher than
> the usual on the ptest server - this is mostly to increase throughput.
> >
> > The patch also disabled a set of tests; I will send the full list of
> skipped tests shortly.
> >
> > cheers,
> > Zoltan
> >
> >
> > On 5/27/20 1:50 PM, Zoltan Haindrich wrote:
> >> Hello all!
> >>
> >> The new stuff is ready to be switched on-to. It needs to be merged into
> master - and after that anyone who opens a PR will get a run by the new
> HiveQA infra.
> >> I propose to run the 2 systems side-by-side for some time - the regular
> master builds will start; and we will see how frequently that is polluted
> by flaky tests.
> >>
> >> Note that the current patch also disables around ~25 more tests to
> increase stability - to get a better overview about the disabled tests I
> think the "direction of the
> >> information flow" should be altered; what I mean by that is: instead of
> just throwing in a jira for "disable test x" and opening a new one like
> "fix test x"; only open
> >> the latter and place the jira reference into the ignore message;
> meanwhile also add a regular report about the actually disabled tests - so
> people who do know about the
> >> importance of a particular test can get involved.
> >>
> >> Note: the builds.apache.org instance will be shutdown somewhere in the
> future as well...but I think the new one is a good-enough alternative to
> not have to migrate the
> >> Hive-precommit job over to https://ci-hadoop.apache.org/.
> >>
> >> http://34.66.156.144:8080/job/hive-precommit/job/PR-948/5/
> >> https://issues.apache.org/jira/browse/HIVE-22942
> >> https://github.com/apache/hive/pull/948/files
> >>
> >> cheers,
> >> Zoltan
> >>
> >> On 5/18/20 1:42 PM, Zoltan Haindrich wrote:
> >>> Hey!
> >>>
> >>> On 5/18/20 11:51 AM, Zoltan Chovan wrote:
> >>>> Thank you for all of your efforts, this looks really promising. With
> moving
> >>>> to github PRs, would that also mean that we move away from the
> reviewboard
> >>>> for code review?
> >>> I didn't thinked about that. I think using github's review interface
> will remain optional, because both review systems has there own strong
> points - I wouldn't force
> >>> anyone to use one over the other. (For some patches reviewboard is
> much better; because it's able to track content moves a bit better than
> github. - meanwhile github has
> >>> a small feature that enables to mark files as reviewed)
> >>> As a matter of fact we had sometimes patches on the jira's which never
> had neither an RB or a PR to review them - having a PR there at least will
> make it easier for
> >>> reviewers to comment.
> >>>
> >>>> Also, what happens if a PR is updated? Will the tests run for both or
> just
> >>>> for the latest version?
> >>> It will trigger a new build - if there is already a build in progress
> that will prevent a new build from starting until it finishes...and there
> is also a 5 builds/day
> >>> limit; which might induce some wait.
> >>>
> >>> cheers,
> >>> Zoltan
> >>>
> >>>>
> >>>> Regards,
> >>>> Zoltan
> >>>>
> >>>> On Sun, May 17, 2020 at 10:51 PM Zoltan Haindrich <ki...@rxd.hu>
> wrote:
> >>>>
> >>>>> Hello all!
> >>>>>
> >>>>> The proposed system have become more stable lately - and I think I've
> >>>>> solved a few sources of flakiness.
> >>>>> To be really usable I also wanted to add a way to dynamically
> >>>>> enable/disable a set of tests (for example the replication tests
> take ~7
> >>>>> hours to execute from the total of 24
> >>>>> hours - and they are also a bit unstable, so not running them when
> not
> >>>>> neccesary would be beneficial in multiple ways) - but to do this the
> best
> >>>>> would be to throw in
> >>>>> junit5; unfortunately the current ptest installation uses maven 3.0.5
> >>>>> which doesn't like these kind of things - so instead of hacking a
> fix for
> >>>>> that ....I've removed it
> >>>>> from the dev branch for now.
> >>>>>
> >>>>> I would like to propose to start an evaluation phase of the new test
> >>>>> procedures(INFRA-20269)
> >>>>> The process would look something like this:
> >>>>> * someone opens a PR - the tests will be run on the changes
> >>>>> * on every active branches the tests will run from time to time
> >>>>>     * this will produce a bunch of test runs on the master branch as
> well ;
> >>>>> which will show how well the tests behave on the master branch
> without any
> >>>>> patches
> >>>>> * runs on branches (PRs or active development branches(eg:master))
> will be
> >>>>> rate limited to 5 builds/day
> >>>>> * at most ~4 builds at a time - to maximize resource usage
> >>>>> * turnaround time for a build is right now 2 hours - which I feel
> like a
> >>>>> balanced choice between speed/response time
> >>>>>
> >>>>> Possible future benefits:
> >>>>> * toggle features using github tags
> >>>>> * optional testgroups (metastore/replication) tests
> >>>>> * ability to run the metastore verification tests
> >>>>> * possibility to add smoke tests
> >>>>>
> >>>>> To enable this I will have to finish the HIVE-22942 ticket - beyond
> the
> >>>>> new Jenkinsfile which defines the full logic;
> >>>>> although I've sinked a lot of time into fixing all kind of flaky
> tests I
> >>>>> would would like to disable around ~25 tests.
> >>>>>
> >>>>> I also would like to propose a method to verify the stability of a
> single
> >>>>> test: run it a 100 times in series at the same place where the
> precommit
> >>>>> tests are running.
> >>>>> This will put the bar high enough that only totally stable tests
> could
> >>>>> satisfy it (a 99% stable test has 36% chance to pass this without
> being
> >>>>> caught :D)
> >>>>> After this will be in service it could be used to: validate that an
> >>>>> existing test is unstable (before disabling it) - and then used
> again to
> >>>>> prove that it got fixed during
> >>>>> re-enabling it.
> >>>>>
> >>>>> Please let me know what you think!
> >>>>>
> >>>>> cheers,
> >>>>> Zoltan
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 4/29/20 4:28 PM, Zoltan Haindrich wrote:
> >>>>>> Hey All!
> >>>>>>
> >>>>>> I was planning to replace the ptest stuff with something less
> complex
> >>>>> for a while now - I see that we struggle a lot because of ptest is
> more
> >>>>> complicated than it should be...
> >>>>>> It would be much better if it would be constructed from well made
> >>>>> existing CI piece. - because of that I've started working on [1] a
> few
> >>>>> months ago.
> >>>>>>
> >>>>>> It has it's pros and cons...but it's not the same as the existing
> ptest
> >>>>> stuff.
> >>>>>> I've collected some infos about how it compares against the
> existing one
> >>>>> - but it became too long so I've moved it into a google docs
> document at
> >>>>> [3].
> >>>>>>
> >>>>>> It's not yet ready... I still have some remaining
> problems/concerns/etc
> >>>>>> * what do you think about changing to a github PR based workflow?
> >>>>>> * it will not support at all things like "isolation" - so we will
> have
> >>>>> to make our tests work with eachother without bending the rules...
> >>>>>> * I've tried to overcommit the cpu resources which creates a more
> noisy
> >>>>> environment for the actual tests - this squeezes out some new
> problems
> >>>>> which should be fixed before
> >>>>>> this could be enabled.
> >>>>>> * for every PR the first run is somewhat sub-optimal...there are
> some
> >>>>> reasons for this - the actually used resources are the same; but the
> >>>>> overall execution time is not
> >>>>>> optimal; I could accept this as a compromise because right now I
> wait
> >>>>>> 24 hours for a precommit run.
> >>>>>>
> >>>>>> It's deployed at [2] and anyone can start a testrun on it:
> >>>>>> * merge my HIVE-22942-ptest-alt branch from [4] into your branch
> >>>>>> * open a PR against my hive repo on github [5]
> >>>>>>
> >>>>>> cheers,
> >>>>>> Zoltan
> >>>>>>
> >>>>>>
> >>>>>> [1] https://issues.apache.org/jira/browse/HIVE-22942
> >>>>>> [2] http://34.66.156.144:8080/job/hive-precommit
> >>>>>> [3]
> >>>>>
> https://docs.google.com/document/d/1dhL5B-eBvYNKEsNV3kE6RrkV5w-LtDgw5CtHV5pdoX4/edit?usp=sharing
> >>>>>> [4] https://github.com/kgyrtkirk/hive/tree/HIVE-22942-ptest-alt
> >>>>>> [5] https://github.com/kgyrtkirk/hive/
> >>>>>
> >>>>
>

Re: [DISCUSS] Replace ptest with hive-test-kube

Posted by Zoltan Haindrich <ki...@rxd.hu>.

Hello,

I would like to note that you may login to the jenkins instance - to start/kill builds (or create new jobs).
I've configured github oauth - but since team membership can't be queried from the "apache organization" - it's harder to configure all "hive committers".
However...I think I've made it available for most of us - if you can't start builds/etc just let me know your github user and I'll add it.

cheers,
Zoltan



On 5/29/20 2:32 PM, Zoltan Haindrich wrote:
> Hey all!
> 
> The patch is now in master - so every new PR or a push on it will trigger a new run.
> 
> Please decide which one would you like to use - open a PR to see the new one work...or upload a patch file to the jira - but please don't do both; because in that case 2 
> execution will happen.
> 
> The job execution time(2-4 hours) of a single run is a bit higher than the usual on the ptest server - this is mostly to increase throughput.
> 
> The patch also disabled a set of tests; I will send the full list of skipped tests shortly.
> 
> cheers,
> Zoltan
> 
> 
> On 5/27/20 1:50 PM, Zoltan Haindrich wrote:
>> Hello all!
>>
>> The new stuff is ready to be switched on-to. It needs to be merged into master - and after that anyone who opens a PR will get a run by the new HiveQA infra.
>> I propose to run the 2 systems side-by-side for some time - the regular master builds will start; and we will see how frequently that is polluted by flaky tests.
>>
>> Note that the current patch also disables around ~25 more tests to increase stability - to get a better overview about the disabled tests I think the "direction of the 
>> information flow" should be altered; what I mean by that is: instead of just throwing in a jira for "disable test x" and opening a new one like "fix test x"; only open 
>> the latter and place the jira reference into the ignore message; meanwhile also add a regular report about the actually disabled tests - so people who do know about the 
>> importance of a particular test can get involved.
>>
>> Note: the builds.apache.org instance will be shutdown somewhere in the future as well...but I think the new one is a good-enough alternative to not have to migrate the 
>> Hive-precommit job over to https://ci-hadoop.apache.org/.
>>
>> http://34.66.156.144:8080/job/hive-precommit/job/PR-948/5/
>> https://issues.apache.org/jira/browse/HIVE-22942
>> https://github.com/apache/hive/pull/948/files
>>
>> cheers,
>> Zoltan
>>
>> On 5/18/20 1:42 PM, Zoltan Haindrich wrote:
>>> Hey!
>>>
>>> On 5/18/20 11:51 AM, Zoltan Chovan wrote:
>>>> Thank you for all of your efforts, this looks really promising. With moving
>>>> to github PRs, would that also mean that we move away from the reviewboard
>>>> for code review?
>>> I didn't thinked about that. I think using github's review interface will remain optional, because both review systems has there own strong points - I wouldn't force 
>>> anyone to use one over the other. (For some patches reviewboard is much better; because it's able to track content moves a bit better than github. - meanwhile github has 
>>> a small feature that enables to mark files as reviewed)
>>> As a matter of fact we had sometimes patches on the jira's which never had neither an RB or a PR to review them - having a PR there at least will make it easier for 
>>> reviewers to comment.
>>>
>>>> Also, what happens if a PR is updated? Will the tests run for both or just
>>>> for the latest version?
>>> It will trigger a new build - if there is already a build in progress that will prevent a new build from starting until it finishes...and there is also a 5 builds/day 
>>> limit; which might induce some wait.
>>>
>>> cheers,
>>> Zoltan
>>>
>>>>
>>>> Regards,
>>>> Zoltan
>>>>
>>>> On Sun, May 17, 2020 at 10:51 PM Zoltan Haindrich <ki...@rxd.hu> wrote:
>>>>
>>>>> Hello all!
>>>>>
>>>>> The proposed system have become more stable lately - and I think I've
>>>>> solved a few sources of flakiness.
>>>>> To be really usable I also wanted to add a way to dynamically
>>>>> enable/disable a set of tests (for example the replication tests take ~7
>>>>> hours to execute from the total of 24
>>>>> hours - and they are also a bit unstable, so not running them when not
>>>>> neccesary would be beneficial in multiple ways) - but to do this the best
>>>>> would be to throw in
>>>>> junit5; unfortunately the current ptest installation uses maven 3.0.5
>>>>> which doesn't like these kind of things - so instead of hacking a fix for
>>>>> that ....I've removed it
>>>>> from the dev branch for now.
>>>>>
>>>>> I would like to propose to start an evaluation phase of the new test
>>>>> procedures(INFRA-20269)
>>>>> The process would look something like this:
>>>>> * someone opens a PR - the tests will be run on the changes
>>>>> * on every active branches the tests will run from time to time
>>>>>     * this will produce a bunch of test runs on the master branch as well ;
>>>>> which will show how well the tests behave on the master branch without any
>>>>> patches
>>>>> * runs on branches (PRs or active development branches(eg:master)) will be
>>>>> rate limited to 5 builds/day
>>>>> * at most ~4 builds at a time - to maximize resource usage
>>>>> * turnaround time for a build is right now 2 hours - which I feel like a
>>>>> balanced choice between speed/response time
>>>>>
>>>>> Possible future benefits:
>>>>> * toggle features using github tags
>>>>> * optional testgroups (metastore/replication) tests
>>>>> * ability to run the metastore verification tests
>>>>> * possibility to add smoke tests
>>>>>
>>>>> To enable this I will have to finish the HIVE-22942 ticket - beyond the
>>>>> new Jenkinsfile which defines the full logic;
>>>>> although I've sinked a lot of time into fixing all kind of flaky tests I
>>>>> would would like to disable around ~25 tests.
>>>>>
>>>>> I also would like to propose a method to verify the stability of a single
>>>>> test: run it a 100 times in series at the same place where the precommit
>>>>> tests are running.
>>>>> This will put the bar high enough that only totally stable tests could
>>>>> satisfy it (a 99% stable test has 36% chance to pass this without being
>>>>> caught :D)
>>>>> After this will be in service it could be used to: validate that an
>>>>> existing test is unstable (before disabling it) - and then used again to
>>>>> prove that it got fixed during
>>>>> re-enabling it.
>>>>>
>>>>> Please let me know what you think!
>>>>>
>>>>> cheers,
>>>>> Zoltan
>>>>>
>>>>>
>>>>>
>>>>> On 4/29/20 4:28 PM, Zoltan Haindrich wrote:
>>>>>> Hey All!
>>>>>>
>>>>>> I was planning to replace the ptest stuff with something less complex
>>>>> for a while now - I see that we struggle a lot because of ptest is more
>>>>> complicated than it should be...
>>>>>> It would be much better if it would be constructed from well made
>>>>> existing CI piece. - because of that I've started working on [1] a few
>>>>> months ago.
>>>>>>
>>>>>> It has it's pros and cons...but it's not the same as the existing ptest
>>>>> stuff.
>>>>>> I've collected some infos about how it compares against the existing one
>>>>> - but it became too long so I've moved it into a google docs document at
>>>>> [3].
>>>>>>
>>>>>> It's not yet ready... I still have some remaining problems/concerns/etc
>>>>>> * what do you think about changing to a github PR based workflow?
>>>>>> * it will not support at all things like "isolation" - so we will have
>>>>> to make our tests work with eachother without bending the rules...
>>>>>> * I've tried to overcommit the cpu resources which creates a more noisy
>>>>> environment for the actual tests - this squeezes out some new problems
>>>>> which should be fixed before
>>>>>> this could be enabled.
>>>>>> * for every PR the first run is somewhat sub-optimal...there are some
>>>>> reasons for this - the actually used resources are the same; but the
>>>>> overall execution time is not
>>>>>> optimal; I could accept this as a compromise because right now I wait
>>>>>> 24 hours for a precommit run.
>>>>>>
>>>>>> It's deployed at [2] and anyone can start a testrun on it:
>>>>>> * merge my HIVE-22942-ptest-alt branch from [4] into your branch
>>>>>> * open a PR against my hive repo on github [5]
>>>>>>
>>>>>> cheers,
>>>>>> Zoltan
>>>>>>
>>>>>>
>>>>>> [1] https://issues.apache.org/jira/browse/HIVE-22942
>>>>>> [2] http://34.66.156.144:8080/job/hive-precommit
>>>>>> [3]
>>>>> https://docs.google.com/document/d/1dhL5B-eBvYNKEsNV3kE6RrkV5w-LtDgw5CtHV5pdoX4/edit?usp=sharing
>>>>>> [4] https://github.com/kgyrtkirk/hive/tree/HIVE-22942-ptest-alt
>>>>>> [5] https://github.com/kgyrtkirk/hive/
>>>>>
>>>>

Re: [DISCUSS] Replace ptest with hive-test-kube

Posted by Zoltan Haindrich <ki...@rxd.hu>.

Hey all!

The patch is now in master - so every new PR or a push on it will trigger a new run.

Please decide which one would you like to use - open a PR to see the new one work...or upload a patch file to the jira - but please don't do both; because in that case 2 
execution will happen.

The job execution time(2-4 hours) of a single run is a bit higher than the usual on the ptest server - this is mostly to increase throughput.

The patch also disabled a set of tests; I will send the full list of skipped tests shortly.

cheers,
Zoltan


On 5/27/20 1:50 PM, Zoltan Haindrich wrote:
> Hello all!
> 
> The new stuff is ready to be switched on-to. It needs to be merged into master - and after that anyone who opens a PR will get a run by the new HiveQA infra.
> I propose to run the 2 systems side-by-side for some time - the regular master builds will start; and we will see how frequently that is polluted by flaky tests.
> 
> Note that the current patch also disables around ~25 more tests to increase stability - to get a better overview about the disabled tests I think the "direction of the 
> information flow" should be altered; what I mean by that is: instead of just throwing in a jira for "disable test x" and opening a new one like "fix test x"; only open the 
> latter and place the jira reference into the ignore message; meanwhile also add a regular report about the actually disabled tests - so people who do know about the 
> importance of a particular test can get involved.
> 
> Note: the builds.apache.org instance will be shutdown somewhere in the future as well...but I think the new one is a good-enough alternative to not have to migrate the 
> Hive-precommit job over to https://ci-hadoop.apache.org/.
> 
> http://34.66.156.144:8080/job/hive-precommit/job/PR-948/5/
> https://issues.apache.org/jira/browse/HIVE-22942
> https://github.com/apache/hive/pull/948/files
> 
> cheers,
> Zoltan
> 
> On 5/18/20 1:42 PM, Zoltan Haindrich wrote:
>> Hey!
>>
>> On 5/18/20 11:51 AM, Zoltan Chovan wrote:
>>> Thank you for all of your efforts, this looks really promising. With moving
>>> to github PRs, would that also mean that we move away from the reviewboard
>>> for code review?
>> I didn't thinked about that. I think using github's review interface will remain optional, because both review systems has there own strong points - I wouldn't force 
>> anyone to use one over the other. (For some patches reviewboard is much better; because it's able to track content moves a bit better than github. - meanwhile github has 
>> a small feature that enables to mark files as reviewed)
>> As a matter of fact we had sometimes patches on the jira's which never had neither an RB or a PR to review them - having a PR there at least will make it easier for 
>> reviewers to comment.
>>
>>> Also, what happens if a PR is updated? Will the tests run for both or just
>>> for the latest version?
>> It will trigger a new build - if there is already a build in progress that will prevent a new build from starting until it finishes...and there is also a 5 builds/day 
>> limit; which might induce some wait.
>>
>> cheers,
>> Zoltan
>>
>>>
>>> Regards,
>>> Zoltan
>>>
>>> On Sun, May 17, 2020 at 10:51 PM Zoltan Haindrich <ki...@rxd.hu> wrote:
>>>
>>>> Hello all!
>>>>
>>>> The proposed system have become more stable lately - and I think I've
>>>> solved a few sources of flakiness.
>>>> To be really usable I also wanted to add a way to dynamically
>>>> enable/disable a set of tests (for example the replication tests take ~7
>>>> hours to execute from the total of 24
>>>> hours - and they are also a bit unstable, so not running them when not
>>>> neccesary would be beneficial in multiple ways) - but to do this the best
>>>> would be to throw in
>>>> junit5; unfortunately the current ptest installation uses maven 3.0.5
>>>> which doesn't like these kind of things - so instead of hacking a fix for
>>>> that ....I've removed it
>>>> from the dev branch for now.
>>>>
>>>> I would like to propose to start an evaluation phase of the new test
>>>> procedures(INFRA-20269)
>>>> The process would look something like this:
>>>> * someone opens a PR - the tests will be run on the changes
>>>> * on every active branches the tests will run from time to time
>>>>     * this will produce a bunch of test runs on the master branch as well ;
>>>> which will show how well the tests behave on the master branch without any
>>>> patches
>>>> * runs on branches (PRs or active development branches(eg:master)) will be
>>>> rate limited to 5 builds/day
>>>> * at most ~4 builds at a time - to maximize resource usage
>>>> * turnaround time for a build is right now 2 hours - which I feel like a
>>>> balanced choice between speed/response time
>>>>
>>>> Possible future benefits:
>>>> * toggle features using github tags
>>>> * optional testgroups (metastore/replication) tests
>>>> * ability to run the metastore verification tests
>>>> * possibility to add smoke tests
>>>>
>>>> To enable this I will have to finish the HIVE-22942 ticket - beyond the
>>>> new Jenkinsfile which defines the full logic;
>>>> although I've sinked a lot of time into fixing all kind of flaky tests I
>>>> would would like to disable around ~25 tests.
>>>>
>>>> I also would like to propose a method to verify the stability of a single
>>>> test: run it a 100 times in series at the same place where the precommit
>>>> tests are running.
>>>> This will put the bar high enough that only totally stable tests could
>>>> satisfy it (a 99% stable test has 36% chance to pass this without being
>>>> caught :D)
>>>> After this will be in service it could be used to: validate that an
>>>> existing test is unstable (before disabling it) - and then used again to
>>>> prove that it got fixed during
>>>> re-enabling it.
>>>>
>>>> Please let me know what you think!
>>>>
>>>> cheers,
>>>> Zoltan
>>>>
>>>>
>>>>
>>>> On 4/29/20 4:28 PM, Zoltan Haindrich wrote:
>>>>> Hey All!
>>>>>
>>>>> I was planning to replace the ptest stuff with something less complex
>>>> for a while now - I see that we struggle a lot because of ptest is more
>>>> complicated than it should be...
>>>>> It would be much better if it would be constructed from well made
>>>> existing CI piece. - because of that I've started working on [1] a few
>>>> months ago.
>>>>>
>>>>> It has it's pros and cons...but it's not the same as the existing ptest
>>>> stuff.
>>>>> I've collected some infos about how it compares against the existing one
>>>> - but it became too long so I've moved it into a google docs document at
>>>> [3].
>>>>>
>>>>> It's not yet ready... I still have some remaining problems/concerns/etc
>>>>> * what do you think about changing to a github PR based workflow?
>>>>> * it will not support at all things like "isolation" - so we will have
>>>> to make our tests work with eachother without bending the rules...
>>>>> * I've tried to overcommit the cpu resources which creates a more noisy
>>>> environment for the actual tests - this squeezes out some new problems
>>>> which should be fixed before
>>>>> this could be enabled.
>>>>> * for every PR the first run is somewhat sub-optimal...there are some
>>>> reasons for this - the actually used resources are the same; but the
>>>> overall execution time is not
>>>>> optimal; I could accept this as a compromise because right now I wait
>>>>> 24 hours for a precommit run.
>>>>>
>>>>> It's deployed at [2] and anyone can start a testrun on it:
>>>>> * merge my HIVE-22942-ptest-alt branch from [4] into your branch
>>>>> * open a PR against my hive repo on github [5]
>>>>>
>>>>> cheers,
>>>>> Zoltan
>>>>>
>>>>>
>>>>> [1] https://issues.apache.org/jira/browse/HIVE-22942
>>>>> [2] http://34.66.156.144:8080/job/hive-precommit
>>>>> [3]
>>>> https://docs.google.com/document/d/1dhL5B-eBvYNKEsNV3kE6RrkV5w-LtDgw5CtHV5pdoX4/edit?usp=sharing
>>>>> [4] https://github.com/kgyrtkirk/hive/tree/HIVE-22942-ptest-alt
>>>>> [5] https://github.com/kgyrtkirk/hive/
>>>>
>>>

Re: [DISCUSS] Replace ptest with hive-test-kube

Posted by Zoltan Haindrich <ki...@rxd.hu>.

Hello all!

The new stuff is ready to be switched on-to. It needs to be merged into master - and after that anyone who opens a PR will get a run by the new HiveQA infra.
I propose to run the 2 systems side-by-side for some time - the regular master builds will start; and we will see how frequently that is polluted by flaky tests.

Note that the current patch also disables around ~25 more tests to increase stability - to get a better overview about the disabled tests I think the "direction of the 
information flow" should be altered; what I mean by that is: instead of just throwing in a jira for "disable test x" and opening a new one like "fix test x"; only open the 
latter and place the jira reference into the ignore message; meanwhile also add a regular report about the actually disabled tests - so people who do know about the 
importance of a particular test can get involved.

Note: the builds.apache.org instance will be shutdown somewhere in the future as well...but I think the new one is a good-enough alternative to not have to migrate the 
Hive-precommit job over to https://ci-hadoop.apache.org/.

http://34.66.156.144:8080/job/hive-precommit/job/PR-948/5/
https://issues.apache.org/jira/browse/HIVE-22942
https://github.com/apache/hive/pull/948/files

cheers,
Zoltan

On 5/18/20 1:42 PM, Zoltan Haindrich wrote:
> Hey!
> 
> On 5/18/20 11:51 AM, Zoltan Chovan wrote:
>> Thank you for all of your efforts, this looks really promising. With moving
>> to github PRs, would that also mean that we move away from the reviewboard
>> for code review?
> I didn't thinked about that. I think using github's review interface will remain optional, because both review systems has there own strong points - I wouldn't force anyone 
> to use one over the other. (For some patches reviewboard is much better; because it's able to track content moves a bit better than github. - meanwhile github has a small 
> feature that enables to mark files as reviewed)
> As a matter of fact we had sometimes patches on the jira's which never had neither an RB or a PR to review them - having a PR there at least will make it easier for 
> reviewers to comment.
> 
>> Also, what happens if a PR is updated? Will the tests run for both or just
>> for the latest version?
> It will trigger a new build - if there is already a build in progress that will prevent a new build from starting until it finishes...and there is also a 5 builds/day 
> limit; which might induce some wait.
> 
> cheers,
> Zoltan
> 
>>
>> Regards,
>> Zoltan
>>
>> On Sun, May 17, 2020 at 10:51 PM Zoltan Haindrich <ki...@rxd.hu> wrote:
>>
>>> Hello all!
>>>
>>> The proposed system have become more stable lately - and I think I've
>>> solved a few sources of flakiness.
>>> To be really usable I also wanted to add a way to dynamically
>>> enable/disable a set of tests (for example the replication tests take ~7
>>> hours to execute from the total of 24
>>> hours - and they are also a bit unstable, so not running them when not
>>> neccesary would be beneficial in multiple ways) - but to do this the best
>>> would be to throw in
>>> junit5; unfortunately the current ptest installation uses maven 3.0.5
>>> which doesn't like these kind of things - so instead of hacking a fix for
>>> that ....I've removed it
>>> from the dev branch for now.
>>>
>>> I would like to propose to start an evaluation phase of the new test
>>> procedures(INFRA-20269)
>>> The process would look something like this:
>>> * someone opens a PR - the tests will be run on the changes
>>> * on every active branches the tests will run from time to time
>>>     * this will produce a bunch of test runs on the master branch as well ;
>>> which will show how well the tests behave on the master branch without any
>>> patches
>>> * runs on branches (PRs or active development branches(eg:master)) will be
>>> rate limited to 5 builds/day
>>> * at most ~4 builds at a time - to maximize resource usage
>>> * turnaround time for a build is right now 2 hours - which I feel like a
>>> balanced choice between speed/response time
>>>
>>> Possible future benefits:
>>> * toggle features using github tags
>>> * optional testgroups (metastore/replication) tests
>>> * ability to run the metastore verification tests
>>> * possibility to add smoke tests
>>>
>>> To enable this I will have to finish the HIVE-22942 ticket - beyond the
>>> new Jenkinsfile which defines the full logic;
>>> although I've sinked a lot of time into fixing all kind of flaky tests I
>>> would would like to disable around ~25 tests.
>>>
>>> I also would like to propose a method to verify the stability of a single
>>> test: run it a 100 times in series at the same place where the precommit
>>> tests are running.
>>> This will put the bar high enough that only totally stable tests could
>>> satisfy it (a 99% stable test has 36% chance to pass this without being
>>> caught :D)
>>> After this will be in service it could be used to: validate that an
>>> existing test is unstable (before disabling it) - and then used again to
>>> prove that it got fixed during
>>> re-enabling it.
>>>
>>> Please let me know what you think!
>>>
>>> cheers,
>>> Zoltan
>>>
>>>
>>>
>>> On 4/29/20 4:28 PM, Zoltan Haindrich wrote:
>>>> Hey All!
>>>>
>>>> I was planning to replace the ptest stuff with something less complex
>>> for a while now - I see that we struggle a lot because of ptest is more
>>> complicated than it should be...
>>>> It would be much better if it would be constructed from well made
>>> existing CI piece. - because of that I've started working on [1] a few
>>> months ago.
>>>>
>>>> It has it's pros and cons...but it's not the same as the existing ptest
>>> stuff.
>>>> I've collected some infos about how it compares against the existing one
>>> - but it became too long so I've moved it into a google docs document at
>>> [3].
>>>>
>>>> It's not yet ready... I still have some remaining problems/concerns/etc
>>>> * what do you think about changing to a github PR based workflow?
>>>> * it will not support at all things like "isolation" - so we will have
>>> to make our tests work with eachother without bending the rules...
>>>> * I've tried to overcommit the cpu resources which creates a more noisy
>>> environment for the actual tests - this squeezes out some new problems
>>> which should be fixed before
>>>> this could be enabled.
>>>> * for every PR the first run is somewhat sub-optimal...there are some
>>> reasons for this - the actually used resources are the same; but the
>>> overall execution time is not
>>>> optimal; I could accept this as a compromise because right now I wait
>>>> 24 hours for a precommit run.
>>>>
>>>> It's deployed at [2] and anyone can start a testrun on it:
>>>> * merge my HIVE-22942-ptest-alt branch from [4] into your branch
>>>> * open a PR against my hive repo on github [5]
>>>>
>>>> cheers,
>>>> Zoltan
>>>>
>>>>
>>>> [1] https://issues.apache.org/jira/browse/HIVE-22942
>>>> [2] http://34.66.156.144:8080/job/hive-precommit
>>>> [3]
>>> https://docs.google.com/document/d/1dhL5B-eBvYNKEsNV3kE6RrkV5w-LtDgw5CtHV5pdoX4/edit?usp=sharing
>>>> [4] https://github.com/kgyrtkirk/hive/tree/HIVE-22942-ptest-alt
>>>> [5] https://github.com/kgyrtkirk/hive/
>>>
>>

Re: [DISCUSS] Replace ptest with hive-test-kube

Posted by Zoltan Haindrich <ki...@rxd.hu>.

Hey!

On 5/18/20 11:51 AM, Zoltan Chovan wrote:
> Thank you for all of your efforts, this looks really promising. With moving
> to github PRs, would that also mean that we move away from the reviewboard
> for code review?
I didn't thinked about that. I think using github's review interface will remain optional, because both review systems has there own strong points - I wouldn't force anyone 
to use one over the other. (For some patches reviewboard is much better; because it's able to track content moves a bit better than github. - meanwhile github has a small 
feature that enables to mark files as reviewed)
As a matter of fact we had sometimes patches on the jira's which never had neither an RB or a PR to review them - having a PR there at least will make it easier for 
reviewers to comment.

> Also, what happens if a PR is updated? Will the tests run for both or just
> for the latest version?
It will trigger a new build - if there is already a build in progress that will prevent a new build from starting until it finishes...and there is also a 5 builds/day 
limit; which might induce some wait.

cheers,
Zoltan

> 
> Regards,
> Zoltan
> 
> On Sun, May 17, 2020 at 10:51 PM Zoltan Haindrich <ki...@rxd.hu> wrote:
> 
>> Hello all!
>>
>> The proposed system have become more stable lately - and I think I've
>> solved a few sources of flakiness.
>> To be really usable I also wanted to add a way to dynamically
>> enable/disable a set of tests (for example the replication tests take ~7
>> hours to execute from the total of 24
>> hours - and they are also a bit unstable, so not running them when not
>> neccesary would be beneficial in multiple ways) - but to do this the best
>> would be to throw in
>> junit5; unfortunately the current ptest installation uses maven 3.0.5
>> which doesn't like these kind of things - so instead of hacking a fix for
>> that ....I've removed it
>> from the dev branch for now.
>>
>> I would like to propose to start an evaluation phase of the new test
>> procedures(INFRA-20269)
>> The process would look something like this:
>> * someone opens a PR - the tests will be run on the changes
>> * on every active branches the tests will run from time to time
>>     * this will produce a bunch of test runs on the master branch as well ;
>> which will show how well the tests behave on the master branch without any
>> patches
>> * runs on branches (PRs or active development branches(eg:master)) will be
>> rate limited to 5 builds/day
>> * at most ~4 builds at a time - to maximize resource usage
>> * turnaround time for a build is right now 2 hours - which I feel like a
>> balanced choice between speed/response time
>>
>> Possible future benefits:
>> * toggle features using github tags
>> * optional testgroups (metastore/replication) tests
>> * ability to run the metastore verification tests
>> * possibility to add smoke tests
>>
>> To enable this I will have to finish the HIVE-22942 ticket - beyond the
>> new Jenkinsfile which defines the full logic;
>> although I've sinked a lot of time into fixing all kind of flaky tests I
>> would would like to disable around ~25 tests.
>>
>> I also would like to propose a method to verify the stability of a single
>> test: run it a 100 times in series at the same place where the precommit
>> tests are running.
>> This will put the bar high enough that only totally stable tests could
>> satisfy it (a 99% stable test has 36% chance to pass this without being
>> caught :D)
>> After this will be in service it could be used to: validate that an
>> existing test is unstable (before disabling it) - and then used again to
>> prove that it got fixed during
>> re-enabling it.
>>
>> Please let me know what you think!
>>
>> cheers,
>> Zoltan
>>
>>
>>
>> On 4/29/20 4:28 PM, Zoltan Haindrich wrote:
>>> Hey All!
>>>
>>> I was planning to replace the ptest stuff with something less complex
>> for a while now - I see that we struggle a lot because of ptest is more
>> complicated than it should be...
>>> It would be much better if it would be constructed from well made
>> existing CI piece. - because of that I've started working on [1] a few
>> months ago.
>>>
>>> It has it's pros and cons...but it's not the same as the existing ptest
>> stuff.
>>> I've collected some infos about how it compares against the existing one
>> - but it became too long so I've moved it into a google docs document at
>> [3].
>>>
>>> It's not yet ready... I still have some remaining problems/concerns/etc
>>> * what do you think about changing to a github PR based workflow?
>>> * it will not support at all things like "isolation" - so we will have
>> to make our tests work with eachother without bending the rules...
>>> * I've tried to overcommit the cpu resources which creates a more noisy
>> environment for the actual tests - this squeezes out some new problems
>> which should be fixed before
>>> this could be enabled.
>>> * for every PR the first run is somewhat sub-optimal...there are some
>> reasons for this - the actually used resources are the same; but the
>> overall execution time is not
>>> optimal; I could accept this as a compromise because right now I wait
>>> 24 hours for a precommit run.
>>>
>>> It's deployed at [2] and anyone can start a testrun on it:
>>> * merge my HIVE-22942-ptest-alt branch from [4] into your branch
>>> * open a PR against my hive repo on github [5]
>>>
>>> cheers,
>>> Zoltan
>>>
>>>
>>> [1] https://issues.apache.org/jira/browse/HIVE-22942
>>> [2] http://34.66.156.144:8080/job/hive-precommit
>>> [3]
>> https://docs.google.com/document/d/1dhL5B-eBvYNKEsNV3kE6RrkV5w-LtDgw5CtHV5pdoX4/edit?usp=sharing
>>> [4] https://github.com/kgyrtkirk/hive/tree/HIVE-22942-ptest-alt
>>> [5] https://github.com/kgyrtkirk/hive/
>>
>

Re: [DISCUSS] Replace ptest with hive-test-kube

Posted by Zoltan Chovan <zc...@cloudera.com.INVALID>.

Hi Zoltan!

Thank you for all of your efforts, this looks really promising. With moving
to github PRs, would that also mean that we move away from the reviewboard
for code review?
Also, what happens if a PR is updated? Will the tests run for both or just
for the latest version?

Regards,
Zoltan

On Sun, May 17, 2020 at 10:51 PM Zoltan Haindrich <ki...@rxd.hu> wrote:

> Hello all!
>
> The proposed system have become more stable lately - and I think I've
> solved a few sources of flakiness.
> To be really usable I also wanted to add a way to dynamically
> enable/disable a set of tests (for example the replication tests take ~7
> hours to execute from the total of 24
> hours - and they are also a bit unstable, so not running them when not
> neccesary would be beneficial in multiple ways) - but to do this the best
> would be to throw in
> junit5; unfortunately the current ptest installation uses maven 3.0.5
> which doesn't like these kind of things - so instead of hacking a fix for
> that ....I've removed it
> from the dev branch for now.
>
> I would like to propose to start an evaluation phase of the new test
> procedures(INFRA-20269)
> The process would look something like this:
> * someone opens a PR - the tests will be run on the changes
> * on every active branches the tests will run from time to time
>    * this will produce a bunch of test runs on the master branch as well ;
> which will show how well the tests behave on the master branch without any
> patches
> * runs on branches (PRs or active development branches(eg:master)) will be
> rate limited to 5 builds/day
> * at most ~4 builds at a time - to maximize resource usage
> * turnaround time for a build is right now 2 hours - which I feel like a
> balanced choice between speed/response time
>
> Possible future benefits:
> * toggle features using github tags
> * optional testgroups (metastore/replication) tests
> * ability to run the metastore verification tests
> * possibility to add smoke tests
>
> To enable this I will have to finish the HIVE-22942 ticket - beyond the
> new Jenkinsfile which defines the full logic;
> although I've sinked a lot of time into fixing all kind of flaky tests I
> would would like to disable around ~25 tests.
>
> I also would like to propose a method to verify the stability of a single
> test: run it a 100 times in series at the same place where the precommit
> tests are running.
> This will put the bar high enough that only totally stable tests could
> satisfy it (a 99% stable test has 36% chance to pass this without being
> caught :D)
> After this will be in service it could be used to: validate that an
> existing test is unstable (before disabling it) - and then used again to
> prove that it got fixed during
> re-enabling it.
>
> Please let me know what you think!
>
> cheers,
> Zoltan
>
>
>
> On 4/29/20 4:28 PM, Zoltan Haindrich wrote:
> > Hey All!
> >
> > I was planning to replace the ptest stuff with something less complex
> for a while now - I see that we struggle a lot because of ptest is more
> complicated than it should be...
> > It would be much better if it would be constructed from well made
> existing CI piece. - because of that I've started working on [1] a few
> months ago.
> >
> > It has it's pros and cons...but it's not the same as the existing ptest
> stuff.
> > I've collected some infos about how it compares against the existing one
> - but it became too long so I've moved it into a google docs document at
> [3].
> >
> > It's not yet ready... I still have some remaining problems/concerns/etc
> > * what do you think about changing to a github PR based workflow?
> > * it will not support at all things like "isolation" - so we will have
> to make our tests work with eachother without bending the rules...
> > * I've tried to overcommit the cpu resources which creates a more noisy
> environment for the actual tests - this squeezes out some new problems
> which should be fixed before
> > this could be enabled.
> > * for every PR the first run is somewhat sub-optimal...there are some
> reasons for this - the actually used resources are the same; but the
> overall execution time is not
> > optimal; I could accept this as a compromise because right now I wait
> >24 hours for a precommit run.
> >
> > It's deployed at [2] and anyone can start a testrun on it:
> > * merge my HIVE-22942-ptest-alt branch from [4] into your branch
> > * open a PR against my hive repo on github [5]
> >
> > cheers,
> > Zoltan
> >
> >
> > [1] https://issues.apache.org/jira/browse/HIVE-22942
> > [2] http://34.66.156.144:8080/job/hive-precommit
> > [3]
> https://docs.google.com/document/d/1dhL5B-eBvYNKEsNV3kE6RrkV5w-LtDgw5CtHV5pdoX4/edit?usp=sharing
> > [4] https://github.com/kgyrtkirk/hive/tree/HIVE-22942-ptest-alt
> > [5] https://github.com/kgyrtkirk/hive/
>