Posted to common-dev@hadoop.apache.org by Sean Busbey <bu...@cloudera.com> on 2017/09/14 15:03:07 UTC

[DISCUSS] Can we make our precommit test robust to dependency changes while staying usable?

Moving discussion here from HADOOP-14654.

Short synopsis:

* HADOOP-14654 updated commons-httpclient to a new patch release in
hadoop-project
* Precommit checked the modules that changed (i.e. not many)
* nightly had Azure support break due to a change in behavior.

Is this just the cost of our approach to precommit vs post commit testing?

One approach: do a dependency:list of each module and for those that show a
change with the patch we run tests there.

This will cause a slew of tests to run when dependencies change. For the
change in HADOOP-14654 probably we'd just have to run at the top level.
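
To make that concrete, a rough sketch of the shape it could take
(hypothetical script, not what test-patch.sh does today; modules.txt and
the output layout are invented for illustration):

    #!/usr/bin/env bash
    # Hypothetical helper: run once on trunk and once with the patch
    # applied, then diff the two output directories.
    outdir="${1:?usage: list-deps.sh <output-dir>}"   # deps-before / deps-after
    mkdir -p "$outdir"
    # modules.txt: one Maven module path per line (illustrative input)
    while read -r module; do
      # dependency:list resolves the module's full dependency set,
      # transitive dependencies included
      mvn -q -pl "$module" dependency:list \
          -DoutputFile="$PWD/$outdir/${module//\//_}.txt"
    done < modules.txt
    # Any module whose list differs between the two runs gets its tests run:
    #   diff -rq deps-before deps-after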

Steve L and I went into some more detail on the ticket about things we could
do, if folks are interested.


-- 
busbey

Re: [DISCUSS] Can we make our precommit test robust to dependency changes while staying usable?

Posted by Brahma Reddy Battula <br...@apache.org>.
Thanks Sean Busbey for starting the discussion. This has been one of the pain
points I have raised with Allen a couple of times. IIUC, his point is that
committers and contributors should monitor qbt.

IMHO, instead of catching this post-commit, can we try to fix it in pre-commit
only? Maybe we can get the dependent modules (dependency:list) and execute
pre-commit on those.

Auto-revert would also be a good option.

Here are the previous discussions I came across, just for reference.

Suggestions:
1) Let qbt raise an alarm on the JIRA that broke the build.

http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201708.mbox/%3c3B4C977E-5920-4F55-8EBE-CC7E10BCF2A4@effectivemachines.com%3e

2) Run on the parent project. As of now pre-commit runs only on the current
module (where the changes happened), so breakage elsewhere might be missed?

3) Update a dummy class when contributors/committers know a change can impact
other modules, so that pre-commit runs there too.


--Brahma Reddy Battula

On Thu, 14 Sep 2017 at 11:31 PM, Sean Busbey <bu...@cloudera.com> wrote:

> > Committers MUST check the qbt output after a commit.  They MUST make sure
> > their commit didn’t break something new.
>
> How do we make this easier / more likely to happen?
>
> For example, I don't see any notice on HADOOP-14654 that the qbt
> post-commit failed. Is this a timing thing? Did Steve L just notice the
> break before we could finish the 10 hours it takes to get qbt done?
>
> How solid would qbt have to be for us to do something drastic like
> auto-revert changes after a failure?
>
>
> On Thu, Sep 14, 2017 at 11:05 AM, Allen Wittenauer <aw@effectivemachines.com> wrote:
>
> >
> > > On Sep 14, 2017, at 8:03 AM, Sean Busbey <bu...@cloudera.com> wrote:
> > >
> > > * HADOOP-14654 updated commons-httpclient to a new patch release in
> > > hadoop-project
> > > * Precommit checked the modules that changed (i.e. not many)
> > > * nightly had Azure support break due to a change in behavior.
> >
> >         OK, so it worked as coded/designed.
> >
> > > Is this just the cost of our approach to precommit vs post commit
> > > testing?
> >
> >         Yes.  It’s a classic speed vs. size computing problem.
> >
> > test-patch: quick but only runs a subset of tests
> > qbt: comprehensive but takes a very long time
> >
> >         Committers MUST check the qbt output after a commit.  They MUST
> > make sure their commit didn’t break something new.
> >
> > > One approach: do a dependency:list of each module and for those that show a
> > > change with the patch we run tests there.
> >
> >         As soon as you change something like junit, you’re running over
> > everything …
> >
> >         Plus, let’s get real: there is a large contingent of committers
> > that barely take the time to read or even comprehend the current Yetus
> > output.  Adding *more* output is the last thing we want to do.
> >
> > > This will cause a slew of tests to run when dependencies change. For the
> > > change in HADOOP-14654 probably we'd just have to run at the top level.
> >
> >         … e.g., exactly what qbt does for 10+ hours every night.
> >
> >         It’s important to also recognize that we need to be “good
> > citizens” in the ASF. If we can do dependency checking in one 10 hour
> > streak vs. several, that reduces the load on the ASF build
> > infrastructure.
> >
> >
> >
>
>
> --
> busbey
>
--Brahma Reddy Battula

Re: [DISCUSS] Can we make our precommit test robust to dependency changes while staying usable?

Posted by Chris Douglas <cd...@apache.org>.
On Thu, Sep 14, 2017 at 1:43 PM, Ray Chiang <rc...@apache.org> wrote:
> The other solution I've seen (from Oozie?) is to re-run just the subset of
> failing tests once more.  That should help cut down the failures except for
> the most flaky of flakies.

Many of our unit tests generate random cases and report the seed to
reproduce, and others are flaky because they collide with other tests'
artifacts. Success on reexec risks missing some important cases. I'd
rather err on the side of fixing/removing tests that are too
unreliable to serve their purpose.

I understand the counter-argument, but Hadoop has accumulated a ton of
tests without a recent round of pruning. We could stand to lose a few
cycles to highlight and trim the cases that cost us the most dev and
CI time. -C



Re: [DISCUSS] Can we make our precommit test robust to dependency changes while staying usable?

Posted by Ray Chiang <rc...@apache.org>.
On 9/14/17 1:36 PM, Chris Douglas wrote:

> This has gotten bad enough that people are dismissing legitimate test
> failures among the noise.
>
> On Thu, Sep 14, 2017 at 1:20 PM, Allen Wittenauer
> <aw...@effectivemachines.com> wrote:
>>          Someone should probably invest some time into integrating the HBase flaky test code a) into Yetus and then b) into Hadoop.
> What does the HBase flaky test code do? Another extension to
> test-patch could run all new/modified tests multiple times, and report
> to JIRA if any run fails.
>
> Test code is not as thoroughly reviewed, and we shouldn't expect that
> will change. We can at least identify the tests that are unreliable,
> assign responsibility for fixing them, and disable noisy tests so we
> can start trusting the CI output. I'd rather miss a regression by
> disabling a flaky test than lose confidence in the CI infrastructure.
>
> Would anyone object to disabling, even deleting failing tests in trunk
> until they're fixed? -C
>

The other solution I've seen (from Oozie?) is to re-run just the subset
of failing tests once more.  That should help cut down the failures
except for the most flaky of flakies.
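
For what it's worth, newer Maven Surefire can do roughly this in-place
(a sketch only; assumes Surefire 2.18+ with the JUnit 4 provider, which
may not match what Hadoop builds with today):

    # Retry each failing test once; only a test that fails every attempt
    # fails the build, and tests that pass on retry show up as "flakes"
    # in the Surefire report rather than as failures.
    mvn test -Dsurefire.rerunFailingTestsCount=1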

-Ray




Re: [DISCUSS] Can we make our precommit test robust to dependency changes while staying usable?

Posted by Sean Busbey <bu...@cloudera.com>.
On Thu, Sep 14, 2017 at 4:23 PM, Andrew Wang <an...@cloudera.com>
wrote:

>
> >
> > I discussed this on yetus-dev a while back and Allen thought it'd be
> non-trivial:
>
> https://lists.apache.org/thread.html/552ad614d1b3d5226a656b60c0108457bcaa1219fb9ad985f8750ba1@%3Cdev.yetus.apache.org%3E
>
> I unfortunately don't have the test-patch.sh expertise to dig into this.
>
>
>
Hurm. Getting something generic certainly sounds like a lot of work, but
getting something that works specifically with Maven maybe not. Let me see
if I can describe what I think the pieces look like in a JIRA.

Re: [DISCUSS] Can we make our precommit test robust to dependency changes while staying usable?

Posted by Erik Krogen <ek...@linkedin.com>.
I am +1 for a way to specify additional modules that should be run. Always running all module dependencies would be prohibitively expensive (especially for hadoop-common patches) but developers should have a good understanding of which patches are more high-risk for downstream consumers and can label accordingly.

> Maybe we should have a week to try and collaborate on that, with a focus on one or more specific builds (branch-3?) for now, and get that stable and happy?

While we need to start somewhere and trunk or branch-3 seem like reasonable places to start, I would actually argue that stable tests for the older release lines are at least as, if not more, valuable. Maintenance releases are likely to be less rigorously tested than a major release (given the assumption that it is already pretty stable and should only have lower risk patches), and backports are generally less rigorously reviewed than the trunk patch, yet these are the releases which should have the highest stability guarantees. This implies to me that they are the branches which need the most stable unit testing.

On 9/15/17, 5:17 AM, "Steve Loughran" <st...@hortonworks.com> wrote:

    1. I think maybe we should have a special ability or process for patches which change dependencies. We know that they have the potential for damage way beyond their .patch size, and I fear them.
    
    2. Like Allen says, we can't afford a full test run on every patch, because then the infra is overloaded and either you don't get a test turnaround within the day of submission, or the queue builds up so big that it's only by Sunday evening that the backlog is cleared.
    
    3. And we don't test the object stores enough; even if you can do it just with a set of credentials, we can't grant them to Jenkins (security), and it still takes lots of time (though with HADOOP-14553 we will cut the windows time down).
    
    4. And like Allen also says, tests are a bit unreliable on the test infra. Example: TestKDiag, one of mine; no idea why it fails, as it works locally. Generally, though, I think a lot of the failures are race conditions where the Jenkins machines execute things in a different order, or simply take longer than we expect.
    
    How about we identify those tests which fail intermittently on Jenkins alone and somehow downgrade them/get them explicitly excluded? I know it's cheating, and we should try to fix them first (after all, the way they fail may change, which would be a regression).
    
    LambdaTestUtils.eventually() is designed to support spinning until a test passes, and with the most recent fix (HADOOP-14851) it may actually do this. It can help with race conditions (and inconsistent object stores) by wrapping up the entire retry-until-something-works process. But it only works if the race condition is between the production code and the assertion; if it is in the production code across threads, that's a serious problem.
    
    Anyway: tests fail, and we should care. If you want to learn how to care, try to do what Allen has been busy with: keeping Jenkins happy.
    
    Maybe we should have a week to try and collaborate on that, with a focus on one or more specific builds (branch-3?) for now, and get that stable and happy?
    
    If we have to do it with a Jenkins profile and skip the unreliable tests, so be it.
    
    
    > On 14 Sep 2017, at 22:44, Arun Suresh <ar...@gmail.com> wrote:
    > 
    > I actually like this idea:
    > 
    >> One approach: do a dependency:list of each module and for those that show a
    >> change with the patch we run tests there.
    > 
    > Can 'jdeps' be used to prune the list of sub modules on which we do
    > pre-commit ? Essentially, we figure out which classes actually use the
    > modified classes from the patch and then run the pre-commit on those
    > packages ?
    > 
    > Cheers
    > -Arun
    > 
    > On Thu, Sep 14, 2017 at 2:23 PM, Andrew Wang <an...@cloudera.com>
    > wrote:
    > 
    >> On Thu, Sep 14, 2017 at 1:59 PM, Sean Busbey <bu...@apache.org> wrote:
    >> 
    >>> 
    >>> 
    >>> On 2017-09-14 15:36, Chris Douglas <cd...@apache.org> wrote:
    >>>> This has gotten bad enough that people are dismissing legitimate test
    >>>> failures among the noise.
    >>>> 
    >>>> On Thu, Sep 14, 2017 at 1:20 PM, Allen Wittenauer
    >>>> <aw...@effectivemachines.com> wrote:
    >>>>>        Someone should probably invest some time into integrating the
    >>> HBase flaky test code a) into Yetus and then b) into Hadoop.
    >>>> 
    >>>> What does the HBase flaky test code do? Another extension to
    >>>> test-patch could run all new/modified tests multiple times, and report
    >>>> to JIRA if any run fails.
    >>>> 
    >>> 
    >>> The current HBase stuff segregates untrusted tests by looking through
    >>> nightly test runs to find things that fail intermittently. We then don't
    >>> include those tests in either nightly or precommit tests. We have a
    >>> different job that just runs the untrusted tests and if they start passing
    >>> removes them from the list.
    >>> 
    >>> There's also a project getting used by SOLR called "BeastIT" that goes
    >>> through running parallel copies of a given test a large number of times to
    >>> reveal flaky tests.
    >>> 
    >>> Getting either/both of those into Yetus and used here would be a huge
    >>> improvement.
    >>> 
    >>> I discussed this on yetus-dev a while back and Allen thought it'd be
    >> non-trivial:
    >> 
    >> https://lists.apache.org/thread.html/552ad614d1b3d5226a656b60c0108457bcaa1219fb9ad985f8750ba1@%3Cdev.yetus.apache.org%3E
    >> 
    >> I unfortunately don't have the test-patch.sh expertise to dig into this.
    >> 
    >>> 
    >>> 
    >> 
    
    
    
    


Re: [DISCUSS] Can we make our precommit test robust to dependency changes while staying usable?

Posted by Steve Loughran <st...@hortonworks.com>.
1. I think maybe we should have a special ability or process for patches which change dependencies. We know that they have the potential for damage way beyond their .patch size, and I fear them.

2. Like Allen says, we can't afford a full test run on every patch, because then the infra is overloaded and either you don't get a test turnaround within the day of submission, or the queue builds up so big that it's only by Sunday evening that the backlog is cleared.

3. And we don't test the object stores enough; even if you can do it just with a set of credentials, we can't grant them to Jenkins (security), and it still takes lots of time (though with HADOOP-14553 we will cut the windows time down).

4. And like Allen also says, tests are a bit unreliable on the test infra. Example: TestKDiag, one of mine; no idea why it fails, as it works locally. Generally, though, I think a lot of the failures are race conditions where the Jenkins machines execute things in a different order, or simply take longer than we expect.

How about we identify those tests which fail intermittently on Jenkins alone and somehow downgrade them/get them explicitly excluded? I know it's cheating, and we should try to fix them first (after all, the way they fail may change, which would be a regression).

LambdaTestUtils.eventually() is designed to support spinning until a test passes, and with the most recent fix (HADOOP-14851) it may actually do this. It can help with race conditions (and inconsistent object stores) by wrapping up the entire retry-until-something-works process. But it only works if the race condition is between the production code and the assertion; if it is in the production code across threads, that's a serious problem.

Anyway: tests fail, and we should care. If you want to learn how to care, try to do what Allen has been busy with: keeping Jenkins happy.

Maybe we should have a week to try and collaborate on that, with a focus on one or more specific builds (branch-3?) for now, and get that stable and happy?

If we have to do it with a Jenkins profile and skip the unreliable tests, so be it.


> On 14 Sep 2017, at 22:44, Arun Suresh <ar...@gmail.com> wrote:
> 
> I actually like this idea:
> 
>> One approach: do a dependency:list of each module and for those that show a
>> change with the patch we run tests there.
> 
> Can 'jdeps' be used to prune the list of sub modules on which we do
> pre-commit ? Essentially, we figure out which classes actually use the
> modified classes from the patch and then run the pre-commit on those
> packages ?
> 
> Cheers
> -Arun
> 
> On Thu, Sep 14, 2017 at 2:23 PM, Andrew Wang <an...@cloudera.com>
> wrote:
> 
>> On Thu, Sep 14, 2017 at 1:59 PM, Sean Busbey <bu...@apache.org> wrote:
>> 
>>> 
>>> 
>>> On 2017-09-14 15:36, Chris Douglas <cd...@apache.org> wrote:
>>>> This has gotten bad enough that people are dismissing legitimate test
>>>> failures among the noise.
>>>> 
>>>> On Thu, Sep 14, 2017 at 1:20 PM, Allen Wittenauer
>>>> <aw...@effectivemachines.com> wrote:
>>>>>        Someone should probably invest some time into integrating the
>>> HBase flaky test code a) into Yetus and then b) into Hadoop.
>>>> 
>>>> What does the HBase flaky test code do? Another extension to
>>>> test-patch could run all new/modified tests multiple times, and report
>>>> to JIRA if any run fails.
>>>> 
>>> 
>>> The current HBase stuff segregates untrusted tests by looking through
>>> nightly test runs to find things that fail intermittently. We then don't
>>> include those tests in either nightly or precommit tests. We have a
>>> different job that just runs the untrusted tests and if they start passing
>>> removes them from the list.
>>> 
>>> There's also a project getting used by SOLR called "BeastIT" that goes
>>> through running parallel copies of a given test a large number of times to
>>> reveal flaky tests.
>>> 
>>> Getting either/both of those into Yetus and used here would be a huge
>>> improvement.
>>> 
>>> I discussed this on yetus-dev a while back and Allen thought it'd be
>> non-trivial:
>> 
>> https://lists.apache.org/thread.html/552ad614d1b3d5226a656b60c0108457bcaa1219fb9ad985f8750ba1@%3Cdev.yetus.apache.org%3E
>> 
>> I unfortunately don't have the test-patch.sh expertise to dig into this.
>> 
>>> 
>>> 
>> 




Re: [DISCUSS] Can we make our precommit test robust to dependency changes while staying usable?

Posted by Arun Suresh <ar...@gmail.com>.
I actually like this idea:

> One approach: do a dependency:list of each module and for those that show a
> change with the patch we run tests there.

Can 'jdeps' be used to prune the list of sub modules on which we do
pre-commit ? Essentially, we figure out which classes actually use the
modified classes from the patch and then run the pre-commit on those
packages ?
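
A rough sketch of how that jdeps pass could look (illustrative only; the
changed class and the jar glob are made-up assumptions):

    #!/usr/bin/env bash
    # Flag module jars that reference a class touched by the patch.
    changed="org.apache.commons.httpclient.HttpClient"   # hypothetical example
    for jar in hadoop-*/target/*.jar; do
      # -verbose:class prints class-level dependencies for everything in the jar
      if jdeps -verbose:class "$jar" 2>/dev/null | grep -qF "$changed"; then
        echo "run precommit tests for: $jar"
      fi
    done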

Cheers
-Arun

On Thu, Sep 14, 2017 at 2:23 PM, Andrew Wang <an...@cloudera.com>
wrote:

> On Thu, Sep 14, 2017 at 1:59 PM, Sean Busbey <bu...@apache.org> wrote:
>
> >
> >
> > On 2017-09-14 15:36, Chris Douglas <cd...@apache.org> wrote:
> > > This has gotten bad enough that people are dismissing legitimate test
> > > failures among the noise.
> > >
> > > On Thu, Sep 14, 2017 at 1:20 PM, Allen Wittenauer
> > > <aw...@effectivemachines.com> wrote:
> > > >         Someone should probably invest some time into integrating the
> > HBase flaky test code a) into Yetus and then b) into Hadoop.
> > >
> > > What does the HBase flaky test code do? Another extension to
> > > test-patch could run all new/modified tests multiple times, and report
> > > to JIRA if any run fails.
> > >
> >
> > The current HBase stuff segregates untrusted tests by looking through
> > nightly test runs to find things that fail intermittently. We then don't
> > include those tests in either nightly or precommit tests. We have a
> > different job that just runs the untrusted tests and if they start passing
> > removes them from the list.
> >
> > There's also a project getting used by SOLR called "BeastIT" that goes
> > through running parallel copies of a given test a large number of times to
> > reveal flaky tests.
> >
> > Getting either/both of those into Yetus and used here would be a huge
> > improvement.
> >
> > I discussed this on yetus-dev a while back and Allen thought it'd be
> non-trivial:
>
> https://lists.apache.org/thread.html/552ad614d1b3d5226a656b60c0108457bcaa1219fb9ad985f8750ba1@%3Cdev.yetus.apache.org%3E
>
> I unfortunately don't have the test-patch.sh expertise to dig into this.
>
> >
> >
>

Re: [DISCUSS] Can we make our precommit test robust to dependency changes while staying usable?

Posted by Andrew Wang <an...@cloudera.com>.
On Thu, Sep 14, 2017 at 1:59 PM, Sean Busbey <bu...@apache.org> wrote:

>
>
> On 2017-09-14 15:36, Chris Douglas <cd...@apache.org> wrote:
> > This has gotten bad enough that people are dismissing legitimate test
> > failures among the noise.
> >
> > On Thu, Sep 14, 2017 at 1:20 PM, Allen Wittenauer
> > <aw...@effectivemachines.com> wrote:
> > >         Someone should probably invest some time into integrating the
> HBase flaky test code a) into Yetus and then b) into Hadoop.
> >
> > What does the HBase flaky test code do? Another extension to
> > test-patch could run all new/modified tests multiple times, and report
> > to JIRA if any run fails.
> >
>
> The current HBase stuff segregates untrusted tests by looking through
> nightly test runs to find things that fail intermittently. We then don't
> include those tests in either nightly or precommit tests. We have a
> different job that just runs the untrusted tests and if they start passing
> removes them from the list.
>
> There's also a project getting used by SOLR called "BeastIT" that goes
> through running parallel copies of a given test a large number of times to
> reveal flaky tests.
>
> Getting either/both of those into Yetus and used here would be a huge
> improvement.
>
> I discussed this on yetus-dev a while back and Allen thought it'd be
non-trivial:

https://lists.apache.org/thread.html/552ad614d1b3d5226a656b60c0108457bcaa1219fb9ad985f8750ba1@%3Cdev.yetus.apache.org%3E

I unfortunately don't have the test-patch.sh expertise to dig into this.

>
>

Re: [DISCUSS] Can we make our precommit test robust to dependency changes while staying usable?

Posted by Sean Busbey <bu...@apache.org>.

On 2017-09-14 15:36, Chris Douglas <cd...@apache.org> wrote: 
> This has gotten bad enough that people are dismissing legitimate test
> failures among the noise.
> 
> On Thu, Sep 14, 2017 at 1:20 PM, Allen Wittenauer
> <aw...@effectivemachines.com> wrote:
> >         Someone should probably invest some time into integrating the HBase flaky test code a) into Yetus and then b) into Hadoop.
> 
> What does the HBase flaky test code do? Another extension to
> test-patch could run all new/modified tests multiple times, and report
> to JIRA if any run fails.
> 

The current HBase stuff segregates untrusted tests by looking through nightly test runs to find things that fail intermittently. We then don't include those tests in either nightly or precommit tests. We have a different job that just runs the untrusted tests and if they start passing removes them from the list.
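
In Maven terms the wiring could be as simple as the following (a sketch,
assuming a reasonably recent Surefire; flaky-tests.txt is an invented name
for the list the separate job would maintain):

    # flaky-tests.txt holds one Surefire pattern per line, e.g.:
    #   **/TestKDiag.java
    # Nightly/precommit runs skip the untrusted tests...
    mvn test -Dsurefire.excludesFile=flaky-tests.txt
    # ...while the separate job runs only them, so anything that starts
    # passing again can be promoted off the list:
    mvn test -Dsurefire.includesFile=flaky-tests.txt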

There's also a project getting used by SOLR called "BeastIT" that goes through running parallel copies of a given test a large number of times to reveal flaky tests.
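
The basic "beast" shape is just repeated execution; in shell it could look
like this (sequential on purpose, since concurrent Maven runs in one
checkout would collide on target/; BeastIT itself does far more):

    # Hypothetical: hammer one suspect test 20 times and count failures.
    fails=0
    for i in $(seq 1 20); do
      mvn -q -Dtest=TestKDiag test > "beast-$i.log" 2>&1 || fails=$((fails + 1))
    done
    echo "failed $fails/20 runs"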

Getting either/both of those into Yetus and used here would be a huge improvement.



Re: [DISCUSS] Can we make our precommit test robust to dependency changes while staying usable?

Posted by Chris Douglas <cd...@apache.org>.
This has gotten bad enough that people are dismissing legitimate test
failures among the noise.

On Thu, Sep 14, 2017 at 1:20 PM, Allen Wittenauer
<aw...@effectivemachines.com> wrote:
>         Someone should probably invest some time into integrating the HBase flaky test code a) into Yetus and then b) into Hadoop.

What does the HBase flaky test code do? Another extension to
test-patch could run all new/modified tests multiple times, and report
to JIRA if any run fails.

Test code is not as thoroughly reviewed, and we shouldn't expect that
will change. We can at least identify the tests that are unreliable,
assign responsibility for fixing them, and disable noisy tests so we
can start trusting the CI output. I'd rather miss a regression by
disabling a flaky test than lose confidence in the CI infrastructure.

Would anyone object to disabling, even deleting failing tests in trunk
until they're fixed? -C



Re: [DISCUSS] Can we make our precommit test robust to dependency changes while staying usable?

Posted by Allen Wittenauer <aw...@effectivemachines.com>.
> On Sep 14, 2017, at 11:01 AM, Sean Busbey <bu...@cloudera.com> wrote:
> 
> >> Committers MUST check the qbt output after a commit.  They MUST make sure
> >> their commit didn’t break something new.
> 
> How do we make this easier / more likely to happen?
> 
> For example, I don't see any notice on HADOOP-14654 that the qbt
> post-commit failed. Is this a timing thing? Did Steve L just notice the
> break before we could finish the 10 hours it takes to get qbt done?

	qbt doesn't update JIRA because...

> 
> How solid would qbt have to be for us to do something drastic like
> auto-revert changes after a failure?


	... I have never seen the unit tests for Hadoop pass completely.  So it would always fail every JIRA that it was testing. There's no point in enabling the JIRA issue update or anything like that until our unit tests actually get reliable.  But that also means we're reliant upon the community to self-police.  That is also failing.

	Prior to Yetus getting involved, the only unit tests that would reliably pass were MapReduce's.  The rest would almost always fail. It had been that way for years.

	Someone should probably invest some time into integrating the HBase flaky test code a) into Yetus and then b) into Hadoop.

	It's also worth pointing out that the YARN findbugs error has been there for about six months.  It would also fail the build.




Re: [DISCUSS] Can we make our precommit test robust to dependency changes while staying usable?

Posted by Sean Busbey <bu...@cloudera.com>.
> Committers MUST check the qbt output after a commit.  They MUST make sure
> their commit didn’t break something new.

How do we make this easier / more likely to happen?

For example, I don't see any notice on HADOOP-14654 that the qbt
post-commit failed. Is this a timing thing? Did Steve L just notice the
break before we could finish the 10 hours it takes to get qbt done?

How solid would qbt have to be for us to do something drastic like
auto-revert changes after a failure?


On Thu, Sep 14, 2017 at 11:05 AM, Allen Wittenauer <aw@effectivemachines.com> wrote:

>
> > On Sep 14, 2017, at 8:03 AM, Sean Busbey <bu...@cloudera.com> wrote:
> >
> > * HADOOP-14654 updated commons-httpclient to a new patch release in
> > hadoop-project
> > * Precommit checked the modules that changed (i.e. not many)
> > * nightly had Azure support break due to a change in behavior.
>
>         OK, so it worked as coded/designed.
>
> > Is this just the cost of our approach to precommit vs post commit
> > testing?
>
>         Yes.  It’s a classic speed vs. size computing problem.
>
> test-patch: quick but only runs a subset of tests
> qbt: comprehensive but takes a very long time
>
>         Committers MUST check the qbt output after a commit.  They MUST
> make sure their commit didn’t break something new.
>
> > One approach: do a dependency:list of each module and for those that show a
> > change with the patch we run tests there.
>
>         As soon as you change something like junit, you’re running over
> everything …
>
>         Plus, let’s get real: there is a large contingent of committers
> that barely take the time to read or even comprehend the current Yetus
> output.  Adding *more* output is the last thing we want to do.
>
> > This will cause a slew of tests to run when dependencies change. For the
> > change in HADOOP-14654 probably we'd just have to run at the top level.
>
>         … e.g., exactly what qbt does for 10+ hours every night.
>
>         It’s important to also recognize that we need to be “good
> citizens” in the ASF. If we can do dependency checking in one 10 hour
> streak vs. several, that reduces the load on the ASF build infrastructure.
>
>
>


-- 
busbey

Re: [DISCUSS] Can we make our precommit test robust to dependency changes while staying usable?

Posted by Allen Wittenauer <aw...@effectivemachines.com>.
> On Sep 14, 2017, at 8:03 AM, Sean Busbey <bu...@cloudera.com> wrote:
> 
> * HADOOP-14654 updated commons-httpclient to a new patch release in
> hadoop-project
> * Precommit checked the modules that changed (i.e. not many)
> * nightly had Azure support break due to a change in behavior.

	OK, so it worked as coded/designed.

> Is this just the cost of our approach to precommit vs post commit testing?

	Yes.  It’s a classic speed vs. size computing problem.

test-patch: quick but only runs a subset of tests
qbt: comprehensive but takes a very long time

	Committers MUST check the qbt output after a commit.  They MUST make sure their commit didn’t break something new.

> One approach: do a dependency:list of each module and for those that show a
> change with the patch we run tests there.

	As soon as you change something like junit, you’re running over everything … 

	Plus, let’s get real: there is a large contingent of committers that barely take the time to read or even comprehend the current Yetus output.  Adding *more* output is the last thing we want to do.

> This will cause a slew of tests to run when dependencies change. For the
> change in HADOOP-14654 probably we'd just have to run at the top level.

	… e.g., exactly what qbt does for 10+ hours every night.

	It’s important to also recognize that we need to be “good citizens” in the ASF. If we can do dependency checking in one 10 hour streak vs. several, that reduces the load on the ASF build infrastructure.


