Posted to dev@hbase.apache.org by Ted Yu <yu...@gmail.com> on 2011/09/24 12:51:36 UTC

maintaining stable HBase build

Hi,
I want to bring the importance of maintaining a stable HBase build to our
attention.
A stable HBase build is important not just for the next release but also
for the authors of pending patches, who rely on it to verify the correctness of their work.

At some point on Thursday (Sept 22nd) the 0.90, 0.92 and TRUNK builds were all
blue. Now they're all red.

I don't mind fixing the Jenkins build. But if we collectively adopt some good
practices, it will be easier to achieve the goal of having stable builds.

For contributors, I understand that running the whole test suite takes so
much time that you may not have the luxury of doing so - Apache Jenkins
doesn't run it when you press the Submit Patch button.
If this is the case (let's call it scenario A), please use Eclipse (or another
tool) to identify the tests that exercise the classes/methods in your patch and
run them. Also, clearly state in the JIRA which tests you ran.

If you have a Linux box where you can run the whole test suite, it would be nice
to utilize that resource and run the whole suite. Then please state this fact on
the JIRA as well.
Considering Todd's suggestion of holding off a commit for 24 hours after code
review, a 2-hour test run isn't that long.

Sometimes you may see the following (from 0.92 build 18):

Tests run: 1004, Failures: 0, Errors: 0, Skipped: 21

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1:51:41.797s

You should examine the test summary above these lines to find out
which test(s) hung - a hung test prints its "Running" line but never the
matching "Tests run: ..." result line. In this case it was TestMasterFailover:

Running org.apache.hadoop.hbase.master.TestMasterFailover
Running org.apache.hadoop.hbase.master.TestMasterRestartAfterDisablingTable
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 32.265 sec

I think we should develop a script that parses the test output and
identifies the hanging test(s).
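
To illustrate, here is a minimal sketch of such a parser in Python (the function name is made up; it assumes the plain Surefire console format shown above, where each class prints a "Running" line followed by a "Tests run: ..." result line):

```python
import re

def find_hung_tests(log_text):
    """Flag classes whose 'Running <class>' line is never followed by a
    'Tests run: ...' result line before the next class starts."""
    hung = []
    pending = None  # class currently running, no result line seen yet
    for line in log_text.splitlines():
        m = re.match(r"Running (\S+)", line)
        if m:
            if pending is not None:
                hung.append(pending)  # previous class never reported
            pending = m.group(1)
        elif line.startswith("Tests run:") and pending is not None:
            pending = None  # class finished and reported its result
    return hung
```

Run over the 0.92 build 18 excerpt above, this would flag TestMasterFailover, since its result line never appears.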

For scenario A, I hope the committer would run the test suite.
The net effect would be a statement on the JIRA saying that all tests passed.

Your comments/suggestions are welcome.

Re: maintaining stable HBase build

Posted by Ted Yu <yu...@gmail.com>.
Gary:
From your comment in the JIRA on Sept 23rd, it wasn't clear that you were running the test suite.
Since I had been involved in the review of 4014, I went ahead with the integration, which was premature.

I think in the future we should use clear language, especially in the final stages of review.
We should indicate whether a +1 comes with a run of the test suite or not.
In the case of multiple committers on the same JIRA (4455 was reviewed by 5 committers), the person planning to commit should indicate that intention clearly.

Thanks Gary. 


Re: maintaining stable HBase build

Posted by Gary Helmling <gh...@gmail.com>.
> Since both Gary and Eugene have been working on HBASE-4014 for quite some
> time, I didn't initially question the test cases.
> After integrating the patch for TRUNK, I discovered that
> TestRegionServerCoprocessorExceptionWithAbort failed consistently on Mac
> and
> Linux. So I backed it out.
> I first thought of disabling this particular test but later abandoned that
> idea - if a core test fails, it means the feature may have an issue.
> I notified Eugene immediately and he will take a look today.
>
>
Ted, I did say that I would commit this change.  But I was still in the
process of verifying the tests, so I was a bit surprised to see that it had
been committed.  Running the tests had already uncovered one issue
(HBASE-4472).  I understand that maybe I'm taking longer than some might
like -- tests do take a long time to run and I was traveling yesterday.  I
do appreciate your follow up, but don't see the need for this patch to have
been rushed.

And seconding Andy's thought, don't take my word for it working! :)  It was
contingent on tests passing, which I still had yet to confirm.  Sorry if I
wasn't clear on that.

I'm happy to see the effort going into improving our test situation, both
speeding up our current tests and separating out test groups.  Props to all
who have been contributing to that.  Anything we can do to streamline the
patch verification process will make it easier for all to follow it.

Re: maintaining stable HBase build

Posted by Andrew Purtell <ap...@apache.org>.
Thanks Ted.


> Since both Gary and Eugene have been working on HBASE-4014 for quite some
> time, I didn't initially question the test cases.

This is understandable, but I think we should just not extend this kind of trust. :-) I've been burned before too, by committing something that I thought was fine because of who the submitter was. You can never know.


Best regards,


       - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)



Re: maintaining stable HBase build

Posted by Ted Yu <yu...@gmail.com>.
>> It should never have gone in if only to be reverted 35 minutes later.
>> (What happened?)

Since both Gary and Eugene have been working on HBASE-4014 for quite some
time, I didn't initially question the test cases.
After integrating the patch for TRUNK, I discovered that
TestRegionServerCoprocessorExceptionWithAbort failed consistently on Mac and
Linux. So I backed it out.
I first thought of disabling this particular test but later abandoned that
idea - if a core test fails, it means the feature may have an issue.
I notified Eugene immediately and he will take a look today.

>> Scrolling down the commit history for trunk further, is a series of
>> half-commits, addendums, reverts, reverts of reverts, etc.

If you were talking about HBASE-4132
<https://issues.apache.org/jira/browse/HBASE-4132>,
I initially tried to salvage the JIRA by adjusting the triggering assertion.
However, that turned out not to be so trivial, so I reopened the JIRA.

Just FYI


Re: maintaining stable HBase build

Posted by Andrew Purtell <ap...@apache.org>.
+1

This:
>>>
> For contributors, I understand that it takes so much time to run whole test
> suite that he/she may not have the luxury of doing this - Apache Jenkins
> wouldn't do it when you press Submit Patch button.
> If this is the case (let's call it scenario A), please use Eclipse (or other
> tool) to identify tests that exercise the classes/methods in your patch and
> run them. Also clearly state what tests you ran in the JIRA.
<<< 

and 

>>>
> For scenario A, I hope committer would run test suite.

<<<


should be added to the How To Contribute page, IMHO.


I see that HBASE-4014 went in -- which is important, so let's fix it and try again -- and then went right out again, reverted after 35 minutes. It should never have gone in if only to be reverted 35 minutes later. (What happened?) Scrolling further down the commit history for trunk reveals a series of half-commits, addendums, reverts, reverts of reverts, etc.

It has recently become difficult to cherry-pick any single commit from trunk and get all of the necessary parts of a change together, or have any assurance the change is not toxic. This is not just a maintainer issue -- diffing the full extent of a change to understand it fully mixes in unrelated changes between the initial commit and addendums, unless one resorts to octopus-like contortions with git.


So what is the solution? Submitted for your consideration:


Committers should apply a candidate change and run the full test suite before committing the change to trunk or any branch. If applying a change to a branch, a full test suite run of the branch code should complete successfully before commit there as well.

No patch is so pressing that it cannot wait for tests to finish before commit, IMO.

If a test fails, the patch does not go in.

If a test fails repeatedly for unrelated reasons, the test comes out and a jira to fix it gets opened.

Finally, I can see where people are trying to fix the build, so please exclude
those commits from my complaint here; that is not part of the problem.

Best regards,


       - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)



Re: maintaining stable HBase build

Posted by Ted Yu <yu...@gmail.com>.
>> we can kill the java processes that are hanging if any testcases hangs.
I think it is very important to find out why certain tests hang. Obtaining
a jstack is the first step of that investigation.
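
For example, a small helper along these lines could collect the dumps (a hypothetical sketch; it assumes the JDK's jps/jstack tools are on the PATH and that Surefire's forked test JVMs show a "surefirebooter" jar in jps output):

```python
import subprocess

def surefire_pids(jps_output):
    """Pick the pids of forked Surefire test JVMs out of 'jps -l' output."""
    pids = []
    for line in jps_output.splitlines():
        pid, _, name = line.partition(" ")
        if "surefirebooter" in name:
            pids.append(pid)
    return pids

def dump_hung_test_jvms():
    """jstack every forked test JVM that is still alive."""
    out = subprocess.check_output(["jps", "-l"]).decode()
    for pid in surefire_pids(out):
        print(subprocess.check_output(["jstack", pid]).decode())
```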

Regards

On Mon, Sep 26, 2011 at 11:31 AM, Ramakrishna S Vasudevan 00902313 <
ramakrishnas@huawei.com> wrote:

> Hi
>
> Just wanted to share one thing that I learnt today in Maven for running
> testcases.
>
> Maybe many of you already know this.
>
> We usually face problems where, when we run the testcases as a bunch, a few
> fail due to system problems or improper clean-up by previous testcases.
>
> As Jon suggested, we can separate out the flaky test cases from the correct
> ones.
>
> In Maven we have a facility called profiles.
> We can add the testcases that we have separated out (maybe in 2 to 3
> batches) to separate profiles.
>
> We can invoke these profiles like mvn test -P "profileid".
>
> We can write a script that executes every profile, and in between profiles
> we can kill the Java processes that are left hanging if any testcase hangs.
> Just a suggestion. If you feel it suits some need in any of your project
> work, you can use it.
>
> Regards
> Ram
>
>
>
> ----- Original Message -----
> From: Jonathan Hsieh <jo...@cloudera.com>
> Date: Monday, September 26, 2011 11:15 pm
> Subject: Re: maintaining stable HBase build
> To: dev@hbase.apache.org, lars hofhansl <lh...@yahoo.com>
>
> > I've been hunting some flaky tests down as well -- a few weeks back
> > I was
> > testing some changes along the line of HBASE-4326.  (maybe some of
> > these are
> > fixed?)
> >
> > First, two tests seemed to flake fairly frequently and were likely
> > problems internal to the tests (TestReplication, TestMasterFailover).
> >
> > There is a second set of tests that, after applying a draft of
> > HBASE-4326, seems to move to a different set of tests.  I'm pretty
> > convinced there are
> > some cross test problems with these. This was on an 0.90.4 based
> > branch, and
> > by now several more changes have gone in.  I'm getting back to
> > HBASE-4326
> > and will try to get more stats on this.
> >
> > Alternately, I exclude the tests that I identify as flaky from the
> > test run and have a separate test run that only runs the flaky
> > tests. The hooks for the excludes build are in the hbase pom but
> > only work
> > with maven surefire 2.6 or 2.10 when it comes out.  (there is a bug in
> > surefire).  See this jira for more details.
> > http://jira.codehaus.org/browse/SUREFIRE-766
> >
> > Jon.
> >
> > On Sun, Sep 25, 2011 at 2:27 PM, lars hofhansl
> > <lh...@yahoo.com> wrote:
> >
> > > At Salesforce we call these "flappers" and they are considered almost
> > > worse than failing tests,
> > > as they add noise to a test run without adding confidence.
> > > A test that fails once in - say - 10 runs is worthless.
> > >
> > >
> > >
> > > ________________________________
> > > From: Ted Yu <yu...@gmail.com>
> > > To: dev@hbase.apache.org
> > > Sent: Sunday, September 25, 2011 1:41 PM
> > > Subject: Re: maintaining stable HBase build
> > >
> > > As of 1:38 PST Sunday, the three builds all passed.
> > >
> > > I think we have some tests that exhibit non-deterministic behavior.
> > >
> > > I suggest committers interleave patch submissions by a 2-hour span so
> > > that we can more easily identify the patch(es) that break the build.
> > >
> > > Thanks
> > >
> > > On Sun, Sep 25, 2011 at 7:45 AM, Ted Yu <yu...@gmail.com> wrote:
> > >
> > > > I wrote a short blog:
> > > > http://zhihongyu.blogspot.com/2011/09/streamlining-patch-submission.html
> > > > It is geared towards contributors.
> > > >
> > > > Cheers
> > > >
> > > >
> > > > On Sat, Sep 24, 2011 at 9:16 AM, Ramakrishna S Vasudevan 00902313 <
> > > > ramakrishnas@huawei.com> wrote:
> > > >
> > > >> Hi
> > > >>
> > > >> Ted, I agree with you.  Pasting the testcase results in the JIRA is
> > > >> also fine, mainly when there are some testcase failures when we run
> > > >> locally but we feel they are not due to the fix we have added; we can
> > > >> mention that also.  I think rather than on a Windows machine it's
> > > >> better to run on a Linux box.
> > > >>
> > > >> +1 for your suggestion Ted.
> > > >>
> > > >> Can we add a feature like in HDFS where Jenkins automatically runs
> > > >> the testcases when we submit a patch?
> > > >>
> > > >> At least till this is done, I'll go with your suggestion.
> > > >>
> > > >> Regards
> > > >> Ram
> > > >>
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > // Jonathan Hsieh (shay)
> > // Software Engineer, Cloudera
> > // jon@cloudera.com
> >
>

Re: maintaining stable HBase build

Posted by Ramakrishna S Vasudevan 00902313 <ra...@huawei.com>.
Hi

Just wanted to share one thing that I learned today in Maven for running test cases.

Many of you may already know this.

We usually face problems where, when we run test cases as a bunch, a few fail due to system problems or improper cleanup by previous test cases.

As Jon suggested, we can separate out the flaky test cases from the correct ones.

In Maven we have a facility called profiles.
We can take the test cases that we have separated out (maybe in 2 to 3 batches) and add them to separate profiles.

We can invoke these profiles like: mvn test -P "profileid".

We can write a script that executes every profile, and in between profiles kill any Java processes left hanging if a test case hangs.
Just a suggestion. If it suits a need in any of your project work, you can use it.
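[Editor's note] The profile-and-kill idea above can be sketched as a small shell function. The profile names and the "surefirebooter" process pattern are assumptions, not something the HBase pom necessarily defines, and the maven command is passed in as a parameter purely so the loop can be exercised without a real build.

```shell
# Sketch of the profile-batch runner described above (hypothetical profile ids).
# run_profiles <mvn-command> <profile> [<profile> ...]
run_profiles() {
  mvn_cmd=$1; shift
  for p in "$@"; do
    echo "=== running profile $p ==="
    if ! "$mvn_cmd" test -P "$p"; then
      echo "profile $p failed" >&2
      return 1
    fi
    # Between batches, sweep up forked test JVMs that a hung test may have
    # left behind (surefire's forked JVM is typically named "surefirebooter").
    pkill -f surefirebooter 2>/dev/null || true
  done
}

# Example (hypothetical profile ids):
#   run_profiles mvn batch1 batch2 batch3
```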

Regards
Ram



----- Original Message -----
From: Jonathan Hsieh <jo...@cloudera.com>
Date: Monday, September 26, 2011 11:15 pm
Subject: Re: maintaining stable HBase build
To: dev@hbase.apache.org, lars hofhansl <lh...@yahoo.com>

> I've been hunting some flaky tests down as well -- a few weeks back 
> I was
> testing some changes along the line of HBASE-4326.  (maybe some of 
> these are
> fixed?)
> 
> First, two test seemed to flake fairly frequently and were likely 
> problems internal to the tests (TestReplication, TestMasterFailover).
> 
> There is a second set of tests that after applying a draft of HBASE-
> 4326, seems to move to a different set of tests.  I'm pretty
> convinced there are
> some cross test problems with these. This was on an 0.90.4 based 
> branch, and
> by now several more changes have gone in.  I'm getting back to 
> HBASE-4326
> and will try to get more stats on this.
> 
> Alternately, I exclude tests that I identify as flaky and exclude 
> them from
> the test run and have a separate test run that only runs the flaky
> tests. The hooks for the excludes build is in the hbase pom but 
> only works
> with maven surefire 2.6 or 2.10 when it comes out.  (there is a bug in
> surefire).  See this jira for more details.
> http://jira.codehaus.org/browse/SUREFIRE-766
> 
> Jon.
> 
> On Sun, Sep 25, 2011 at 2:27 PM, lars hofhansl 
> <lh...@yahoo.com> wrote:
> 
> > At Salesforce we call these "flappers" and they are considered 
> almost worse
> > than failing tests,
> > as they add noise to a test run without adding confidence.
> > At test that fails once in - say - 10 runs is worthless.
> >
> >
> >
> > ________________________________
> > From: Ted Yu <yu...@gmail.com>
> > To: dev@hbase.apache.org
> > Sent: Sunday, September 25, 2011 1:41 PM
> > Subject: Re: maintaining stable HBase build
> >
> > As of 1:38 PST Sunday, the three builds all passed.
> >
> > I think we have some tests that exhibit in-deterministic behavior.
> >
> > I suggest committers interleave patch submissions by 2 hour span 
> so that we
> > can more easily identify patch(es) that break the build.
> >
> > Thanks
> >
> > On Sun, Sep 25, 2011 at 7:45 AM, Ted Yu <yu...@gmail.com> wrote:
> >
> > > I wrote a short blog:
> > > http://zhihongyu.blogspot.com/2011/09/streamlining-patch-submission.html
> >
> > > It is geared towards contributors.
> > >
> > > Cheers
> > >
> > >
> > > On Sat, Sep 24, 2011 at 9:16 AM, Ramakrishna S Vasudevan 
> 00902313 <
> > > ramakrishnas@huawei.com> wrote:
> > >
> > >> Hi
> > >>
> > >> Ted, I agree with you.  Pasting the testcase results in JIRA 
> is also
> > fine,
> > >> mainly when there are some testcase failures when we run 
> locally but if
> > we
> > >> feel it is not due to the fix we have added we can mention 
> that also.  I
> > >> think rather than in a windows machine its better to run in 
> linux box.
> > >>
> > >> +1 for your suggestion Ted.
> > >>
> > >> Can we add the feature like in HDFS when we submit patch
> > >> automatically the
> > >> Jenkin's run the testcases?
> > >>
> > >> Atleast till this is done I go with your suggestion.
> > >>
> > >> Regards
> > >> Ram
> > >>
> > >> ----- Original Message -----
> > >> From: Ted Yu <yu...@gmail.com>
> > >> Date: Saturday, September 24, 2011 4:22 pm
> > >> Subject: maintaining stable HBase build
> > >> To: dev@hbase.apache.org
> > >>
> > >> > Hi,
> > >> > I want to bring the importance of maintaining stable HBase 
> > >> > build to
> > >> > our attention.
> > >> > A stable HBase build is important, not just for the next 
> > >> > release but also
> > >> > for authors of the pending patches to verify the correctness of
> > >> > their work.
> > >> >
> > >> > At some time on Thursday (Sept 22nd) 0.90, 0.92 and TRUNK 
> > >> > builds were all
> > >> > blue. Now they're all red.
> > >> >
> > >> > I don't mind fixing Jenkins build. But if we collectively adopt
> > >> > some good
> > >> > practice, it would be easier to achieve the goal of having 
> > >> > stable builds.
> > >> > For contributors, I understand that it takes so much time to 
> > >> > run whole test
> > >> > suite that he/she may not have the luxury of doing this - 
> > >> > Apache Jenkins wouldn't do it when you press Submit Patch button.
> > >> > If this is the case (let's call it scenario A), please use 
> > >> > Eclipse (or other
> > >> > tool) to identify tests that exercise the classes/methods in 
> > >> > your patch and
> > >> > run them. Also clearly state what tests you ran in the JIRA.
> > >> >
> > >> > If you have a Linux box where you can run whole test suite, it
> > >> > would be nice
> > >> > to utilize such resource and run whole suite. Then please state
> > >> > this fact on
> > >> > the JIRA as well.
> > >> > Considering Todd's suggestion of holding off commit for 24 
> > >> > hours after code
> > >> > review, 2 hour test run isn't that long.
> > >> >
> > >> > Sometimes you may see the following (from 0.92 build 18):
> > >> >
> > >> > Tests run: 1004, Failures: 0, Errors: 0, Skipped: 21
> > >> >
> > >> > [INFO] ------------------------------------------------------
> -------
> > >> > -----------
> > >> > [INFO] BUILD FAILURE
> > >> > [INFO] ------------------------------------------------------
> -------
> > >> > -----------
> > >> > [INFO] Total time: 1:51:41.797s
> > >> >
> > >> > You should examine the test summary above these lines and 
> > >> > find out
> > >> > which test(s) hung. For this case it was TestMasterFailover:
> > >> >
> > >> > Running org.apache.hadoop.hbase.master.TestMasterFailover
> > >> > Running
> > >> > org.apache.hadoop.hbase.master.TestMasterRestartAfterDisablingTable
> > >> > Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 32.265 sec
> > >> >
> > >> > I think a script should be developed that parses test output 
> > >> > and identify hanging test(s).
> > >> >
> > >> > For scenario A, I hope committer would run test suite.
> > >> > The net effect would be a statement on the JIRA, saying all 
> > >> > tests passed.
> > >> > Your comments/suggestions are welcome.
> > >> >
> > >>
> > >
> > >
> >
> 
> 
> 
> -- 
> // Jonathan Hsieh (shay)
> // Software Engineer, Cloudera
> // jon@cloudera.com
> 

Re: maintaining stable HBase build

Posted by Jonathan Hsieh <jo...@cloudera.com>.
I've been hunting some flaky tests down as well -- a few weeks back I was
testing some changes along the lines of HBASE-4326.  (maybe some of these
are fixed?)

First, two tests seemed to flake fairly frequently and were likely problems
internal to the tests (TestReplication, TestMasterFailover).

There is a second set of tests where, after applying a draft of HBASE-4326,
the failures seem to move to a different set of tests.  I'm pretty convinced
there are some cross-test problems with these. This was on a 0.90.4-based
branch, and by now several more changes have gone in.  I'm getting back to
HBASE-4326 and will try to get more stats on this.

Alternately, I exclude the tests that I identify as flaky from the main test
run and have a separate test run that only runs the flaky tests. The hooks
for the excludes build are in the HBase pom but only work with Maven
Surefire 2.6, or 2.10 when it comes out (there is a bug in Surefire). See
this JIRA for more details:
http://jira.codehaus.org/browse/SUREFIRE-766

Jon.

On Sun, Sep 25, 2011 at 2:27 PM, lars hofhansl <lh...@yahoo.com> wrote:

> At Salesforce we call these "flappers" and they are considered almost worse
> than failing tests,
> as they add noise to a test run without adding confidence.
> At test that fails once in - say - 10 runs is worthless.
>
>
>
> ________________________________
> From: Ted Yu <yu...@gmail.com>
> To: dev@hbase.apache.org
> Sent: Sunday, September 25, 2011 1:41 PM
> Subject: Re: maintaining stable HBase build
>
> As of 1:38 PST Sunday, the three builds all passed.
>
> I think we have some tests that exhibit in-deterministic behavior.
>
> I suggest committers interleave patch submissions by 2 hour span so that we
> can more easily identify patch(es) that break the build.
>
> Thanks
>
> On Sun, Sep 25, 2011 at 7:45 AM, Ted Yu <yu...@gmail.com> wrote:
>
> > I wrote a short blog:
> > http://zhihongyu.blogspot.com/2011/09/streamlining-patch-submission.html
> >
> > It is geared towards contributors.
> >
> > Cheers
> >
> >
> > On Sat, Sep 24, 2011 at 9:16 AM, Ramakrishna S Vasudevan 00902313 <
> > ramakrishnas@huawei.com> wrote:
> >
> >> Hi
> >>
> >> Ted, I agree with you.  Pasting the testcase results in JIRA is also
> fine,
> >> mainly when there are some testcase failures when we run locally but if
> we
> >> feel it is not due to the fix we have added we can mention that also.  I
> >> think rather than in a windows machine its better to run in linux box.
> >>
> >> +1 for your suggestion Ted.
> >>
> >> Can we add the feature like in HDFS when we submit patch automatically
> the
> >> Jenkin's run the testcases?
> >>
> >> Atleast till this is done I go with your suggestion.
> >>
> >> Regards
> >> Ram
> >>
> >> ----- Original Message -----
> >> From: Ted Yu <yu...@gmail.com>
> >> Date: Saturday, September 24, 2011 4:22 pm
> >> Subject: maintaining stable HBase build
> >> To: dev@hbase.apache.org
> >>
> >> > Hi,
> >> > I want to bring the importance of maintaining stable HBase build to
> >> > our attention.
> >> > A stable HBase build is important, not just for the next release
> >> > but also
> >> > for authors of the pending patches to verify the correctness of
> >> > their work.
> >> >
> >> > At some time on Thursday (Sept 22nd) 0.90, 0.92 and TRUNK builds
> >> > were all
> >> > blue. Now they're all red.
> >> >
> >> > I don't mind fixing Jenkins build. But if we collectively adopt
> >> > some good
> >> > practice, it would be easier to achieve the goal of having stable
> >> > builds.
> >> > For contributors, I understand that it takes so much time to run
> >> > whole test
> >> > suite that he/she may not have the luxury of doing this - Apache
> >> > Jenkins wouldn't do it when you press Submit Patch button.
> >> > If this is the case (let's call it scenario A), please use Eclipse
> >> > (or other
> >> > tool) to identify tests that exercise the classes/methods in your
> >> > patch and
> >> > run them. Also clearly state what tests you ran in the JIRA.
> >> >
> >> > If you have a Linux box where you can run whole test suite, it
> >> > would be nice
> >> > to utilize such resource and run whole suite. Then please state
> >> > this fact on
> >> > the JIRA as well.
> >> > Considering Todd's suggestion of holding off commit for 24 hours
> >> > after code
> >> > review, 2 hour test run isn't that long.
> >> >
> >> > Sometimes you may see the following (from 0.92 build 18):
> >> >
> >> > Tests run: 1004, Failures: 0, Errors: 0, Skipped: 21
> >> >
> >> > [INFO] -------------------------------------------------------------
> >> > -----------
> >> > [INFO] BUILD FAILURE
> >> > [INFO] -------------------------------------------------------------
> >> > -----------
> >> > [INFO] Total time: 1:51:41.797s
> >> >
> >> > You should examine the test summary above these lines and find out
> >> > which test(s) hung. For this case it was TestMasterFailover:
> >> >
> >> > Running org.apache.hadoop.hbase.master.TestMasterFailover
> >> > Running
> >> >
> >> > org.apache.hadoop.hbase.master.TestMasterRestartAfterDisablingTable
> >> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 32.265 sec
> >> >
> >> > I think a script should be developed that parses test output and
> >> > identify hanging test(s).
> >> >
> >> > For scenario A, I hope committer would run test suite.
> >> > The net effect would be a statement on the JIRA, saying all tests
> >> > passed.
> >> > Your comments/suggestions are welcome.
> >> >
> >>
> >
> >
>



-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com

Re: maintaining stable HBase build

Posted by Jonathan Hsieh <jo...@cloudera.com>.
These are the flaky tests I've seen failing regularly, from some time
before the 0.90.4 release until now (we have a few backports that probably
shouldn't affect these tests):

TestReplication.queueFailover
TestMasterFailover.testMasterFailoverWithMockedRIT
TestMasterFailover.testSimpleMasterFailover
TestRollingRestart.testBasicRollingRestart

I also tried backporting HBASE-4453, but the replication queueFailover
test still fails occasionally.

These are being run on machines with several builds of other projects
running on them.

Jon
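[Editor's note] The suggestion at the head of this thread, a script that parses test output and identifies hanging tests, can be sketched with awk. In sequential surefire console output, a class that printed "Running <class>" but never got a matching "Tests run: ..." summary before the next class started (or before the log ended) is the likely hang. This is a sketch assuming that sequential output format, not project tooling.

```shell
# find_hung_tests: read a surefire console log on stdin and print test
# classes that started ("Running <class>") but never produced a
# "Tests run: ..." summary line -- i.e. the likely hung tests.
find_hung_tests() {
  awk '
    /^Running /   { if (cur != "") print cur; cur = $2 }  # previous class never reported
    /^Tests run:/ { cur = "" }                            # summary closes the open class
    END           { if (cur != "") print cur }            # still open at end of log
  '
}

# Example: find_hung_tests < console.log
# On the 0.92 build 18 output quoted in this thread, it prints
# org.apache.hadoop.hbase.master.TestMasterFailover
```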

On Mon, Sep 26, 2011 at 3:19 PM, Andrew Purtell <ap...@apache.org> wrote:

> A slow host (or busy running tests from other projects concurrently...) can
> cause failures in the replication tests.
>
>    - Andy
>
>
> ----- Original Message -----
> > From: Ted Yu <yu...@gmail.com>
> > To: lars hofhansl <lh...@yahoo.com>
> > Cc: "dev@hbase.apache.org" <de...@hbase.apache.org>
> > Sent: Monday, September 26, 2011 12:57 PM
> > Subject: Re: maintaining stable HBase build
> >
> > From TRUNK build 2259:
> >
> > Failed tests:   queueFailover(org.apache.hadoop.hbase.replication.TestReplication): Waited too much time for
> > queueFailover replication
> >
> > I know Doug's change wouldn't have caused the above failure.
> >
> > FYI
>



-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com

Re: maintaining stable HBase build

Posted by Andrew Purtell <ap...@apache.org>.
A slow host (or busy running tests from other projects concurrently...) can cause failures in the replication tests.

   - Andy


----- Original Message -----
> From: Ted Yu <yu...@gmail.com>
> To: lars hofhansl <lh...@yahoo.com>
> Cc: "dev@hbase.apache.org" <de...@hbase.apache.org>
> Sent: Monday, September 26, 2011 12:57 PM
> Subject: Re: maintaining stable HBase build
> 
> From TRUNK build 2259:
> 
> Failed tests:   queueFailover(org.apache.hadoop.hbase.replication.TestReplication): Waited too much time for
> queueFailover replication
> 
> I know Doug's change wouldn't have caused the above failure.
> 
> FYI

Re: maintaining stable HBase build

Posted by Ted Yu <yu...@gmail.com>.
From TRUNK build 2259:

Failed tests:   queueFailover(org.apache.hadoop.hbase.replication.TestReplication): Waited too much time for
queueFailover replication

I know Doug's change wouldn't have caused the above failure.

FYI

On Mon, Sep 26, 2011 at 10:45 AM, lars hofhansl <lh...@yahoo.com> wrote:

> I was thinking more along the lines:
> Either fix the test to not flap, or remove it.
>
> The first task would be to identify all tests that frequently show
> non-deterministic results.
>
> ------------------------------
> *From:* Ted Yu <yu...@gmail.com>
> *To:* dev@hbase.apache.org; lars hofhansl <lh...@yahoo.com>
> *Sent:* Monday, September 26, 2011 2:08 AM
>
> *Subject:* Re: maintaining stable HBase build
>
> Below is a simple script to repeatedly run a unit test.
> I suggest using it or similar script on the new unit test(s) in future
> patches.
>
> #!/bin/bash
> # script to run test repeatedly
> # usage: ./runtest.sh <name of test> <number of repetitions>
> #
> for ((  i = 1 ;  i <= $2; i++  ))
> do
>   nice -10 mvn test -Dtest=$1
>   if [ $? -ne 0 ]; then
>     echo "$1 failed"
>     exit 1
>   fi
> done
>
> Thanks
>
> On Sun, Sep 25, 2011 at 2:27 PM, lars hofhansl <lh...@yahoo.com>wrote:
>
> At Salesforce we call these "flappers" and they are considered almost worse
> than failing tests,
> as they add noise to a test run without adding confidence.
> At test that fails once in - say - 10 runs is worthless.
>
>
>
> ________________________________
> From: Ted Yu <yu...@gmail.com>
> To: dev@hbase.apache.org
> Sent: Sunday, September 25, 2011 1:41 PM
> Subject: Re: maintaining stable HBase build
>
> As of 1:38 PST Sunday, the three builds all passed.
>
> I think we have some tests that exhibit in-deterministic behavior.
>
> I suggest committers interleave patch submissions by 2 hour span so that we
> can more easily identify patch(es) that break the build.
>
> Thanks
>
> On Sun, Sep 25, 2011 at 7:45 AM, Ted Yu <yu...@gmail.com> wrote:
>
> > I wrote a short blog:
> > http://zhihongyu.blogspot.com/2011/09/streamlining-patch-submission.html
> >
> > It is geared towards contributors.
> >
> > Cheers
> >
> >
> > On Sat, Sep 24, 2011 at 9:16 AM, Ramakrishna S Vasudevan 00902313 <
> > ramakrishnas@huawei.com> wrote:
> >
> >> Hi
> >>
> >> Ted, I agree with you.  Pasting the testcase results in JIRA is also
> fine,
> >> mainly when there are some testcase failures when we run locally but if
> we
> >> feel it is not due to the fix we have added we can mention that also.  I
> >> think rather than in a windows machine its better to run in linux box.
> >>
> >> +1 for your suggestion Ted.
> >>
> >> Can we add the feature like in HDFS when we submit patch automatically
> the
> >> Jenkin's run the testcases?
> >>
> >> Atleast till this is done I go with your suggestion.
> >>
> >> Regards
> >> Ram
> >>
> >> ----- Original Message -----
> >> From: Ted Yu <yu...@gmail.com>
> >> Date: Saturday, September 24, 2011 4:22 pm
> >> Subject: maintaining stable HBase build
> >> To: dev@hbase.apache.org
> >>
> >> > Hi,
> >> > I want to bring the importance of maintaining stable HBase build to
> >> > our attention.
> >> > A stable HBase build is important, not just for the next release
> >> > but also
> >> > for authors of the pending patches to verify the correctness of
> >> > their work.
> >> >
> >> > At some time on Thursday (Sept 22nd) 0.90, 0.92 and TRUNK builds
> >> > were all
> >> > blue. Now they're all red.
> >> >
> >> > I don't mind fixing Jenkins build. But if we collectively adopt
> >> > some good
> >> > practice, it would be easier to achieve the goal of having stable
> >> > builds.
> >> > For contributors, I understand that it takes so much time to run
> >> > whole test
> >> > suite that he/she may not have the luxury of doing this - Apache
> >> > Jenkins wouldn't do it when you press Submit Patch button.
> >> > If this is the case (let's call it scenario A), please use Eclipse
> >> > (or other
> >> > tool) to identify tests that exercise the classes/methods in your
> >> > patch and
> >> > run them. Also clearly state what tests you ran in the JIRA.
> >> >
> >> > If you have a Linux box where you can run whole test suite, it
> >> > would be nice
> >> > to utilize such resource and run whole suite. Then please state
> >> > this fact on
> >> > the JIRA as well.
> >> > Considering Todd's suggestion of holding off commit for 24 hours
> >> > after code
> >> > review, 2 hour test run isn't that long.
> >> >
> >> > Sometimes you may see the following (from 0.92 build 18):
> >> >
> >> > Tests run: 1004, Failures: 0, Errors: 0, Skipped: 21
> >> >
> >> > [INFO] -------------------------------------------------------------
> >> > -----------
> >> > [INFO] BUILD FAILURE
> >> > [INFO] -------------------------------------------------------------
> >> > -----------
> >> > [INFO] Total time: 1:51:41.797s
> >> >
> >> > You should examine the test summary above these lines and find out
> >> > which test(s) hung. For this case it was TestMasterFailover:
> >> >
> >> > Running org.apache.hadoop.hbase.master.TestMasterFailover
> >> > Running
> >> >
> >> > org.apache.hadoop.hbase.master.TestMasterRestartAfterDisablingTable
> >> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 32.265 sec
> >> >
> >> > I think a script should be developed that parses test output and
> >> > identify hanging test(s).
> >> >
> >> > For scenario A, I hope committer would run test suite.
> >> > The net effect would be a statement on the JIRA, saying all tests
> >> > passed.
> >> > Your comments/suggestions are welcome.
> >> >
> >>
> >
> >
>
>
>
>
>

Re: maintaining stable HBase build

Posted by lars hofhansl <lh...@yahoo.com>.
I was thinking more along these lines:
Either fix the test to not flap, or remove it.


The first task would be to identify all tests that frequently show non-deterministic results.
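[Editor's note] One way to put a number on "frequently" is a small variant of Ted's runtest.sh that keeps going instead of stopping at the first failure, and reports a failure count. The maven command is an injectable parameter here purely so the loop can be tested without a real build; this is a sketch, not project tooling.

```shell
# flake_rate <mvn-command> <test-class> <runs>
# Run one test class repeatedly and report how often it failed, so flaky
# ("flapping") tests can be ranked by failure frequency.
flake_rate() {
  mvn_cmd=$1; test_name=$2; runs=$3
  fails=0
  for ((i = 1; i <= runs; i++)); do
    "$mvn_cmd" test -Dtest="$test_name" >/dev/null 2>&1 || fails=$((fails + 1))
  done
  echo "$test_name: $fails/$runs runs failed"
}

# Example: flake_rate mvn TestMasterFailover 10
```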



________________________________
From: Ted Yu <yu...@gmail.com>
To: dev@hbase.apache.org; lars hofhansl <lh...@yahoo.com>
Sent: Monday, September 26, 2011 2:08 AM
Subject: Re: maintaining stable HBase build


Below is a simple script to repeatedly run a unit test.
I suggest using it or similar script on the new unit test(s) in future patches.

#!/bin/bash
# script to run test repeatedly
# usage: ./runtest.sh <name of test> <number of repetitions>
#
for ((  i = 1 ;  i <= $2; i++  ))
do
  nice -10 mvn test -Dtest=$1
  if [ $? -ne 0 ]; then
    echo "$1 failed"
    exit 1
  fi
done

Thanks


On Sun, Sep 25, 2011 at 2:27 PM, lars hofhansl <lh...@yahoo.com> wrote:

At Salesforce we call these "flappers" and they are considered almost worse than failing tests,
>as they add noise to a test run without adding confidence.
>At test that fails once in - say - 10 runs is worthless.
>
>
>
>________________________________
>
>From: Ted Yu <yu...@gmail.com>
>
>To: dev@hbase.apache.org
>Sent: Sunday, September 25, 2011 1:41 PM
>
>Subject: Re: maintaining stable HBase build
>
>
>As of 1:38 PST Sunday, the three builds all passed.
>
>I think we have some tests that exhibit in-deterministic behavior.
>
>I suggest committers interleave patch submissions by 2 hour span so that we
>can more easily identify patch(es) that break the build.
>
>Thanks
>
>On Sun, Sep 25, 2011 at 7:45 AM, Ted Yu <yu...@gmail.com> wrote:
>
>> I wrote a short blog:
>> http://zhihongyu.blogspot.com/2011/09/streamlining-patch-submission.html
>>
>> It is geared towards contributors.
>>
>> Cheers
>>
>>
>> On Sat, Sep 24, 2011 at 9:16 AM, Ramakrishna S Vasudevan 00902313 <
>> ramakrishnas@huawei.com> wrote:
>>
>>> Hi
>>>
>>> Ted, I agree with you.  Pasting the testcase results in JIRA is also fine,
>>> mainly when there are some testcase failures when we run locally but if we
>>> feel it is not due to the fix we have added we can mention that also.  I
>>> think rather than in a windows machine its better to run in linux box.
>>>
>>> +1 for your suggestion Ted.
>>>
>>> Can we add the feature like in HDFS when we submit patch automatically the
>>> Jenkin's run the testcases?
>>>
>>> Atleast till this is done I go with your suggestion.
>>>
>>> Regards
>>> Ram
>>>
>>> ----- Original Message -----
>>> From: Ted Yu <yu...@gmail.com>
>>> Date: Saturday, September 24, 2011 4:22 pm
>>> Subject: maintaining stable HBase build
>>> To: dev@hbase.apache.org
>>>
>>> > Hi,
>>> > I want to bring the importance of maintaining stable HBase build to
>>> > our attention.
>>> > A stable HBase build is important, not just for the next release
>>> > but also
>>> > for authors of the pending patches to verify the correctness of
>>> > their work.
>>> >
>>> > At some time on Thursday (Sept 22nd) 0.90, 0.92 and TRUNK builds
>>> > were all
>>> > blue. Now they're all red.
>>> >
>>> > I don't mind fixing Jenkins build. But if we collectively adopt
>>> > some good
>>> > practice, it would be easier to achieve the goal of having stable
>>> > builds.
>>> > For contributors, I understand that it takes so much time to run
>>> > whole test
>>> > suite that he/she may not have the luxury of doing this - Apache
>>> > Jenkins wouldn't do it when you press Submit Patch button.
>>> > If this is the case (let's call it scenario A), please use Eclipse
>>> > (or other
>>> > tool) to identify tests that exercise the classes/methods in your
>>> > patch and
>>> > run them. Also clearly state what tests you ran in the JIRA.
>>> >
>>> > If you have a Linux box where you can run whole test suite, it
>>> > would be nice
>>> > to utilize such resource and run whole suite. Then please state
>>> > this fact on
>>> > the JIRA as well.
>>> > Considering Todd's suggestion of holding off commit for 24 hours
>>> > after code
>>> > review, 2 hour test run isn't that long.
>>> >
>>> > Sometimes you may see the following (from 0.92 build 18):
>>> >
>>> > Tests run: 1004, Failures: 0, Errors: 0, Skipped: 21
>>> >
>>> > [INFO] -------------------------------------------------------------
>>> > -----------
>>> > [INFO] BUILD FAILURE
>>> > [INFO] -------------------------------------------------------------
>>> > -----------
>>> > [INFO] Total time: 1:51:41.797s
>>> >
>>> > You should examine the test summary above these lines and find out
>>> > which test(s) hung. For this case it was TestMasterFailover:
>>> >
>>> > Running org.apache.hadoop.hbase.master.TestMasterFailover
>>> > Running
>>> > org.apache.hadoop.hbase.master.TestMasterRestartAfterDisablingTable
>>> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 32.265 sec
>>> >
>>> > I think a script should be developed that parses test output and
>>> > identify hanging test(s).
>>> >
>>> > For scenario A, I hope committer would run test suite.
>>> > The net effect would be a statement on the JIRA, saying all tests
>>> > passed.
>>> > Your comments/suggestions are welcome.
>>> >
>>>
>>
>>

Re: maintaining stable HBase build

Posted by Ted Yu <yu...@gmail.com>.
That would be nice Jesse.

Thanks
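[Editor's note] The retry idea behind HBASE-4480 could look something like the sketch below: rerun a failing test a few times, treat it as passed if any attempt succeeds, but log that it flapped so the noise stays visible. The command argument is injectable for testing; nothing here reflects what the ticket actually implemented.

```shell
# retry_test <mvn-command> <test-class> <max-attempts>
# Rerun a failing test up to <max-attempts> times; pass if any attempt
# passes, but warn so flapping is still visible in the build output.
retry_test() {
  mvn_cmd=$1; test_name=$2; attempts=$3
  for ((i = 1; i <= attempts; i++)); do
    if "$mvn_cmd" test -Dtest="$test_name" >/dev/null 2>&1; then
      if [ "$i" -gt 1 ]; then
        echo "WARN: $test_name flapped ($((i - 1)) failed attempt(s))"
      fi
      return 0
    fi
  done
  echo "FAIL: $test_name failed all $attempts attempts"
  return 1
}

# Example: retry_test mvn TestReplication 3
```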

On Mon, Sep 26, 2011 at 10:16 AM, Jesse Yates <je...@gmail.com>wrote:

> Ted,
>
> There is a ticket (HBASE-4480) up for wrapping tests in a retry script for
> failed tests (though no work has been done on it yet). Maybe we can
> incorporate this script into that ticket?
>
> -Jesse Yates
>
> On Mon, Sep 26, 2011 at 2:08 AM, Ted Yu <yu...@gmail.com> wrote:
>
> > Below is a simple script to repeatedly run a unit test.
> > I suggest using it or similar script on the new unit test(s) in future
> > patches.
> >
> > #!/bin/bash
> > # script to run test repeatedly
> > # usage: ./runtest.sh <name of test> <number of repetitions>
> > #
> > for ((  i = 1 ;  i <= $2; i++  ))
> > do
> >  nice -10 mvn test -Dtest=$1
> >  if [ $? -ne 0 ]; then
> >    echo "$1 failed"
> >    exit 1
> >  fi
> > done
> >
> > Thanks
> >
> > On Sun, Sep 25, 2011 at 2:27 PM, lars hofhansl <lh...@yahoo.com>
> > wrote:
> >
> > > At Salesforce we call these "flappers" and they are considered almost
> > worse
> > > than failing tests,
> > > as they add noise to a test run without adding confidence.
> > > At test that fails once in - say - 10 runs is worthless.
> > >
> > >
> > >
> > > ________________________________
> > > From: Ted Yu <yu...@gmail.com>
> > > To: dev@hbase.apache.org
> > > Sent: Sunday, September 25, 2011 1:41 PM
> > > Subject: Re: maintaining stable HBase build
> > >
> > > As of 1:38 PST Sunday, the three builds all passed.
> > >
> > > I think we have some tests that exhibit in-deterministic behavior.
> > >
> > > I suggest committers interleave patch submissions by 2 hour span so
> that
> > we
> > > can more easily identify patch(es) that break the build.
> > >
> > > Thanks
> > >
> > > On Sun, Sep 25, 2011 at 7:45 AM, Ted Yu <yu...@gmail.com> wrote:
> > >
> > > > I wrote a short blog:
> > > >
> > http://zhihongyu.blogspot.com/2011/09/streamlining-patch-submission.html
> > > >
> > > > It is geared towards contributors.
> > > >
> > > > Cheers
> > > >
> > > >
> > > > On Sat, Sep 24, 2011 at 9:16 AM, Ramakrishna S Vasudevan 00902313 <
> > > > ramakrishnas@huawei.com> wrote:
> > > >
> > > >> Hi
> > > >>
> > > >> Ted, I agree with you.  Pasting the testcase results in JIRA is also
> > > fine,
> > > >> mainly when there are some testcase failures when we run locally but
> > if
> > > we
> > > >> feel it is not due to the fix we have added we can mention that
> also.
> >  I
> > > >> think rather than in a windows machine its better to run in linux
> box.
> > > >>
> > > >> +1 for your suggestion Ted.
> > > >>
> > > >> Can we add the feature like in HDFS when we submit patch
> automatically
> > > the
> > > >> Jenkin's run the testcases?
> > > >>
> > > >> Atleast till this is done I go with your suggestion.
> > > >>
> > > >> Regards
> > > >> Ram
> > > >>
> > > >> ----- Original Message -----
> > > >> From: Ted Yu <yu...@gmail.com>
> > > >> Date: Saturday, September 24, 2011 4:22 pm
> > > >> Subject: maintaining stable HBase build
> > > >> To: dev@hbase.apache.org
> > > >>
> > > >> > Hi,
> > > >> > I want to bring the importance of maintaining stable HBase build
> to
> > > >> > our attention.
> > > >> > A stable HBase build is important, not just for the next release
> > > >> > but also
> > > >> > for authors of the pending patches to verify the correctness of
> > > >> > their work.
> > > >> >
> > > >> > At some time on Thursday (Sept 22nd) 0.90, 0.92 and TRUNK builds
> > > >> > were all
> > > >> > blue. Now they're all red.
> > > >> >
> > > >> > I don't mind fixing Jenkins build. But if we collectively adopt
> > > >> > some good
> > > >> > practice, it would be easier to achieve the goal of having stable
> > > >> > builds.
> > > >> > For contributors, I understand that it takes so much time to run
> > > >> > whole test
> > > >> > suite that he/she may not have the luxury of doing this - Apache
> > > >> > Jenkins wouldn't do it when you press Submit Patch button.
> > > >> > If this is the case (let's call it scenario A), please use Eclipse
> > > >> > (or other
> > > >> > tool) to identify tests that exercise the classes/methods in your
> > > >> > patch and
> > > >> > run them. Also clearly state what tests you ran in the JIRA.
> > > >> >
> > > >> > If you have a Linux box where you can run whole test suite, it
> > > >> > would be nice
> > > >> > to utilize such resource and run whole suite. Then please state
> > > >> > this fact on
> > > >> > the JIRA as well.
> > > >> > Considering Todd's suggestion of holding off commit for 24 hours
> > > >> > after code
> > > >> > review, 2 hour test run isn't that long.
> > > >> >
> > > >> > Sometimes you may see the following (from 0.92 build 18):
> > > >> >
> > > >> > Tests run: 1004, Failures: 0, Errors: 0, Skipped: 21
> > > >> >
> > > >> > [INFO]
> -------------------------------------------------------------
> > > >> > -----------
> > > >> > [INFO] BUILD FAILURE
> > > >> > [INFO]
> -------------------------------------------------------------
> > > >> > -----------
> > > >> > [INFO] Total time: 1:51:41.797s
> > > >> >
> > > >> > You should examine the test summary above these lines and find out
> > > >> > which test(s) hung. For this case it was TestMasterFailover:
> > > >> >
> > > >> > Running org.apache.hadoop.hbase.master.TestMasterFailover
> > > >> > Running
> > > >> >
> > > >> > org.apache.hadoop.hbase.master.TestMasterRestartAfterDisablingTable
> > > >> > Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 32.265 sec
> > > >> >
> > > >> > I think a script should be developed that parses test output and
> > > >> > identify hanging test(s).
> > > >> >
> > > >> > For scenario A, I hope committer would run test suite.
> > > >> > The net effect would be a statement on the JIRA, saying all tests
> > > >> > passed.
> > > >> > Your comments/suggestions are welcome.
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>

Re: maintaining stable HBase build

Posted by Jesse Yates <je...@gmail.com>.
Ted,

There is a ticket (HBASE-4480) up for wrapping tests in a retry script for
failed tests (though no work has been done on it yet). Maybe we can
incorporate this script into that ticket?

-Jesse Yates

On Mon, Sep 26, 2011 at 2:08 AM, Ted Yu <yu...@gmail.com> wrote:


Re: maintaining stable HBase build

Posted by lars hofhansl <lh...@yahoo.com>.
Or if you wanted to run a test with a known problem until it fails:

while mvn test -Dtest=<test>; do echo "Succeeded, running again"; done


I sometimes run this at night for tests with rare race conditions.
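A slightly more general version of that loop (a hypothetical sketch; `run_until_failure` is not an existing script) counts the runs and takes the command to repeat as arguments, so the same helper works for any flaky command:

```shell
run_until_failure() {
  # Usage: run_until_failure <label> <command ...>
  # Repeats <command ...> until it fails, then reports which run failed.
  label=$1; shift
  runs=0
  while "$@"; do
    runs=$((runs + 1))
    echo "$label succeeded (run $runs), running again"
  done
  echo "$label failed on run $((runs + 1))"
}

# For example:
#   run_until_failure TestMasterFailover mvn test -Dtest=TestMasterFailover
```

One refinement worth considering: copy target/surefire-reports/ aside after the failing run, so the evidence isn't overwritten by the next attempt.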

Re: maintaining stable HBase build

Posted by Ted Yu <yu...@gmail.com>.
Below is a simple script to repeatedly run a unit test.
I suggest using it, or a similar script, on the new unit test(s) in future
patches.

#!/bin/bash
# Script to run a test repeatedly.
# Usage: ./runtest.sh <name of test> <number of repetitions>
#
for (( i = 1; i <= $2; i++ ))
do
  nice -10 mvn test -Dtest="$1"
  if [ $? -ne 0 ]; then
    echo "$1 failed"
    exit 1
  fi
done

Thanks

On Sun, Sep 25, 2011 at 2:27 PM, lars hofhansl <lh...@yahoo.com> wrote:


Re: maintaining stable HBase build

Posted by lars hofhansl <lh...@yahoo.com>.
At Salesforce we call these "flappers", and they are considered almost worse than failing tests,
as they add noise to a test run without adding confidence.
A test that fails once in, say, 10 runs is worthless.




Re: maintaining stable HBase build

Posted by Ted Yu <yu...@gmail.com>.
As of 1:38 PST Sunday, the three builds all passed.

I think we have some tests that exhibit in-deterministic behavior.

I suggest committers interleave patch submissions by 2 hour span so that we
can more easily identify patch(es) that break the build.

Thanks

On Sun, Sep 25, 2011 at 7:45 AM, Ted Yu <yu...@gmail.com> wrote:


Re: maintaining stable HBase build

Posted by Ted Yu <yu...@gmail.com>.
I wrote a short blog:
http://zhihongyu.blogspot.com/2011/09/streamlining-patch-submission.html

It is geared towards contributors.

Cheers

On Sat, Sep 24, 2011 at 9:16 AM, Ramakrishna S Vasudevan 00902313 <
ramakrishnas@huawei.com> wrote:


Re: maintaining stable HBase build

Posted by Ramakrishna S Vasudevan 00902313 <ra...@huawei.com>.
Hi

Ted, I agree with you. Pasting the test case results in the JIRA is also fine, mainly when some test cases fail in a local run; if we feel the failures are not due to the fix we added, we can mention that as well. I think it is better to run the tests on a Linux box rather than a Windows machine.

+1 for your suggestion, Ted.

Can we add a feature like in HDFS, where Jenkins automatically runs the test cases when we submit a patch?

At least until this is done, I will go with your suggestion.

Regards
Ram


Re: maintaining stable HBase build

Posted by Gaojinchao <ga...@huawei.com>.
+1. We should run all test cases before submitting a patch, and then state this
fact on the JIRA as well.

I will do it.

-----Original Message-----
From: Ted Yu [mailto:yuzhihong@gmail.com]
Sent: Saturday, September 24, 2011 18:52
To: dev@hbase.apache.org
Subject: maintaining stable HBase build

Hi,
I want to bring the importance of maintaining stable HBase build to our
attention.
A stable HBase build is important, not just for the next release but also
for authors of the pending patches to verify the correctness of their work.

At some time on Thursday (Sept 22nd) 0.90, 0.92 and TRUNK builds were all
blue. Now they're all red.

I don't mind fixing Jenkins build. But if we collectively adopt some good
practice, it would be easier to achieve the goal of having stable builds.

For contributors, I understand that it takes so much time to run the whole test
suite that they may not have the luxury of doing this; Apache Jenkins
wouldn't do it when you press the Submit Patch button.
If this is the case (let's call it scenario A), please use Eclipse (or other
tool) to identify tests that exercise the classes/methods in your patch and
run them. Also clearly state what tests you ran in the JIRA.

If you have a Linux box where you can run the whole test suite, it would be nice
to utilize that resource and run the whole suite. Then please state this fact on
the JIRA as well.
Considering Todd's suggestion of holding off commit for 24 hours after code
review, 2 hour test run isn't that long.

Sometimes you may see the following (from 0.92 build 18):

Tests run: 1004, Failures: 0, Errors: 0, Skipped: 21

[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1:51:41.797s

You should examine the test summary above these lines and find out
which test(s) hung. For this case it was TestMasterFailover:

Running org.apache.hadoop.hbase.master.TestMasterFailover
Running org.apache.hadoop.hbase.master.TestMasterRestartAfterDisablingTable
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 32.265 sec

I think a script should be developed that parses test output and
identifies hanging test(s).

For scenario A, I hope a committer would run the whole test suite.
The net effect would be a statement on the JIRA saying all tests passed.

Your comments/suggestions are welcome.
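The script suggested above, one that parses test output and identifies hanging tests, could be sketched roughly as follows. This is a hypothetical illustration (`find_hung_tests` is not an existing HBase tool) and it assumes sequential Surefire console output like the excerpt above, where each "Tests run: ... Time elapsed ..." summary belongs to the most recently started test class:

```shell
find_hung_tests() {
  # Reads Maven console output from the given files, or stdin if none.
  # A class that printed "Running <class>" but never got a matching
  # "Tests run: ... Time elapsed ..." summary is reported as a hang suspect.
  awk '
    /^Running /                  { stack[++top] = $2 }     # test class started
    /^Tests run:.*Time elapsed/  { if (top > 0) top-- }    # most recent one finished
    END { for (i = 1; i <= top; i++) print stack[i] }      # leftovers never finished
  ' "$@"
}
```

Note that the module-level summary ("Tests run: 1004, Failures: 0, ...") carries no "Time elapsed" and is therefore ignored. Fed the 0.92 build 18 excerpt above, this prints org.apache.hadoop.hbase.master.TestMasterFailover, since its Running line has no matching summary.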