You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Adar Dembo (Code Review)" <ge...@cloudera.org> on 2019/04/03 00:56:21 UTC

[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls

Hello Mike Percy, Andrew Wong, Grant Henke,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/12917

to review the following change.


Change subject: build: adapt new Java flaky test infrastructure to existing controls
......................................................................

build: adapt new Java flaky test infrastructure to existing controls

Now that Java tests are reporting success/failure, we can use the existing
flaky test controls to drive it. As a refresher, the C++ tests rely on these
environment variables:
- RUN_FLAKY_ONLY: whether to run just flaky tests or all tests
- KUDU_FLAKY_TEST_ATTEMPTS: number of attempts for flaky tests
- KUDU_FLAKY_TEST_LIST: path to list of flaky tests, one on each line
- KUDU_RETRY_ALL_FAILED_TESTS: whether to retry all tests or just the ones
                               in the flaky test list

The algorithm is roughly:
  if RUN_FLAKY_ONLY or KUDU_FLAKY_TEST_ATTEMPTS > 1:
    populate KUDU_FLAKY_TEST_LIST from test result server

  if RUN_FLAKY_ONLY:
    testset = tests listed in KUDU_FLAKY_TEST_LIST
  else:
    testset = all tests

  for t in testset:
    if KUDU_RETRY_ALL_FAILED_TESTS or (KUDU_FLAKY_TEST_LIST and
                                       t in KUDU_FLAKY_TEST_LIST):
      num_attempts = KUDU_FLAKY_TEST_ATTEMPTS (or 1 if unset)
    else:
      num_attempts = 1

    run t up to num_attempts times

You can see it at work in build-and-test.sh/run-test.sh. You can also see it
in dist-test.py though notably, it doesn't care about RUN_FLAKY_ONLY because
we never used that particular combination (presumably the list of flaky
tests is short enough that it wouldn't benefit from distributed testing).

This patch attempts to mirror these exact semantics for Java tests. Here are
the interesting changes:
- In RetryRule, rerunFailingTestsCount is gone. The behavior is informed via
  the aforementioned environment variables instead.
- In build-and-test.sh, if RUN_FLAKY_ONLY is set, parse the flaky test list
  into a series of --tests gradle command line arguments.
- In dist-test.py, opt into the C++ flaky test handling (which reflects the
  above algorithm). There are also some small changes to flaky handling to
  accommodate Java's per-method flaky test tracking.

Note: all of this assumes that there's no overlap between the names of any
C++ or Java tests, which is currently true as all C++ tests have names like
"tablet-test" or "master_cert_authority-itest" while all Java tests are
prefixed with "org.apache.kudu...". If this were to change, we'd need to
properly "namespace" the test results in the reporting infrastructure and
fetch the flaky test lists separately for C++ and Java tests. For now
there's just one flaky test list, and both ctest and gradle are OK with
being asked to run irrelevant tests (they'll just be ignored).

Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
---
M build-support/dist_test.py
M build-support/jenkins/build-and-test.sh
M java/gradle/tests.gradle
M java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java
4 files changed, 139 insertions(+), 31 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/17/12917/1
-- 
To view, visit http://gerrit.cloudera.org:8080/12917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
Gerrit-Change-Number: 12917
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has removed Kudu Jenkins from this change.  ( http://gerrit.cloudera.org:8080/12917 )

Change subject: build: adapt new Java flaky test infrastructure to existing controls
......................................................................


Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/12917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
Gerrit-Change-Number: 12917
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has removed Kudu Jenkins from this change.  ( http://gerrit.cloudera.org:8080/12917 )

Change subject: build: adapt new Java flaky test infrastructure to existing controls
......................................................................


Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/12917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
Gerrit-Change-Number: 12917
Gerrit-PatchSet: 3
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12917 )

Change subject: build: adapt new Java flaky test infrastructure to existing controls
......................................................................


Patch Set 1: Verified+1

Overriding Jenkins. Several Java tests failed, but I think these are known flakes that were previously retried because we retried all tests three times. One was KUDU-1521 and there may not be JIRAs open for the others.

I expect that once submitted we'll start tracking these failures in the dashboard, and then we'll retry them automatically.


-- 
To view, visit http://gerrit.cloudera.org:8080/12917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
Gerrit-Change-Number: 12917
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Wed, 03 Apr 2019 03:32:47 +0000
Gerrit-HasComments: No

[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Andrew Wong, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12917

to look at the new patch set (#2).

Change subject: build: adapt new Java flaky test infrastructure to existing controls
......................................................................

build: adapt new Java flaky test infrastructure to existing controls

Now that Java tests are reporting success/failure, we can use the existing
flaky test controls to drive it. As a refresher, the C++ tests rely on these
environment variables:
- RUN_FLAKY_ONLY: whether to run just flaky tests or all tests
- KUDU_FLAKY_TEST_ATTEMPTS: number of attempts for flaky tests
- KUDU_FLAKY_TEST_LIST: path to list of flaky tests, one on each line
- KUDU_RETRY_ALL_FAILED_TESTS: whether to retry all tests or just the ones
                               in the flaky test list

The algorithm is roughly:
  if RUN_FLAKY_ONLY or KUDU_FLAKY_TEST_ATTEMPTS > 1:
    populate KUDU_FLAKY_TEST_LIST from test result server

  if RUN_FLAKY_ONLY:
    testset = tests listed in KUDU_FLAKY_TEST_LIST
  else:
    testset = all tests

  for t in testset:
    if KUDU_RETRY_ALL_FAILED_TESTS or (KUDU_FLAKY_TEST_LIST and
                                       t in KUDU_FLAKY_TEST_LIST):
      num_attempts = KUDU_FLAKY_TEST_ATTEMPTS (or 1 if unset)
    else:
      num_attempts = 1

    run t up to num_attempts times

You can see it at work in build-and-test.sh/run-test.sh. You can also see it
in dist-test.py though notably, it doesn't care about RUN_FLAKY_ONLY because
we never used that particular combination (presumably the list of flaky
tests is short enough that it wouldn't benefit from distributed testing).

This patch attempts to mirror these exact semantics for Java tests. Here are
the interesting changes:
- In RetryRule, rerunFailingTestsCount is gone. The behavior is informed via
  the aforementioned environment variables instead.
- In build-and-test.sh, if RUN_FLAKY_ONLY is set, parse the flaky test list
  into a series of --tests gradle command line arguments.
- In dist-test.py, opt into the C++ flaky test handling (which reflects the
  above algorithm). There are also some small changes to flaky handling to
  accommodate Java's per-method flaky test tracking.

Note: all of this assumes that there's no overlap between the names of any
C++ or Java tests, which is currently true as all C++ tests have names like
"tablet-test" or "master_cert_authority-itest" while all Java tests are
prefixed with "org.apache.kudu...". If this were to change, we'd need to
properly "namespace" the test results in the reporting infrastructure and
fetch the flaky test lists separately for C++ and Java tests. For now
there's just one flaky test list, and both ctest and gradle are OK with
being asked to run irrelevant tests (they'll just be ignored).

Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
---
M build-support/dist_test.py
M build-support/jenkins/build-and-test.sh
M java/gradle/tests.gradle
M java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java
4 files changed, 137 insertions(+), 31 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/17/12917/2
-- 
To view, visit http://gerrit.cloudera.org:8080/12917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
Gerrit-Change-Number: 12917
Gerrit-PatchSet: 2
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12917 )

Change subject: build: adapt new Java flaky test infrastructure to existing controls
......................................................................

build: adapt new Java flaky test infrastructure to existing controls

Now that Java tests are reporting success/failure, we can use the existing
flaky test controls to drive it. As a refresher, the C++ tests rely on these
environment variables:
- RUN_FLAKY_ONLY: whether to run just flaky tests or all tests
- KUDU_FLAKY_TEST_ATTEMPTS: number of attempts for flaky tests
- KUDU_FLAKY_TEST_LIST: path to list of flaky tests, one on each line
- KUDU_RETRY_ALL_FAILED_TESTS: whether to retry all tests or just the ones
                               in the flaky test list

The algorithm is roughly:
  if RUN_FLAKY_ONLY or KUDU_FLAKY_TEST_ATTEMPTS > 1:
    populate KUDU_FLAKY_TEST_LIST from test result server

  if RUN_FLAKY_ONLY:
    testset = tests listed in KUDU_FLAKY_TEST_LIST
  else:
    testset = all tests

  for t in testset:
    if KUDU_RETRY_ALL_FAILED_TESTS or (KUDU_FLAKY_TEST_LIST and
                                       t in KUDU_FLAKY_TEST_LIST):
      num_attempts = KUDU_FLAKY_TEST_ATTEMPTS (or 1 if unset)
    else:
      num_attempts = 1

    run t up to num_attempts times

You can see it at work in build-and-test.sh/run-test.sh. You can also see it
in dist-test.py though notably, it doesn't care about RUN_FLAKY_ONLY because
we never used that particular combination (presumably the list of flaky
tests is short enough that it wouldn't benefit from distributed testing).

This patch attempts to mirror these exact semantics for Java tests. Here are
the interesting changes:
- In RetryRule, rerunFailingTestsCount is gone. The behavior is informed via
  the aforementioned environment variables instead.
- In build-and-test.sh, if RUN_FLAKY_ONLY is set, parse the flaky test list
  into a series of --tests gradle command line arguments.
- In dist-test.py, opt into the C++ flaky test handling (which reflects the
  above algorithm). There are also some small changes to flaky handling to
  accommodate Java's per-method flaky test tracking.

Note: all of this assumes that there's no overlap between the names of any
C++ or Java tests, which is currently true as all C++ tests have names like
"tablet-test" or "master_cert_authority-itest" while all Java tests are
prefixed with "org.apache.kudu...". If this were to change, we'd need to
properly "namespace" the test results in the reporting infrastructure and
fetch the flaky test lists separately for C++ and Java tests. For now
there's just one flaky test list, and both ctest and gradle are OK with
being asked to run irrelevant tests (they'll just be ignored).

Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
Reviewed-on: http://gerrit.cloudera.org:8080/12917
Reviewed-by: Grant Henke <gr...@apache.org>
Tested-by: Adar Dembo <ad...@cloudera.com>
---
M build-support/dist_test.py
M build-support/jenkins/build-and-test.sh
M java/gradle/tests.gradle
M java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java
4 files changed, 145 insertions(+), 31 deletions(-)

Approvals:
  Grant Henke: Looks good to me, approved
  Adar Dembo: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/12917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
Gerrit-Change-Number: 12917
Gerrit-PatchSet: 4
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12917 )

Change subject: build: adapt new Java flaky test infrastructure to existing controls
......................................................................


Patch Set 1:

Here's what I manually tested:

dist-test + per-test flaky resistance: <enable_dist_test_env_vars> KUDU_FLAKY_TEST_ATTEMPTS=3 TEST_RESULT_SERVER=... build_and_test.sh

report results + all tests flaky resistance: KUDU_REPORT_TEST_RESULTS=1 KUDU_FLAKY_TEST_ATTEMPTS=3 TEST_RESULT_SERVER=... KUDU_RETRY_ALL_FAILED_TESTS=1 build_and_test.sh

report results + per-test flaky resistance + run just the flakes: KUDU_REPORT_TEST_RESULTS=1 KUDU_FLAKY_TEST_ATTEMPTS=3 TEST_RESULT_SERVER=... RUN_FLAKY_ONLY=1 ERROR_ON_TEST_FAILURE=0 build_and_test.sh

Java-only report results: <fine_grained_reporting_env_vars> gradle -PrerunTests test --continue

Java-only report results + per-test flaky resistance: KUDU_FLAKY_TEST_ATTEMPTS=3 TEST_RESULT_SERVER=... KUDU_FLAKY_TEST_LIST=... gradle -PrerunTests test --continue

Java-only report results + all tests flaky resistance: KUDU_FLAKY_TEST_ATTEMPTS=3 TEST_RESULT_SERVER=... KUDU_RETRY_ALL_FAILED_TESTS=1 gradle -PrerunTests test --continue

I think this generally reflects the myriad of combinations that we use, but let me know if I've missed something.


-- 
To view, visit http://gerrit.cloudera.org:8080/12917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
Gerrit-Change-Number: 12917
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Wed, 03 Apr 2019 00:59:40 +0000
Gerrit-HasComments: No

[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12917 )

Change subject: build: adapt new Java flaky test infrastructure to existing controls
......................................................................


Patch Set 3: Verified+1

More unknown Java flakes.


-- 
To view, visit http://gerrit.cloudera.org:8080/12917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
Gerrit-Change-Number: 12917
Gerrit-PatchSet: 3
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Wed, 03 Apr 2019 18:59:00 +0000
Gerrit-HasComments: No

[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Andrew Wong, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12917

to look at the new patch set (#3).

Change subject: build: adapt new Java flaky test infrastructure to existing controls
......................................................................

build: adapt new Java flaky test infrastructure to existing controls

Now that Java tests are reporting success/failure, we can use the existing
flaky test controls to drive it. As a refresher, the C++ tests rely on these
environment variables:
- RUN_FLAKY_ONLY: whether to run just flaky tests or all tests
- KUDU_FLAKY_TEST_ATTEMPTS: number of attempts for flaky tests
- KUDU_FLAKY_TEST_LIST: path to list of flaky tests, one on each line
- KUDU_RETRY_ALL_FAILED_TESTS: whether to retry all tests or just the ones
                               in the flaky test list

The algorithm is roughly:
  if RUN_FLAKY_ONLY or KUDU_FLAKY_TEST_ATTEMPTS > 1:
    populate KUDU_FLAKY_TEST_LIST from test result server

  if RUN_FLAKY_ONLY:
    testset = tests listed in KUDU_FLAKY_TEST_LIST
  else:
    testset = all tests

  for t in testset:
    if KUDU_RETRY_ALL_FAILED_TESTS or (KUDU_FLAKY_TEST_LIST and
                                       t in KUDU_FLAKY_TEST_LIST):
      num_attempts = KUDU_FLAKY_TEST_ATTEMPTS (or 1 if unset)
    else:
      num_attempts = 1

    run t up to num_attempts times

You can see it at work in build-and-test.sh/run-test.sh. You can also see it
in dist-test.py though notably, it doesn't care about RUN_FLAKY_ONLY because
we never used that particular combination (presumably the list of flaky
tests is short enough that it wouldn't benefit from distributed testing).

This patch attempts to mirror these exact semantics for Java tests. Here are
the interesting changes:
- In RetryRule, rerunFailingTestsCount is gone. The behavior is informed via
  the aforementioned environment variables instead.
- In build-and-test.sh, if RUN_FLAKY_ONLY is set, parse the flaky test list
  into a series of --tests gradle command line arguments.
- In dist-test.py, opt into the C++ flaky test handling (which reflects the
  above algorithm). There are also some small changes to flaky handling to
  accommodate Java's per-method flaky test tracking.

Note: all of this assumes that there's no overlap between the names of any
C++ or Java tests, which is currently true as all C++ tests have names like
"tablet-test" or "master_cert_authority-itest" while all Java tests are
prefixed with "org.apache.kudu...". If this were to change, we'd need to
properly "namespace" the test results in the reporting infrastructure and
fetch the flaky test lists separately for C++ and Java tests. For now
there's just one flaky test list, and both ctest and gradle are OK with
being asked to run irrelevant tests (they'll just be ignored).

Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
---
M build-support/dist_test.py
M build-support/jenkins/build-and-test.sh
M java/gradle/tests.gradle
M java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java
4 files changed, 145 insertions(+), 31 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/17/12917/3
-- 
To view, visit http://gerrit.cloudera.org:8080/12917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
Gerrit-Change-Number: 12917
Gerrit-PatchSet: 3
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12917 )

Change subject: build: adapt new Java flaky test infrastructure to existing controls
......................................................................


Patch Set 2: Verified+1

A Java test failed, but as before, I think it was an unknown flake that was previously retried because we retried all tests three times. Going forward we should start accumulating the failures in the flaky test dashboard.


-- 
To view, visit http://gerrit.cloudera.org:8080/12917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
Gerrit-Change-Number: 12917
Gerrit-PatchSet: 2
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Wed, 03 Apr 2019 18:25:50 +0000
Gerrit-HasComments: No

[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12917 )

Change subject: build: adapt new Java flaky test infrastructure to existing controls
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12917/1/build-support/jenkins/build-and-test.sh
File build-support/jenkins/build-and-test.sh:

http://gerrit.cloudera.org:8080/#/c/12917/1/build-support/jenkins/build-and-test.sh@246
PS1, Line 246:     # 1. There are no test name collisions between C++ and Java tests.
> Mind a comment here that acknowledges we intend to pass them to gradle? A m
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/12917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
Gerrit-Change-Number: 12917
Gerrit-PatchSet: 3
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Wed, 03 Apr 2019 18:36:25 +0000
Gerrit-HasComments: Yes

[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12917 )

Change subject: build: adapt new Java flaky test infrastructure to existing controls
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12917/1/build-support/jenkins/build-and-test.sh
File build-support/jenkins/build-and-test.sh:

http://gerrit.cloudera.org:8080/#/c/12917/1/build-support/jenkins/build-and-test.sh@246
PS1, Line 246:       EXTRA_GRADLE_TEST_FLAGS="--tests $t $EXTRA_GRADLE_TEST_FLAGS"
> Not accidentally, but yes. :)
Mind a comment here that acknowledges we intend to pass them to gradle? A mostly copy past of this is okay. I feel like it's something we would second guess later otherwise.



-- 
To view, visit http://gerrit.cloudera.org:8080/12917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
Gerrit-Change-Number: 12917
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Wed, 03 Apr 2019 18:30:30 +0000
Gerrit-HasComments: Yes

[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12917 )

Change subject: build: adapt new Java flaky test infrastructure to existing controls
......................................................................


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/12917/1/build-support/dist_test.py
File build-support/dist_test.py:

http://gerrit.cloudera.org:8080/#/c/12917/1/build-support/dist_test.py@369
PS1, Line 369:     test_name = ".".join(k.split(".")[:-1]) if TEST_SHARD_RE.search(k) else k
If I understand right, this doesn't mangle the java names because there is no java class or package that is purely numerical?


http://gerrit.cloudera.org:8080/#/c/12917/1/build-support/jenkins/build-and-test.sh
File build-support/jenkins/build-and-test.sh:

http://gerrit.cloudera.org:8080/#/c/12917/1/build-support/jenkins/build-and-test.sh@246
PS1, Line 246:       EXTRA_GRADLE_TEST_FLAGS="--tests $t $EXTRA_GRADLE_TEST_FLAGS"
Will this accidentally pass the flaky c++ tests to the Gradle --tests flag?


http://gerrit.cloudera.org:8080/#/c/12917/1/java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java
File java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java:

http://gerrit.cloudera.org:8080/#/c/12917/1/java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java@51
PS1, Line 51:   private static final int MAYBE_RETRY = 0;
Might be more clear to call this NO_EXPLICIT_RETRIES or something. Alternatively, I am not sure you need a constant for the value 0. It would make the `retryCount != MAYBE_RETRY;` line below easier to read to just hard code 0.


http://gerrit.cloudera.org:8080/#/c/12917/1/java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java@57
PS1, Line 57:     this(MAYBE_RETRY, /*skipReporting=*/ false);
Should this be DEFAULT_RETRY_COUNT?


http://gerrit.cloudera.org:8080/#/c/12917/1/java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java@72
PS1, Line 72:     String value = System.getenv("KUDU_FLAKY_TEST_LIST");
I am not sure how costly these System.getenv calls are, but assuming this KUDU_FLAKY_TEST_LIST could be fairly large, can we populate a map when we initialize the rule and consult it in this method?

We could read/parse KUDU_RETRY_ALL_FAILED_TESTS and KUDU_FLAKY_TEST_ATTEMPTS at construction time too, though the impact is likely lower.


http://gerrit.cloudera.org:8080/#/c/12917/1/java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java@128
PS1, Line 128: reporter != null
Do we need this condition? I think below it would always result in `actualRetryCount = 0` in the case one of the others isn't also true.



-- 
To view, visit http://gerrit.cloudera.org:8080/12917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
Gerrit-Change-Number: 12917
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Wed, 03 Apr 2019 14:11:04 +0000
Gerrit-HasComments: Yes

[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12917 )

Change subject: build: adapt new Java flaky test infrastructure to existing controls
......................................................................


Patch Set 2: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/12917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
Gerrit-Change-Number: 12917
Gerrit-PatchSet: 2
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Wed, 03 Apr 2019 18:30:39 +0000
Gerrit-HasComments: No

[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has removed Kudu Jenkins from this change.  ( http://gerrit.cloudera.org:8080/12917 )

Change subject: build: adapt new Java flaky test infrastructure to existing controls
......................................................................


Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/12917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
Gerrit-Change-Number: 12917
Gerrit-PatchSet: 2
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12917 )

Change subject: build: adapt new Java flaky test infrastructure to existing controls
......................................................................


Patch Set 3: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/12917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
Gerrit-Change-Number: 12917
Gerrit-PatchSet: 3
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Wed, 03 Apr 2019 18:38:05 +0000
Gerrit-HasComments: No

[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12917 )

Change subject: build: adapt new Java flaky test infrastructure to existing controls
......................................................................


Patch Set 4: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12917/4/java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java
File java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java:

http://gerrit.cloudera.org:8080/#/c/12917/4/java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java@61
PS4, Line 61: il
will



-- 
To view, visit http://gerrit.cloudera.org:8080/12917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
Gerrit-Change-Number: 12917
Gerrit-PatchSet: 4
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Wed, 03 Apr 2019 21:16:27 +0000
Gerrit-HasComments: Yes

[kudu-CR] build: adapt new Java flaky test infrastructure to existing controls

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12917 )

Change subject: build: adapt new Java flaky test infrastructure to existing controls
......................................................................


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/12917/1/build-support/dist_test.py
File build-support/dist_test.py:

http://gerrit.cloudera.org:8080/#/c/12917/1/build-support/dist_test.py@369
PS1, Line 369:     test_name = ".".join(k.split(".")[:-1]) if TEST_SHARD_RE.search(k) else k
> If I understand right, this doesn't mangle the java names because there is 
Correct. And it looks like every class/package identifier needs to have at least one letter[1], so no Java test should ever match this regex.

1. https://stackoverflow.com/a/65490


http://gerrit.cloudera.org:8080/#/c/12917/1/build-support/jenkins/build-and-test.sh
File build-support/jenkins/build-and-test.sh:

http://gerrit.cloudera.org:8080/#/c/12917/1/build-support/jenkins/build-and-test.sh@246
PS1, Line 246:       EXTRA_GRADLE_TEST_FLAGS="--tests $t $EXTRA_GRADLE_TEST_FLAGS"
> Will this accidentally pass the flaky c++ tests to the Gradle --tests flag?
Not accidentally, but yes. :)

As I wrote in the commit message, the shortcut that simplified this patch was to _not_ separate the list of C++ and Java flakies. We can get away with it because:
1. There are no collisions between C++/Java test names.
2. Both ctest (with "-R $test_regex") and gradle (with multiple "--tests $t" entries) happily ignore tests they can't find.

A cleaner way would be to enforce the separation, but that means doing one of:
1. Splitting the unified flaky list here into two lists based on some name-based criteria.
2. Adding another flaky list retrieval endpoint to test_result_server.py and modifying the SQL queries to do the same name-based criteria.
3. Like #2 but modifying the reporting to include a bool for C++ or Java test, and changing the results schema accordingly.

I opted for quick-and-dirty.


http://gerrit.cloudera.org:8080/#/c/12917/1/java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java
File java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java:

http://gerrit.cloudera.org:8080/#/c/12917/1/java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java@51
PS1, Line 51:   private static final int MAYBE_RETRY = 0;
> Might be more clear to call this NO_EXPLICIT_RETRIES or something. Alternat
Done


http://gerrit.cloudera.org:8080/#/c/12917/1/java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java@57
PS1, Line 57:     this(MAYBE_RETRY, /*skipReporting=*/ false);
> Should this be DEFAULT_RETRY_COUNT?
Done


http://gerrit.cloudera.org:8080/#/c/12917/1/java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java@72
PS1, Line 72:     String value = System.getenv("KUDU_FLAKY_TEST_LIST");
> I am not sure how costly these System.getenv calls are, but assuming this K
System.getenv() should be cheap; it's not even a system call [1]. I don't think a map would save us any time, as we only call retryThisTest() once per RetryRule.

Unless you're suggesting a static map created in a static block. I find static blocks somewhat icky, but it would reduce the number of times we read the file, so maybe it is worth doing. 

1. https://stackoverflow.com/questions/46623018/does-a-program-make-a-system-call-to-get-the-value-of-an-environment-variable-in


http://gerrit.cloudera.org:8080/#/c/12917/1/java/kudu-test-utils/src/main/java/org/apache/kudu/test/junit/RetryRule.java@128
PS1, Line 128: reporter != null
> Do we need this condition? I think below it would always result in `actualR
Yes, counter-intuitively it's OK to construct a RetryStatement with actualRetryCount == 0, because that's a test that won't be retried but will report its results. Put another way, reporting and retrying should be independently controllable (see the comment just above).

I don't think we take advantage of this particular combination, but the C++ tests support it, so I figured I'd at least reach parity with them.



-- 
To view, visit http://gerrit.cloudera.org:8080/12917
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia89598d7eeb5ab642ab4ebb7aa583adcce770eae
Gerrit-Change-Number: 12917
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Wed, 03 Apr 2019 17:51:52 +0000
Gerrit-HasComments: Yes