Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2020/10/02 20:17:22 UTC

[kudu-CR] [tests] fix flake in TestTabletServerProxyCallErrors

Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16539 )


Change subject: [tests] fix flake in TestTabletServerProxyCallErrors
......................................................................

[tests] fix flake in TestTabletServerProxyCallErrors

This patch fixes a flake in the TestTabletServerProxyCallErrors
scenario of the TxnStatusTabletManagementTest. Before this patch,
it was failing in about 1 out of 16 runs when running with
--stress_cpu_threads=16 (DEBUG build), with errors like the one below:

  src/kudu/integration-tests/ts_tablet_manager-itest.cc:1325: Failure
  Value of: s.IsInvalidArgument()
    Actual: false
  Expected: true
  Illegal state: Tablet not RUNNING: BOOTSTRAPPING
  Google Test trace:
  src/kudu/integration-tests/ts_tablet_manager-itest.cc:1322: error {
    code: TABLET_NOT_RUNNING
    status {
      code: ILLEGAL_STATE
      message: "Tablet not RUNNING: BOOTSTRAPPING"
    }
  }

After this patch, none of 64 runs of the scenario failed when running
with the same flags.

Change-Id: I64b9f1eeb6bacf684a15ef84cddacadb43ac43fe
---
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
1 file changed, 15 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/39/16539/1
-- 
To view, visit http://gerrit.cloudera.org:8080/16539
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I64b9f1eeb6bacf684a15ef84cddacadb43ac43fe
Gerrit-Change-Number: 16539
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>

[kudu-CR] [tests] fix flake in TestTabletServerProxyCallErrors

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16539 )

Change subject: [tests] fix flake in TestTabletServerProxyCallErrors
......................................................................


Patch Set 2: Code-Review+2


-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64b9f1eeb6bacf684a15ef84cddacadb43ac43fe
Gerrit-Change-Number: 16539
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sat, 03 Oct 2020 03:46:57 +0000
Gerrit-HasComments: No

[kudu-CR] [tests] fix flake in TestTabletServerProxyCallErrors

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16539 )

Change subject: [tests] fix flake in TestTabletServerProxyCallErrors
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16539/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc
File src/kudu/integration-tests/ts_tablet_manager-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16539/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@1075
PS1, Line 1075:     // Wait for the tablet to be in RUNNING state.
              :     const auto deadline = MonoTime::Now() + kTimeout;
              :     bool is_tablet_running = false;
              :     do {
              :       if (r->CheckRunning().ok()) {
              :         is_tablet_running = true;
              :         break;
              :       }
              :       SleepFor(MonoDelta::FromMilliseconds(10));
              :     } while (MonoTime::Now() < deadline);
              :     if (!is_tablet_running) {
              :       Status::TimedOut("timed out waiting for txn status tablet running");
              :     }
nit: can we use r->WaitUntilConsensusRunning() instead?
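
A side note on the quoted loop: as shown, the Status::TimedOut(...) value on the timeout path is constructed but does not appear to be returned, so a timeout would go unreported to the caller. The sketch below is a self-contained illustration of the same poll-until-deadline pattern with the timeout propagated. It is not Kudu code: WaitStatus and WaitUntilReady are hypothetical names, and std::chrono stands in for MonoTime and SleepFor.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <thread>

// Hypothetical stand-in for kudu::Status, just enough for this sketch.
struct WaitStatus {
  bool ok;
  std::string message;
};

// Polls `is_ready` until it returns true or `timeout` elapses.
// The condition is always checked at least once, as in the quoted do/while loop.
WaitStatus WaitUntilReady(const std::function<bool()>& is_ready,
                          std::chrono::milliseconds timeout,
                          std::chrono::milliseconds poll_interval) {
  assert(poll_interval.count() > 0);
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  do {
    if (is_ready()) {
      return {true, ""};
    }
    std::this_thread::sleep_for(poll_interval);
  } while (std::chrono::steady_clock::now() < deadline);
  // Hand the timeout back to the caller instead of discarding it.
  return {false, "timed out waiting for condition"};
}
```

A caller would pass something like a lambda wrapping the readiness check and then inspect the returned status. Using an existing wait helper, as suggested in the comment above, avoids hand-rolling this loop at all.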



-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64b9f1eeb6bacf684a15ef84cddacadb43ac43fe
Gerrit-Change-Number: 16539
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sat, 03 Oct 2020 01:07:17 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tests] fix flake in TestTabletServerProxyCallErrors

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16539

to look at the new patch set (#2).

Change subject: [tests] fix flake in TestTabletServerProxyCallErrors
......................................................................

[tests] fix flake in TestTabletServerProxyCallErrors

This patch fixes a flake in the TestTabletServerProxyCallErrors
scenario of the TxnStatusTabletManagementTest. Before this patch,
it was failing in about 1 out of 16 runs when running with
--stress_cpu_threads=16 (DEBUG build), with errors like the one below:

  src/kudu/integration-tests/ts_tablet_manager-itest.cc:1325: Failure
  Value of: s.IsInvalidArgument()
    Actual: false
  Expected: true
  Illegal state: Tablet not RUNNING: BOOTSTRAPPING
  Google Test trace:
  src/kudu/integration-tests/ts_tablet_manager-itest.cc:1322: error {
    code: TABLET_NOT_RUNNING
    status {
      code: ILLEGAL_STATE
      message: "Tablet not RUNNING: BOOTSTRAPPING"
    }
  }

After this patch, none of 300+ runs of the scenario failed when running
with the same flags.

Change-Id: I64b9f1eeb6bacf684a15ef84cddacadb43ac43fe
---
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
1 file changed, 3 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/39/16539/2
-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I64b9f1eeb6bacf684a15ef84cddacadb43ac43fe
Gerrit-Change-Number: 16539
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [tests] fix flake in TestTabletServerProxyCallErrors

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16539 )

Change subject: [tests] fix flake in TestTabletServerProxyCallErrors
......................................................................

[tests] fix flake in TestTabletServerProxyCallErrors

This patch fixes a flake in the TestTabletServerProxyCallErrors
scenario of the TxnStatusTabletManagementTest. Before this patch,
it was failing in about 1 out of 16 runs when running with
--stress_cpu_threads=16 (DEBUG build), with errors like the one below:

  src/kudu/integration-tests/ts_tablet_manager-itest.cc:1325: Failure
  Value of: s.IsInvalidArgument()
    Actual: false
  Expected: true
  Illegal state: Tablet not RUNNING: BOOTSTRAPPING
  Google Test trace:
  src/kudu/integration-tests/ts_tablet_manager-itest.cc:1322: error {
    code: TABLET_NOT_RUNNING
    status {
      code: ILLEGAL_STATE
      message: "Tablet not RUNNING: BOOTSTRAPPING"
    }
  }

After this patch, none of 300+ runs of the scenario failed when running
with the same flags.

Change-Id: I64b9f1eeb6bacf684a15ef84cddacadb43ac43fe
Reviewed-on: http://gerrit.cloudera.org:8080/16539
Tested-by: Alexey Serbin <as...@cloudera.com>
Reviewed-by: Andrew Wong <aw...@cloudera.com>
---
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
1 file changed, 3 insertions(+), 0 deletions(-)

Approvals:
  Alexey Serbin: Verified
  Andrew Wong: Looks good to me, approved

-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I64b9f1eeb6bacf684a15ef84cddacadb43ac43fe
Gerrit-Change-Number: 16539
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [tests] fix flake in TestTabletServerProxyCallErrors

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16539 )

Change subject: [tests] fix flake in TestTabletServerProxyCallErrors
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16539/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc
File src/kudu/integration-tests/ts_tablet_manager-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16539/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@1075
PS1, Line 1075:     // Wait for the tablet to be in RUNNING state and its consensus running too.
              :     RETURN_NOT_OK(r->WaitUntilConsensusRunning(kTimeout));
              :     return r->consensus()->WaitUntilLeaderForTests(kTimeout);
              :   }
              : 
              :   // Creates a transaction status tablet at the given tablet server.
              :   Status CreateTxnStatusTablet(MiniTabletServer* ts) {
              :     return CreateTablet(ts, kTxnStatusTabletId, /*is_txn_status_tablet*/true);
              :   }
              : 
              :   static Status StartTransactions(const ParticipantIdsByTxnId& txns, TxnCoordinator* coordinator) {
              :     TabletServerErrorPB ts_error;
              :     f
> nit: can we use r->WaitUntilConsensusRunning() instead?
Good point -- it seems we can.  For some reason, I didn't see that WaitUntilConsensusRunning() existed.
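
For readers following the thread, the failure mode from the commit message can be illustrated with a toy model. This is a sketch under assumptions, not Kudu's implementation: ToyReplica, Result, Code, and HandleRequest are hypothetical names. It only assumes what the failure output suggests, namely that the tablet state check is reported before the request itself is validated.

```cpp
#include <cassert>
#include <string>

enum class Code { kOk, kIllegalState, kInvalidArgument };

// Hypothetical stand-in for a status object.
struct Result {
  Code code;
  std::string message;
};

enum class TabletState { kBootstrapping, kRunning };

// Toy model of a tablet replica that starts out bootstrapping.
class ToyReplica {
 public:
  void FinishBootstrap() { state_ = TabletState::kRunning; }

  Result CheckRunning() const {
    if (state_ != TabletState::kRunning) {
      return {Code::kIllegalState, "Tablet not RUNNING: BOOTSTRAPPING"};
    }
    return {Code::kOk, ""};
  }

  // Assumption of this sketch: the state check runs before request validation.
  Result HandleRequest(bool request_is_valid) const {
    Result r = CheckRunning();
    if (r.code != Code::kOk) {
      return r;
    }
    if (!request_is_valid) {
      return {Code::kInvalidArgument, "malformed request"};
    }
    return {Code::kOk, ""};
  }

 private:
  TabletState state_ = TabletState::kBootstrapping;
};

// Usage: the same invalid request yields a different error depending on timing.
inline void DemonstrateOrdering() {
  ToyReplica replica;
  // Sent too early: the state error masks the expected validation error.
  assert(replica.HandleRequest(false).code == Code::kIllegalState);
  replica.FinishBootstrap();
  // Sent after the replica is running: the expected error surfaces.
  assert(replica.HandleRequest(false).code == Code::kInvalidArgument);
}
```

In this model, a test that asserts on the validation error is only deterministic if it first waits for the replica to leave the bootstrapping state, which is the role the wait calls play in the patch.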



-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64b9f1eeb6bacf684a15ef84cddacadb43ac43fe
Gerrit-Change-Number: 16539
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sat, 03 Oct 2020 02:45:17 +0000
Gerrit-HasComments: Yes

[kudu-CR] [tests] fix flake in TestTabletServerProxyCallErrors

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has removed a vote on this change.

Change subject: [tests] fix flake in TestTabletServerProxyCallErrors
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I64b9f1eeb6bacf684a15ef84cddacadb43ac43fe
Gerrit-Change-Number: 16539
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [tests] fix flake in TestTabletServerProxyCallErrors

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16539 )

Change subject: [tests] fix flake in TestTabletServerProxyCallErrors
......................................................................


Patch Set 2: Verified+1

Unrelated test failure (TSAN): https://issues.apache.org/jira/browse/KUDU-2942


-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64b9f1eeb6bacf684a15ef84cddacadb43ac43fe
Gerrit-Change-Number: 16539
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sat, 03 Oct 2020 03:46:04 +0000
Gerrit-HasComments: No