You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@kudu.apache.org by "Mike Percy (JIRA)" <ji...@apache.org> on 2019/03/07 21:50:00 UTC

[jira] [Created] (KUDU-2740) TabletCopyITest.TestTabletCopyingDeletedTabletFails flaky due to lack of leader election retries

Mike Percy created KUDU-2740:
--------------------------------

             Summary: TabletCopyITest.TestTabletCopyingDeletedTabletFails flaky due to lack of leader election retries
                 Key: KUDU-2740
                 URL: https://issues.apache.org/jira/browse/KUDU-2740
             Project: Kudu
          Issue Type: Bug
    Affects Versions: 1.9.0
            Reporter: Mike Percy


This test can be flaky because we disable failure detection and neglect to retry the leader election. An example error looks like this:
{code:java}
I0307 01:24:56.238428 5333 tablet_service.cc:1239] Received Run Leader Election RPC: tablet_id: "e75f819cfb0a45c483899e2396b3a07a"
dest_uuid: "89a95beac49a43d0b02b662e1a228337"
from {username='slave'} at 127.0.0.1:48832
I0307 01:24:56.238809 5333 raft_consensus.cc:472] T e75f819cfb0a45c483899e2396b3a07a P 89a95beac49a43d0b02b662e1a228337 [term 0 FOLLOWER]: Starting forced leader election (received explicit request)
I0307 01:24:56.238982 5333 raft_consensus.cc:2886] T e75f819cfb0a45c483899e2396b3a07a P 89a95beac49a43d0b02b662e1a228337 [term 0 FOLLOWER]: Advancing to term 1
W0307 01:24:56.255393 5500 tablet.cc:1786] T e75f819cfb0a45c483899e2396b3a07a P 7509a1eca3f14f45903715fdb6a20f77: Can't schedule compaction. Clean time has not been advanced past its initial value.
W0307 01:24:56.261915 5377 tablet.cc:1786] T e75f819cfb0a45c483899e2396b3a07a P 89a95beac49a43d0b02b662e1a228337: Can't schedule compaction. Clean time has not been advanced past its initial value.
W0307 01:24:56.291085 5622 tablet.cc:1786] T e75f819cfb0a45c483899e2396b3a07a P 0caf13c7f5a64af781811ca30ab3656d: Can't schedule compaction. Clean time has not been advanced past its initial value.
W0307 01:24:58.477632 5333 consensus_meta.cc:220] T e75f819cfb0a45c483899e2396b3a07a P 89a95beac49a43d0b02b662e1a228337: Time spent flushing consensus metadata: real 2.238s user 0.003s sys 0.000s
I0307 01:24:58.477829 5333 raft_consensus.cc:494] T e75f819cfb0a45c483899e2396b3a07a P 89a95beac49a43d0b02b662e1a228337 [term 1 FOLLOWER]: Starting forced leader election with config: opid_index: -1 OBSOLETE_local: false peers { permanent_uuid: "89a95beac49a43d0b02b662e1a228337" member_type: VOTER last_known_addr { host: "127.4.141.65" port: 44695 } } peers { permanent_uuid: "0caf13c7f5a64af781811ca30ab3656d" member_type: VOTER last_known_addr { host: "127.4.141.67" port: 32845 } } peers { permanent_uuid: "7509a1eca3f14f45903715fdb6a20f77" member_type: VOTER last_known_addr { host: "127.4.141.66" port: 35595 } }
I0307 01:24:58.479737 5333 leader_election.cc:296] T e75f819cfb0a45c483899e2396b3a07a P 89a95beac49a43d0b02b662e1a228337 [CANDIDATE]: Term 1 election: Requested vote from peers 0caf13c7f5a64af781811ca30ab3656d (127.4.141.67:32845), 7509a1eca3f14f45903715fdb6a20f77 (127.4.141.66:35595)
I0307 01:24:58.480012 5333 rpcz_store.cc:269] Call kudu.consensus.ConsensusService.RunLeaderElection from 127.0.0.1:48832 (request call id 3) took 2241ms. Request Metrics: {"dns_us":93}
I0307 01:24:58.487798 4661 cluster_itest_util.cc:249] Not converged past 1 yet: 0.0 0.0 0.0
I0307 01:24:58.493844 5578 tablet_service.cc:1122] Received RequestConsensusVote() RPC: tablet_id: "e75f819cfb0a45c483899e2396b3a07a" candidate_uuid: "89a95beac49a43d0b02b662e1a228337" candidate_term: 1 candidate_status { last_received { term: 0 index: 0 } } ignore_live_leader: true dest_uuid: "0caf13c7f5a64af781811ca30ab3656d"
I0307 01:24:58.494168 5578 raft_consensus.cc:2886] T e75f819cfb0a45c483899e2396b3a07a P 0caf13c7f5a64af781811ca30ab3656d [term 0 FOLLOWER]: Advancing to term 1
I0307 01:24:58.494354 5456 tablet_service.cc:1122] Received RequestConsensusVote() RPC: tablet_id: "e75f819cfb0a45c483899e2396b3a07a" candidate_uuid: "89a95beac49a43d0b02b662e1a228337" candidate_term: 1 candidate_status { last_received { term: 0 index: 0 } } ignore_live_leader: true dest_uuid: "7509a1eca3f14f45903715fdb6a20f77"
I0307 01:24:58.494655 5456 raft_consensus.cc:2886] T e75f819cfb0a45c483899e2396b3a07a P 7509a1eca3f14f45903715fdb6a20f77 [term 0 FOLLOWER]: Advancing to term 1
W0307 01:24:59.988574 5267 leader_election.cc:341] T e75f819cfb0a45c483899e2396b3a07a P 89a95beac49a43d0b02b662e1a228337 [CANDIDATE]: Term 1 election: RPC error from VoteRequest() call to peer 7509a1eca3f14f45903715fdb6a20f77 (127.4.141.66:35595): Timed out: RequestConsensusVote RPC to 127.4.141.66:35595 timed out after 1.507s (SENT)
W0307 01:24:59.988920 5266 leader_election.cc:341] T e75f819cfb0a45c483899e2396b3a07a P 89a95beac49a43d0b02b662e1a228337 [CANDIDATE]: Term 1 election: RPC error from VoteRequest() call to peer 0caf13c7f5a64af781811ca30ab3656d (127.4.141.67:32845): Timed out: RequestConsensusVote RPC to 127.4.141.67:32845 timed out after 1.507s (SENT)
I0307 01:24:59.989068 5266 leader_election.cc:310] T e75f819cfb0a45c483899e2396b3a07a P 89a95beac49a43d0b02b662e1a228337 [CANDIDATE]: Term 1 election: Election decided. Result: candidate lost. Election summary: received 3 responses out of 3 voters: 1 yes votes; 2 no votes. yes voters: 89a95beac49a43d0b02b662e1a228337; no voters: 0caf13c7f5a64af781811ca30ab3656d, 7509a1eca3f14f45903715fdb6a20f77
I0307 01:24:59.990028 5672 raft_consensus.cc:2588] T e75f819cfb0a45c483899e2396b3a07a P 89a95beac49a43d0b02b662e1a228337 [term 1 FOLLOWER]: Leader election lost for term 1. Reason: could not achieve majority
W0307 01:25:01.850855 5578 consensus_meta.cc:220] T e75f819cfb0a45c483899e2396b3a07a P 0caf13c7f5a64af781811ca30ab3656d: Time spent flushing consensus metadata: real 3.356s user 0.003s sys 0.000s
W0307 01:25:01.850855 5456 consensus_meta.cc:220] T e75f819cfb0a45c483899e2396b3a07a P 7509a1eca3f14f45903715fdb6a20f77: Time spent flushing consensus metadata: real 3.356s user 0.003s sys 0.000s
I0307 01:25:01.851020 5578 raft_consensus.cc:2346] T e75f819cfb0a45c483899e2396b3a07a P 0caf13c7f5a64af781811ca30ab3656d [term 1 FOLLOWER]: Leader election vote request: Granting yes vote for candidate 89a95beac49a43d0b02b662e1a228337 in term 1.
I0307 01:25:01.851022 5456 raft_consensus.cc:2346] T e75f819cfb0a45c483899e2396b3a07a P 7509a1eca3f14f45903715fdb6a20f77 [term 1 FOLLOWER]: Leader election vote request: Granting yes vote for candidate 89a95beac49a43d0b02b662e1a228337 in term 1.
W0307 01:25:01.851320 5578 rpcz_store.cc:253] Call kudu.consensus.ConsensusService.RequestConsensusVote from 127.4.141.65:60535 (request call id 1) took 3357 ms (3.36 s). Client timeout 1507 ms (1.51 s)
W0307 01:25:01.851326 5456 rpcz_store.cc:253] Call kudu.consensus.ConsensusService.RequestConsensusVote from 127.4.141.65:40719 (request call id 1) took 3357 ms (3.36 s). Client timeout 1507 ms (1.51 s)
W0307 01:25:01.852047 5578 rpcz_store.cc:259] Trace:
0307 01:24:58.493384 (+ 0us) service_pool.cc:162] Inserting onto call queue
0307 01:24:58.493470 (+ 86us) service_pool.cc:221] Handling call
0307 01:25:01.851251 (+3357781us) inbound_call.cc:162] Queueing success response
Metrics: {}
W0307 01:25:01.852052 5456 rpcz_store.cc:259] Trace:
0307 01:24:58.493945 (+ 0us) service_pool.cc:162] Inserting onto call queue
0307 01:24:58.494010 (+ 65us) service_pool.cc:221] Handling call
0307 01:25:01.851251 (+3357241us) inbound_call.cc:162] Queueing success response
Metrics: {}
I0307 01:25:01.851759 5455 rpcz_store.cc:269] Call kudu.consensus.ConsensusService.GetLastOpId from 127.0.0.1:54392 (request call id 3) took 3261ms. Request Metrics: {"spinlock_wait_cycles":7499205888}
I0307 01:25:01.853729 4661 cluster_itest_util.cc:249] Not converged past 1 yet: 0.0 0.0 0.0
I0307 01:25:02.057730 4661 cluster_itest_util.cc:249] Not converged past 1 yet: 0.0 0.0 0.0
I0307 01:25:02.362155 4661 cluster_itest_util.cc:249] Not converged past 1 yet: 0.0 0.0 0.0
I0307 01:25:02.766373 4661 cluster_itest_util.cc:249] Not converged past 1 yet: 0.0 0.0 0.0
I0307 01:25:03.270622 4661 cluster_itest_util.cc:249] Not converged past 1 yet: 0.0 0.0 0.0
I0307 01:25:03.875089 4661 cluster_itest_util.cc:249] Not converged past 1 yet: 0.0 0.0 0.0
I0307 01:25:04.579949 4661 cluster_itest_util.cc:249] Not converged past 1 yet: 0.0 0.0 0.0
I0307 01:25:05.384346 4661 cluster_itest_util.cc:249] Not converged past 1 yet: 0.0 0.0 0.0
I0307 01:25:06.288928 4661 cluster_itest_util.cc:249] Not converged past 1 yet: 0.0 0.0 0.0
I0307 01:25:07.293898 4661 cluster_itest_util.cc:249] Not converged past 1 yet: 0.0 0.0 0.0
I0307 01:25:08.298326 4661 cluster_itest_util.cc:249] Not converged past 1 yet: 0.0 0.0 0.0
I0307 01:25:09.302745 4661 cluster_itest_util.cc:249] Not converged past 1 yet: 0.0 0.0 0.0
I0307 01:25:10.307309 4661 cluster_itest_util.cc:249] Not converged past 1 yet: 0.0 0.0 0.0
I0307 01:25:11.311887 4661 cluster_itest_util.cc:249] Not converged past 1 yet: 0.0 0.0 0.0
I0307 01:25:12.316773 4661 cluster_itest_util.cc:249] Not converged past 1 yet: 0.0 0.0 0.0
I0307 01:25:13.321633 4661 cluster_itest_util.cc:249] Not converged past 1 yet: 0.0 0.0 0.0
I0307 01:25:14.327412 4661 cluster_itest_util.cc:249] Not converged past 1 yet: 0.0 0.0 0.0
I0307 01:25:15.333164 4661 cluster_itest_util.cc:249] Not converged past 1 yet: 0.0 0.0 0.0
I0307 01:25:16.337683 4661 cluster_itest_util.cc:249] Not converged past 1 yet: 0.0 0.0 0.0
I0307 01:25:17.342846 4661 cluster_itest_util.cc:249] Not converged past 1 yet: 0.0 0.0 0.0
I0307 01:25:18.347681 4661 cluster_itest_util.cc:249] Not converged past 1 yet: 0.0 0.0 0.0
I0307 01:25:19.351661 4661 cluster_itest_util.cc:249] Not converged past 1 yet: 0.0 0.0 0.0
I0307 01:25:20.355283 4661 cluster_itest_util.cc:249] Not converged past 1 yet: 0.0 0.0 0.0
I0307 01:25:21.359144 4661 cluster_itest_util.cc:249] Not converged past 1 yet: 0.0 0.0 0.0
I0307 01:25:22.363147 4661 cluster_itest_util.cc:249] Not converged past 1 yet: 0.0 0.0 0.0
I0307 01:25:23.366828 4661 cluster_itest_util.cc:249] Not converged past 1 yet: 0.0 0.0 0.0
I0307 01:25:24.371155 4661 cluster_itest_util.cc:249] Not converged past 1 yet: 0.0 0.0 0.0
I0307 01:25:25.375557 4661 cluster_itest_util.cc:249] Not converged past 1 yet: 0.0 0.0 0.0
I0307 01:25:26.379287 4661 cluster_itest_util.cc:249] Not converged past 1 yet: 0.0 0.0 0.0
I0307 01:25:27.383133 4661 cluster_itest_util.cc:249] Not converged past 1 yet: 0.0 0.0 0.0
I0307 01:25:28.386693 4661 cluster_itest_util.cc:249] Not converged past 1 yet: 0.0 0.0 0.0
/data/somelongdirectorytoavoidrpathissues/src/kudu/src/kudu/integration-tests/tablet_copy-itest.cc:1019: Failure
Failed
Bad status: Timed out: index 1 not available on all replicas after 30.000s
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)