Posted to issues@kudu.apache.org by "daicheng (Jira)" <ji...@apache.org> on 2023/03/21 09:45:00 UTC

[jira] [Resolved] (KUDU-3460) RPC error from VoteRequest()call to peer **:Timed out: RequestConsensusVote RPC to ** time out after 1.713s [SENT]

     [ https://issues.apache.org/jira/browse/KUDU-3460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

daicheng resolved KUDU-3460.
----------------------------
    Fix Version/s: 1.16.0
       Resolution: Not A Problem

> RPC error from VoteRequest()call to peer **:Timed out: RequestConsensusVote RPC to ** time out after 1.713s [SENT]
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: KUDU-3460
>                 URL: https://issues.apache.org/jira/browse/KUDU-3460
>             Project: Kudu
>          Issue Type: Bug
>    Affects Versions: 1.16.0
>            Reporter: daicheng
>            Priority: Major
>             Fix For: 1.16.0
>
>         Attachments: image-2023-03-17-15-27-45-755.png, image-2023-03-17-15-28-13-480.png, image-2023-03-17-15-28-40-361.png, image-2023-03-17-15-38-51-218.png
>
>
> We have 3 kudu_master and 6 kudu_tserver instances. While creating about 20,000 (2W) tables in Kudu, we got errors and could no longer read any data from Kudu. It threw many errors.
> Here are the errors from the client:
> {code:java}
> Job aborted due to stage failure: Task 0 in stage 35.0 failed 4 times, most recent failure: Lost task 0.3 in stage 35.0 (TID 9601) (prod-bigdata-mw-159 executor 3): java.lang.RuntimeException: org.apache.kudu.client.NonRecoverableException: tablet hasn't heard from leader or there hasn't been a stable leader fo..
> 2023-03-08 09:59:49,198 INFO  org.apache.kudu.client.AsyncKuduClient                      [] - Invalidating location master-10.0.2.33:7051(10.0.2.33:7051) for tablet Kudu Master: Service unavailable: ListTables request on kudu.master.MasterService from 10.0.3.82:8764 dropped due to backpressure. The service queue is full; it has 100 items. {code}
> I also found many errors like these in the Kudu tserver log:
> {code:java}
> W0307 14:36:57.368008 14759 leader_election.cc:334] T fa2a3b405a87466da7a6b1a962f35d99 P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 1640 pre-election: RPC error from VoteRequest() call to peer d7b4384df45549a891f444d1a1f36a38 (10.0.2.19:7050): Timed out: RequestConsensusVote RPC to 10.0.2.19:7050 timed out after 2.206s (SENT)
> W0307 14:36:57.368801 14759 leader_election.cc:334] T 5f8d377660aa46f29e3f1595a33d086c P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 2 pre-election: RPC error from VoteRequest() call to peer dfff3b43d48a41d5b8f2e5cbb9880454 (10.0.2.21:7050): Timed out: RequestConsensusVote RPC to 10.0.2.21:7050 timed out after 1.725s (SENT)
> W0307 14:36:57.368917 14759 leader_election.cc:334] T a32af7dd8af44b47b4b26d7a222c2f6b P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 344 pre-election: RPC error from VoteRequest() call to peer dfff3b43d48a41d5b8f2e5cbb9880454 (10.0.2.21:7050): Timed out: RequestConsensusVote RPC to 10.0.2.21:7050 timed out after 1.713s (SENT)
> W0307 14:36:57.369045 14759 leader_election.cc:334] T 15e9b550c3274243a5ee923ceda67dc5 P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 1509 pre-election: RPC error from VoteRequest() call to peer d7b4384df45549a891f444d1a1f36a38 (10.0.2.19:7050): Timed out: RequestConsensusVote RPC to 10.0.2.19:7050 timed out after 3.056s (SENT)
> W0307 14:36:57.369563 14759 leader_election.cc:334] T e5e49b443f71478984162a2eb65d3607 P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 1575 pre-election: RPC error from VoteRequest() call to peer d7b4384df45549a891f444d1a1f36a38 (10.0.2.19:7050): Timed out: RequestConsensusVote RPC to 10.0.2.19:7050 timed out after 1.553s (SENT)
> W0307 14:36:57.371872 14759 leader_election.cc:334] T 2ec17c9dd68e47ceb7f572efb9f18fe3 P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 1633 pre-election: RPC error from VoteRequest() call to peer d7b4384df45549a891f444d1a1f36a38 (10.0.2.19:7050): Timed out: RequestConsensusVote RPC to 10.0.2.19:7050 timed out after 2.010s (SENT)
> W0307 14:36:57.372673 14759 leader_election.cc:334] T a91cf24cc4c943cbbd041c7e6726d7aa P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 1610 pre-election: RPC error from VoteRequest() call to peer d7b4384df45549a891f444d1a1f36a38 (10.0.2.19:7050): Timed out: RequestConsensusVote RPC to 10.0.2.19:7050 timed out after 1.970s (SENT)
> W0307 14:36:57.372789 14759 leader_election.cc:334] T cd667f33abb74afba4b9c510b8f6dfaa P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 3 pre-election: RPC error from VoteRequest() call to peer dfff3b43d48a41d5b8f2e5cbb9880454 (10.0.2.21:7050): Timed out: RequestConsensusVote RPC to 10.0.2.21:7050 timed out after 1.674s (SENT)
> W0307 14:36:57.373358 14759 leader_election.cc:334] T 39709b52ffe34f81b08d0562e45a7a13 P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 44 pre-election: RPC error from VoteRequest() call to peer d7b4384df45549a891f444d1a1f36a38 (10.0.2.19:7050): Timed out: RequestConsensusVote RPC to 10.0.2.19:7050 timed out after 1.636s (SENT)
> W0307 14:36:57.373525 14759 leader_election.cc:334] T 00da9e2c20814ac88e18f7d7220f01c9 P 5ac35cfccaf84228bf6d589501ec533e [CANDIDATE]: Term 2 pre-election: RPC error from VoteRequest() call to peer dfff3b43d48a41d5b8f2e5cbb9880454 (10.0.2.21:7050): Timed out: RequestConsensusVote RPC to 10.0.2.21:7050 timed out after 1.524s (SENT)
> {code}
> The disk where the WAL directory is located also looks abnormal:
> !image-2023-03-17-15-27-45-755.png|width=314,height=166!!image-2023-03-17-15-28-40-361.png|width=309,height=135!
> Here is what the WAL file looks like:
> {code:java}
> schema_version: 0
> compression_codec: LZ4
> 1.1@6873507535186497536 REPLICATE NO_OP
>         id { term: 1 index: 1 } timestamp: 6873507535186497536 op_type: NO_OP noop_request { }
> COMMIT 1.1
>         op_type: NO_OP commited_op_id { term: 1 index: 1 }
> 1.2@6873839930165628928 REPLICATE CHANGE_CONFIG_OP
>         id { term: 1 index: 2 } timestamp: 6873839930165628928 op_type: CHANGE_CONFIG_OP change_config_record { tablet_id: "68d1c87651f442189f4d6c642b6ea7e6" old_config { opid_index: -1 OBSOLETE_local: false peers { permanent_uuid: "448ba75af48e4ffdb740f7f8fe244a28" member_type: VOTER last_known_addr { host: "10.0.2.14" port: 7050 } } peers { permanent_uuid: "d88293d7f919446ea14855ac8887a648" member_type: VOTER last_known_addr { host: "10.0.2.15" port: 7050 } } peers { permanent_uuid: "5ac35cfccaf84228bf6d589501ec533e" member_type: VOTER last_known_addr { host: "10.0.2.20" port: 7050 } } } new_config { opid_index: 2 OBSOLETE_local: false peers { permanent_uuid: "448ba75af48e4ffdb740f7f8fe244a28" member_type: VOTER last_known_addr { host: "10.0.2.14" port: 7050 } } peers { permanent_uuid: "d88293d7f919446ea14855ac8887a648" member_type: VOTER last_known_addr { host: "10.0.2.15" port: 7050 } } peers { permanent_uuid: "5ac35cfccaf84228bf6d589501ec533e" member_type: VOTER last_known_addr { host: "10.0.2.20" port: 7050 } } peers { permanent_uuid: "d7b4384df45549a891f444d1a1f36a38" member_type: NON_VOTER last_known_addr { host: "10.0.2.19" port: 7050 } attrs { promote: true } } } }
> COMMIT 1.2
>         op_type: CHANGE_CONFIG_OP commited_op_id { term: 1 index: 2 }
> 1.3@6873841023495979008 REPLICATE CHANGE_CONFIG_OP
>         id { term: 1 index: 3 } timestamp: 6873841023495979008 op_type: CHANGE_CONFIG_OP change_config_record { tablet_id: "68d1c87651f442189f4d6c642b6ea7e6" old_config { opid_index: 2 OBSOLETE_local: false peers { permanent_uuid: "448ba75af48e4ffdb740f7f8fe244a28" member_type: VOTER last_known_addr { host: "10.0.2.14" port: 7050 } } peers { permanent_uuid: "d88293d7f919446ea14855ac8887a648" member_type: VOTER last_known_addr { host: "10.0.2.15" port: 7050 } } peers { permanent_uuid: "5ac35cfccaf84228bf6d589501ec533e" member_type: VOTER last_known_addr { host: "10.0.2.20" port: 7050 } } peers { permanent_uuid: "d7b4384df45549a891f444d1a1f36a38" member_type: NON_VOTER last_known_addr { host: "10.0.2.19" port: 7050 } attrs { promote: true } } } new_config { opid_index: 3 OBSOLETE_local: false peers { permanent_uuid: "448ba75af48e4ffdb740f7f8fe244a28" member_type: VOTER last_known_addr { host: "10.0.2.14" port: 7050 } } peers { permanent_uuid: "d88293d7f919446ea14855ac8887a648" member_type: VOTER last_known_addr { host: "10.0.2.15" port: 7050 } } peers { permanent_uuid: "5ac35cfccaf84228bf6d589501ec533e" member_type: VOTER last_known_addr { host: "10.0.2.20" port: 7050 } } peers { permanent_uuid: "d7b4384df45549a891f444d1a1f36a38" member_type: VOTER last_known_addr { host: "10.0.2.19" port: 7050 } attrs { promote: false } } } }
> COMMIT 1.3
>         op_type: CHANGE_CONFIG_OP commited_op_id { term: 1 index: 3 }
> 1.4@6873841038243381248 REPLICATE CHANGE_CONFIG_OP
>         id { term: 1 index: 4 } timestamp: 6873841038243381248 op_type: CHANGE_CONFIG_OP change_config_record { tablet_id: "68d1c87651f442189f4d6c642b6ea7e6" old_config { opid_index: 3 OBSOLETE_local: false peers { permanent_uuid: "448ba75af48e4ffdb740f7f8fe244a28" member_type: VOTER last_known_addr { host: "10.0.2.14" port: 7050 } } peers { permanent_uuid: "d88293d7f919446ea14855ac8887a648" member_type: VOTER last_known_addr { host: "10.0.2.15" port: 7050 } } peers { permanent_uuid: "5ac35cfccaf84228bf6d589501ec533e" member_type: VOTER last_known_addr { host: "10.0.2.20" port: 7050 } } peers { permanent_uuid: "d7b4384df45549a891f444d1a1f36a38" member_type: VOTER last_known_addr { host: "10.0.2.19" port: 7050 } attrs { promote: false } } } new_config { opid_index: 4 OBSOLETE_local: false peers { permanent_uuid: "448ba75af48e4ffdb740f7f8fe244a28" member_type: VOTER last_known_addr { host: "10.0.2.14" port: 7050 } } peers { permanent_uuid: "d88293d7f919446ea14855ac8887a648" member_type: VOTER last_known_addr { host: "10.0.2.15" port: 7050 } } peers { permanent_uuid: "d7b4384df45549a891f444d1a1f36a38" member_type: VOTER last_known_addr { host: "10.0.2.19" port: 7050 } attrs { promote: false } } } }
> COMMIT 1.4
> {code}
> There are also many Raft worker threads running:
> !image-2023-03-17-15-38-51-218.png|width=704,height=576!
> It seems the system is busy handling consensus votes, and I didn't find any more helpful error logs in Kudu. Can anyone explain what happened?
>  
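
The tserver warnings quoted in the description can be tallied per peer to see whether the vote timeouts cluster on particular tablet servers. The following is a minimal sketch, not part of the original report: the regular expression is inferred only from the sample lines quoted above (it is an assumption, not a documented Kudu log format), and the name summarize_vote_timeouts is hypothetical.

```python
import re
from collections import defaultdict

# Pattern inferred from the leader_election.cc warnings quoted in the report.
# This layout is an assumption based on those samples, not a Kudu log spec.
# "to ?peer" also accepts the fused "topeer" seen in the pasted log.
VOTE_TIMEOUT = re.compile(
    r"T (?P<tablet>[0-9a-f]+) P (?P<candidate>[0-9a-f]+) \[CANDIDATE\]: "
    r"Term (?P<term>\d+) pre-election: RPC error from VoteRequest\(\) call to ?peer "
    r"(?P<peer>[0-9a-f]+) \((?P<addr>[\d.]+:\d+)\): Timed out: .*? "
    r"timed out after (?P<secs>[\d.]+)s"
)


def summarize_vote_timeouts(log_text):
    """Return {peer address: (timeout count, mean timeout in seconds)}."""
    durations = defaultdict(list)
    # finditer scans the whole text, so entries fused onto one line still match.
    for match in VOTE_TIMEOUT.finditer(log_text):
        durations[match.group("addr")].append(float(match.group("secs")))
    return {addr: (len(v), sum(v) / len(v)) for addr, v in durations.items()}


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py < kudu-tserver.WARNING
    for addr, (count, mean) in sorted(summarize_vote_timeouts(sys.stdin.read()).items()):
        print(f"{addr}: {count} timeouts, mean {mean:.3f}s")
```

In the ten warnings quoted above, every timeout targets either 10.0.2.19:7050 or 10.0.2.21:7050, so a tally like this would point at those two servers as the first places to look.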



--
This message was sent by Atlassian Jira
(v8.20.10#820010)