Posted to issues@kudu.apache.org by "Alexey Serbin (JIRA)" <ji...@apache.org> on 2017/05/18 22:43:04 UTC
[jira] [Comment Edited] (KUDU-1034) Client does not fail over due to timeout
[ https://issues.apache.org/jira/browse/KUDU-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16016582#comment-16016582 ]
Alexey Serbin edited comment on KUDU-1034 at 5/18/17 10:42 PM:
---------------------------------------------------------------
When running the new test from the patch by [~mpercy] (client_timeout_fail.patch), the current Kudu C++ client apparently retries, but the test eventually fails on the consistency check (in both DEBUG and RELEASE configurations):
{noformat}
W0518 15:41:00.559075 6311 consensus_peers.cc:357] T fb88746eb1674bbaacbcf459dd492669 P 623bba78cbb6472dbd2c8779cf51c93d -> Peer 16df2bf0fb074c46959c06f6f2069150 (127.24.81.2:43726): Couldn't send request to peer 16df2bf0fb074c46959c06f6f2069150 for tablet fb88746eb1674bbaacbcf459dd492669. Status: Timed out: UpdateConsensus RPC to 127.24.81.2:43726 timed out after 0.050s (ON_OUTBOUND_QUEUE). Retrying in the next heartbeat period. Already tried 20 times.
W0518 15:41:00.638219 6697 batcher.cc:329] Timed out: Failed to write batch of 50 ops to tablet fb88746eb1674bbaacbcf459dd492669 after 1 attempt(s): Failed to write to server: 16df2bf0fb074c46959c06f6f2069150 (127.24.81.2:43726): Write RPC to 127.24.81.2:43726 timed out after 0.500s (SENT)
W0518 15:41:01.059166 6311 consensus_peers.cc:357] T fb88746eb1674bbaacbcf459dd492669 P 623bba78cbb6472dbd2c8779cf51c93d -> Peer 16df2bf0fb074c46959c06f6f2069150 (127.24.81.2:43726): Couldn't send request to peer 16df2bf0fb074c46959c06f6f2069150 for tablet fb88746eb1674bbaacbcf459dd492669. Status: Timed out: UpdateConsensus RPC to 127.24.81.2:43726 timed out after 0.050s (ON_OUTBOUND_QUEUE). Retrying in the next heartbeat period. Already tried 21 times.
F0518 15:41:01.216828 6225 raft_consensus-itest.cc:454] Check failed: workload.rows_inserted() >= rows_target (1450 vs. 1550)
*** Check failure stack trace: ***
@ 0x8a59b5 google::LogMessage::SendToLog()
@ 0x8a5e9f google::LogMessage::Flush()
@ 0x8a99f2 google::LogMessageFatal::~LogMessageFatal()
@ 0x803fb8 kudu::tserver::RaftConsensusITest_TestClientFailoverOnLeaderTimeout_Test::TestBody()
@ 0x1894aa7 testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x1879022 testing::Test::Run()
@ 0x187a284 testing::TestInfo::Run()
@ 0x187aa33 testing::TestCase::Run()
@ 0x1883539 testing::internal::UnitTestImpl::RunAllTests()
@ 0x1895683 testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0x18830ea testing::UnitTest::Run()
@ 0x8a1dc9 main
@ 0x3ae0a1ed5d (unknown)
@ 0x801b41 (unknown)
Aborted
{noformat}
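The failed CHECK compares workload.rows_inserted() (1450) with rows_target (1550). The short C++ sketch below only redoes that arithmetic: the shortfall is 100 rows, exactly two of the 50-op batches named in the batcher.cc warning. That is consistent with whole write batches being abandoned after timing out ("after 1 attempt(s)"), but the excerpt shows only one such warning, so this is an inference and not something the log proves. The constant and function names are illustrative and do not come from the test.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from the failed CHECK and the batcher.cc warning above.
constexpr std::int64_t kRowsTarget = 1550;
constexpr std::int64_t kRowsInserted = 1450;
constexpr std::int64_t kOpsPerBatch = 50;

// Number of rows by which the workload missed its target (0 if it met it).
constexpr std::int64_t Shortfall(std::int64_t target, std::int64_t inserted) {
  return target > inserted ? target - inserted : 0;
}

// 1550 - 1450 = 100 rows, i.e. two full 50-op batches.
static_assert(Shortfall(kRowsTarget, kRowsInserted) == 100, "shortfall");
static_assert(Shortfall(kRowsTarget, kRowsInserted) % kOpsPerBatch == 0,
              "shortfall is a whole number of batches");
```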
> Client does not fail over due to timeout
> ----------------------------------------
>
> Key: KUDU-1034
> URL: https://issues.apache.org/jira/browse/KUDU-1034
> Project: Kudu
> Issue Type: Bug
> Components: client
> Affects Versions: Feature Complete
> Reporter: Mike Percy
> Assignee: Alexey Serbin
> Priority: Critical
> Attachments: client_timeout_fail.patch, client_timeout_flush_hang.patch
>
>
> The client will not fail over due to a timeout error. Attaching a failing test case.
> I just made the test case part of RaftConsensusITest because it was convenient; maybe it should go elsewhere.
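The behavior the issue asks for is that a timeout should trigger fail-over instead of ending the write. A minimal C++ sketch of just that replica-selection decision follows, assuming hypothetical Replica, RpcResult, and WriteWithFailover names (none of these are Kudu client APIs). It deliberately ignores leader lookup and followers rejecting writes; it only shows that a timed-out replica is skipped on the next attempt while the overall deadline still bounds the retries.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <vector>

// Hypothetical result of one write RPC attempt (not Kudu's Status type).
enum class RpcResult { kOk, kTimedOut };

// Hypothetical view of one tablet replica as seen by the client.
struct Replica {
  std::string uuid;
  bool failed = false;  // set once this replica has timed out
};

// Tries each replica not yet marked failed, in order, until one accepts the
// batch or the overall deadline passes. A timeout marks the replica as
// failed so the next attempt goes to a different replica instead of the
// batch being given up after the first attempt.
// Returns the number of attempts on success, or -1 if the batch was not
// written.
int WriteWithFailover(
    std::vector<Replica>& replicas,
    const std::function<RpcResult(const Replica&)>& send,
    std::chrono::steady_clock::time_point deadline) {
  int attempts = 0;
  for (Replica& replica : replicas) {
    if (std::chrono::steady_clock::now() >= deadline) {
      break;  // out of time: report failure to the caller
    }
    if (replica.failed) {
      continue;  // already timed out earlier; skip it
    }
    ++attempts;
    if (send(replica) == RpcResult::kOk) {
      return attempts;
    }
    replica.failed = true;  // timed out: fail over to the next replica
  }
  return -1;
}
```

For example, if the first replica times out and the second accepts the write, WriteWithFailover returns 2 and only the first replica is marked as failed. How long a real client should keep avoiding a timed-out replica (one batch, or longer) is a separate design choice this sketch does not address.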