Posted to issues@kudu.apache.org by "Alexey Serbin (JIRA)" <ji...@apache.org> on 2017/05/18 22:43:04 UTC

[jira] [Comment Edited] (KUDU-1034) Client does not fail over due to timeout

    [ https://issues.apache.org/jira/browse/KUDU-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16016582#comment-16016582 ] 

Alexey Serbin edited comment on KUDU-1034 at 5/18/17 10:42 PM:
---------------------------------------------------------------

When the new test from [~mpercy]'s patch (client_timeout_fail.patch) is run, the current Kudu C++ client apparently retries, but the test eventually fails on the consistency check (this is true for both DEBUG and RELEASE configurations):

{noformat}
W0518 15:41:00.559075  6311 consensus_peers.cc:357] T fb88746eb1674bbaacbcf459dd492669 P 623bba78cbb6472dbd2c8779cf51c93d -> Peer 16df2bf0fb074c46959c06f6f2069150 (127.24.81.2:43726): Couldn't send request to peer 16df2bf0fb074c46959c06f6f2069150 for tablet fb88746eb1674bbaacbcf459dd492669. Status: Timed out: UpdateConsensus RPC to 127.24.81.2:43726 timed out after 0.050s (ON_OUTBOUND_QUEUE). Retrying in the next heartbeat period. Already tried 20 times.
W0518 15:41:00.638219  6697 batcher.cc:329] Timed out: Failed to write batch of 50 ops to tablet fb88746eb1674bbaacbcf459dd492669 after 1 attempt(s): Failed to write to server: 16df2bf0fb074c46959c06f6f2069150 (127.24.81.2:43726): Write RPC to 127.24.81.2:43726 timed out after 0.500s (SENT)
W0518 15:41:01.059166  6311 consensus_peers.cc:357] T fb88746eb1674bbaacbcf459dd492669 P 623bba78cbb6472dbd2c8779cf51c93d -> Peer 16df2bf0fb074c46959c06f6f2069150 (127.24.81.2:43726): Couldn't send request to peer 16df2bf0fb074c46959c06f6f2069150 for tablet fb88746eb1674bbaacbcf459dd492669. Status: Timed out: UpdateConsensus RPC to 127.24.81.2:43726 timed out after 0.050s (ON_OUTBOUND_QUEUE). Retrying in the next heartbeat period. Already tried 21 times.
F0518 15:41:01.216828  6225 raft_consensus-itest.cc:454] Check failed: workload.rows_inserted() >= rows_target (1450 vs. 1550) 
*** Check failure stack trace: ***
    @           0x8a59b5  google::LogMessage::SendToLog()
    @           0x8a5e9f  google::LogMessage::Flush()
    @           0x8a99f2  google::LogMessageFatal::~LogMessageFatal()
    @           0x803fb8  kudu::tserver::RaftConsensusITest_TestClientFailoverOnLeaderTimeout_Test::TestBody()
    @          0x1894aa7  testing::internal::HandleExceptionsInMethodIfSupported<>()
    @          0x1879022  testing::Test::Run()
    @          0x187a284  testing::TestInfo::Run()
    @          0x187aa33  testing::TestCase::Run()
    @          0x1883539  testing::internal::UnitTestImpl::RunAllTests()
    @          0x1895683  testing::internal::HandleExceptionsInMethodIfSupported<>()
    @          0x18830ea  testing::UnitTest::Run()
    @           0x8a1dc9  main
    @       0x3ae0a1ed5d  (unknown)
    @           0x801b41  (unknown)
Aborted
{noformat}
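
For reference, the batcher.cc warning above reports the write failing "after 1 attempt(s)" once the Write RPC timed out. The failover behavior the test appears to expect can be sketched roughly as follows. This is only a minimal, self-contained illustration and not Kudu client code: WriteStatus, Replica, WriteResult and WriteWithFailover are hypothetical names, and the sketch assumes that a timeout is the only retriable error and that the next replica in the list can serve the write.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical outcome of a single write attempt (not a Kudu type).
enum class WriteStatus { kOk, kTimedOut, kFailed };

// Hypothetical replica handle: a label plus a callable that performs the write.
struct Replica {
  std::string address;
  std::function<WriteStatus()> write;
};

// Result of a write that may have been attempted on several replicas.
struct WriteResult {
  WriteStatus status;
  int attempts;
  std::string served_by;  // empty if no replica accepted the write
};

// Tries replicas in round-robin order, making at most max_attempts attempts.
// A timeout moves on to the next replica; any other failure is returned
// immediately (assumption: only timeouts are retriable in this sketch).
WriteResult WriteWithFailover(const std::vector<Replica>& replicas,
                              int max_attempts) {
  WriteResult result{WriteStatus::kFailed, 0, ""};
  if (replicas.empty()) return result;
  for (int i = 0; i < max_attempts; ++i) {
    const Replica& r = replicas[static_cast<std::size_t>(i) % replicas.size()];
    ++result.attempts;
    result.status = r.write();
    if (result.status == WriteStatus::kOk) {
      result.served_by = r.address;
      return result;
    }
    if (result.status != WriteStatus::kTimedOut) return result;
  }
  return result;  // every attempt timed out
}
```

With a first replica that always times out and a second that accepts the write, WriteWithFailover(replicas, 3) would succeed on the second attempt instead of giving up after the first. A real client would also have to re-resolve the tablet leader rather than simply walk a fixed list.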







> Client does not fail over due to timeout
> ----------------------------------------
>
>                 Key: KUDU-1034
>                 URL: https://issues.apache.org/jira/browse/KUDU-1034
>             Project: Kudu
>          Issue Type: Bug
>          Components: client
>    Affects Versions: Feature Complete
>            Reporter: Mike Percy
>            Assignee: Alexey Serbin
>            Priority: Critical
>         Attachments: client_timeout_fail.patch, client_timeout_flush_hang.patch
>
>
> The client will not fail over due to a timeout error. Attaching a failing test case.
> I just made the test case part of RaftConsensusITest because it was convenient; maybe it should go elsewhere.


