Posted to issues@kudu.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2019/12/05 07:05:00 UTC

[jira] [Commented] (KUDU-2610) TestSimultaneousLeaderTransferAndAbruptStepdown is Flaky

    [ https://issues.apache.org/jira/browse/KUDU-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988537#comment-16988537 ] 

ASF subversion and git services commented on KUDU-2610:
-------------------------------------------------------

Commit 4ab2e7564619b80555a8ddc60a9162cffe75084e in kudu's branch refs/heads/master from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=4ab2e75 ]

[test] KUDU-2610 fix flakiness in kudu-admin-test scenario

I saw AdminCliTest.TestSimultaneousLeaderTransferAndAbruptStepdown
fail with a timed-out error while trying to write data to the test table:

  http://dist-test.cloudera.org/job?job_id=aserbin.1575418627.126650

A log from another occurrence of this issue is attached to
https://issues.apache.org/jira/browse/KUDU-2610

It seems the frequency of the leader change/step-down requests could be
safely decreased to make this test more stable.  The test already has
the frequency of the leader change requests clamped down for ASAN
builds compared with the RELEASE/DEBUG/TSAN case.

Change-Id: I98e792783efa2909d10174f84ddd785f5a968046
Reviewed-on: http://gerrit.cloudera.org:8080/14827
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Tested-by: Kudu Jenkins
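
The throttling idea described in the commit message above can be sketched
roughly as follows. This is only an illustration in Python, not the actual
kudu-admin-test code (which is C++): the names STEPDOWN_PAUSE_SECONDS,
stepdown_pause and run_stepdowns, the slowdown parameter, and all pause
values are assumptions made for the example.

```python
import time

# Hypothetical pause (in seconds) between leader step-down requests for each
# build type. The values are illustrative assumptions; the only property taken
# from the commit message is that ASAN builds send requests less frequently
# than RELEASE/DEBUG/TSAN builds.
STEPDOWN_PAUSE_SECONDS = {
    "RELEASE": 1.0,
    "DEBUG": 1.0,
    "TSAN": 1.0,
    "ASAN": 3.0,
}


def stepdown_pause(build_type, slowdown=1.0):
    """Return the pause between step-down requests for a build type.

    A `slowdown` greater than 1 lowers the request frequency for every
    build type, which is the kind of change the commit describes.
    """
    if slowdown <= 0:
        raise ValueError("slowdown must be positive")
    return STEPDOWN_PAUSE_SECONDS[build_type] * slowdown


def run_stepdowns(request_stepdown, build_type, count, slowdown=1.0,
                  sleep=time.sleep):
    """Call `request_stepdown` `count` times, pausing between the calls."""
    pause = stepdown_pause(build_type, slowdown)
    for i in range(count):
        request_stepdown()
        # No pause is needed after the final request.
        if i + 1 < count:
            sleep(pause)
```

Passing the sleep function as a parameter keeps the sketch easy to exercise
without real delays; a caller can substitute a function that only records the
requested pauses.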


> TestSimultaneousLeaderTransferAndAbruptStepdown is Flaky
> --------------------------------------------------------
>
>                 Key: KUDU-2610
>                 URL: https://issues.apache.org/jira/browse/KUDU-2610
>             Project: Kudu
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>            Reporter: Hao Hao
>            Assignee: William Berkeley
>            Priority: Major
>         Attachments: kudu-admin-test.5.txt
>
>
> AdminCliTest.TestSimultaneousLeaderTransferAndAbruptStepdown is sometimes flaky in ASAN builds, failing with the following error:
> {noformat}
> b01d528fd3c74eb5b42b8d4888591ed2 (127.18.62.194:38185) has failed: Timed out: Write RPC to 127.18.62.194:38185 timed out after 60.000s (SENT)
> W1017 23:33:47.772014 20038 batcher.cc:348] Timed out: Failed to write batch of 1 ops to tablet 9b4b2dea960941bcb38197b51c55baf4 after 1 attempt(s): Failed to write to server: b01d528fd3c74eb5b42b8d4888591ed2 (127.18.62.194:38185): Write RPC to 127.18.62.194:38185 timed out after 60.000s (SENT)
> F1017 23:33:47.772820 20042 test_workload.cc:202] Timed out: Failed to write batch of 1 ops to tablet 9b4b2dea960941bcb38197b51c55baf4 after 1 attempt(s): Failed to write to server: b01d528fd3c74eb5b42b8d4888591ed2 (127.18.62.194:38185): Write RPC to 127.18.62.194:38185 timed out after 60.000s (SENT)
> *** Check failure stack trace: ***
> *** Aborted at 1539844427 (unix time) try "date -d @1539844427" if you are using GNU date ***
> PC: @ 0x3c74632625 __GI_raise
> *** SIGABRT (@0x452000048fb) received by PID 18683 (TID 0x7f13ebe5b700) from PID 18683; stack trace: ***
>  @ 0x3c74a0f710 (unknown) at ??:0
>  @ 0x3c74632625 __GI_raise at ??:0
>  @ 0x3c74633e05 __GI_abort at ??:0
>  @ 0x7f13fd43da29 (unknown) at ??:0
>  @ 0x7f13fd43f31d (unknown) at ??:0
>  @ 0x7f13fd4411dd (unknown) at ??:0
>  @ 0x7f13fd43ee59 (unknown) at ??:0
>  @ 0x7f13fd441c7f (unknown) at ??:0
>  @ 0x7f1412f7ba6e (unknown) at ??:0
>  @ 0x3c796b6470 (unknown) at ??:0
>  @ 0x3c74a079d1 start_thread at ??:0
>  @ 0x3c746e88fd clone at ??:0
> {noformat}


