Posted to issues@kudu.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2016/12/15 11:02:58 UTC

[jira] [Resolved] (KUDU-1501) RaftConsensusITest.TestMasterReplacesEvictedFollowers flaky with bootstrap reply error

     [ https://issues.apache.org/jira/browse/KUDU-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved KUDU-1501.
-------------------------------
       Resolution: Cannot Reproduce
    Fix Version/s: n/a

I looped this 2500 times on a current build in TSAN and couldn't reproduce the error.
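Here is a rough check on the "Cannot Reproduce" resolution. Assume the original failure rate was about one failure per 200 runs. That is an assumption, because the report only says the failure appeared after "a couple hundred" TSAN loops. Under that rate, 2500 consecutive passing runs would be very unlikely. The sketch treats each run as an independent Bernoulli trial, which flaky, timing-dependent tests only approximate. It also computes the 95% upper bound on the per-run failure rate implied by zero failures in 2500 runs, which is close to the "rule of three" estimate 3/n. The function names are illustrative only.

```python
def prob_all_pass(failure_rate: float, runs: int) -> float:
    """Probability that `runs` independent runs all pass at the given per-run failure rate."""
    return (1.0 - failure_rate) ** runs


def upper_bound_rate(runs: int, confidence: float = 0.95) -> float:
    """Largest per-run failure rate still consistent (at `confidence`) with zero failures in `runs` runs."""
    return 1.0 - (1.0 - confidence) ** (1.0 / runs)


# Assumption: ~1 failure per 200 runs, based on "a couple hundred" loops in the original report.
print(f"P(2500 passes | rate 1/200) = {prob_all_pass(1 / 200, 2500):.1e}")
print(f"95% upper bound on rate after 2500 passes = {upper_bound_rate(2500):.5f}")
```

Under these assumptions, the first value is a few in a million. The bound works out to about 0.0012, or roughly one failure in 800+ runs. So the bug, if it still exists, shows up far less often than it did when it was reported.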

> RaftConsensusITest.TestMasterReplacesEvictedFollowers flaky with bootstrap reply error
> --------------------------------------------------------------------------------------
>
>                 Key: KUDU-1501
>                 URL: https://issues.apache.org/jira/browse/KUDU-1501
>             Project: Kudu
>          Issue Type: Bug
>          Components: consensus, tablet
>    Affects Versions: 0.9.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: n/a
>
>
> I looped this test a couple hundred times in TSAN and caught this failure, which looks like it might be a serious bug:
> W0624 01:56:54.399389 30339 consensus_peers.cc:326] T fbfe7e1538d442fb9d4e0958c6ed3b7a P 9ee39d9aaf9840b9a502817a4cfe0a68 -> Peer e9f2622cd2f64f95ad5ebc9706524bab (127.112.159.2:58860): Couldn't send request to peer e9f2622cd2f64f95ad5ebc9706524bab for tablet fbfe7e1538d442fb9d4e0958c6ed3b7a. Error code: TABLET_NOT_RUNNING (12). Status: Illegal state: Tablet not RUNNING: FAILED: Corruption: Failed log replay. Reason: Debug Info: Error playing entry 3 of segment 6 of tablet fbfe7e1538d442fb9d4e0958c6ed3b7a. Segment path: /tmp/kudutest-1000/raft_consensus-itest.RaftConsensusITest.TestMasterReplacesEvictedFollowers.1466733357563157-28831/raft_consensus-itest-cluster/ts-2/wals/fbfe7e1538d442fb9d4e0958c6ed3b7a.recovery/wal-000000006. Entry: type: COMMIT commit { op_type: WRITE_OP commited_op_id { term: 1 index: 36 } result { ops { mutated_stores { mrs_id: 3 } } } }: CommitMsg was orphaned but it referred to stores which need replay. Commit: op_type: WRITE_OP commited_op_id { term: 1 index: 36 } result { ops { mutated_stores { mrs_id: 3 } } }. 
TabletMetadata: table_id: "9fb52e694c1d46e4991b49b78a3b8acf" tablet_id: "fbfe7e1538d442fb9d4e0958c6ed3b7a" last_durable_mrs_id: 2 rowsets { id: 3 last_durable_dms_id: -1 columns { block { id: 1836738791030108424 } column_id: 10 } columns { block { id: 3331501504918373718 } column_id: 11 } columns { block { id: 4024765891195703834 } column_id: 12 } undo_deltas { block { id: 3564657040239809453 } } bloom_block { id: 3499726779858777197 } } table_name: "TestTable" schema { columns { id: 10 name: "key" type: INT32 is_key: true is_nullable: false encoding: AUTO_ENCODING compression: DEFAULT_COMPRESSION cfile_block_size: 0 } columns { id: 11 name: "int_val" type: INT32 is_key: false is_nullable: false encoding: AUTO_ENCODING compression: DEFAULT_COMPRESSION cfile_block_size: 0 } columns { id: 12 name: "string_val" type: STRING is_key: false is_nullable: true encoding: AUTO_ENCODING compression: DEFAULT_COMPRESSION cfile_block_size: 0 } } schema_version: 0 tablet_data_state: TABLET_DATA_READY partition { partition_key_start: "" partition_key_end: "" } partition_schema { range_schema { columns { id: 10 } } }. Retrying in the next heartbeat period. Already tried 13 times.
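The error above appears to come from a consistency check during tablet bootstrap. A COMMIT entry whose matching REPLICATE is not found during WAL replay is "orphaned". Skipping it is only safe if every store it mutated has already been flushed to disk. Here the commit for op 1.36 lists `mutated_stores { mrs_id: 3 }`, but the tablet metadata reports `last_durable_mrs_id: 2`, so the check fails.

The minimal Python sketch below models that rule. It assumes that MemRowSet ids increase monotonically, so an MRS is durable exactly when its id is at most `last_durable_mrs_id`. It only models the MemRowSet case that appears in this log. `OrphanedCommit`, `CorruptionError`, and `check_orphaned_commit` are hypothetical names, not Kudu's C++ API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OrphanedCommit:
    """Hypothetical model of a COMMIT entry whose REPLICATE was not found during replay."""
    term: int
    index: int
    mutated_mrs_ids: List[int] = field(default_factory=list)  # MemRowSet ids the op wrote to


class CorruptionError(Exception):
    pass


def check_orphaned_commit(commit: OrphanedCommit, last_durable_mrs_id: int) -> None:
    """Raise if the orphaned commit touched a MemRowSet that was not yet flushed.

    Assumption: MRS ids increase monotonically, so an MRS is durable iff its id
    is <= last_durable_mrs_id. Skipping a commit whose writes only lived in a
    non-durable MRS would silently lose those writes, so it is reported as corruption.
    """
    for mrs_id in commit.mutated_mrs_ids:
        if mrs_id > last_durable_mrs_id:
            raise CorruptionError(
                f"orphaned commit {commit.term}.{commit.index} refers to mrs_id {mrs_id}, "
                f"but last_durable_mrs_id is {last_durable_mrs_id}: store needs replay")


# Values from the log: op 1.36 mutated mrs_id 3; the metadata says last_durable_mrs_id is 2.
try:
    check_orphaned_commit(OrphanedCommit(term=1, index=36, mutated_mrs_ids=[3]), last_durable_mrs_id=2)
except CorruptionError as err:
    print(err)
```

If the same commit had only touched `mrs_id: 2`, the sketch would accept it, because that store is already durable and nothing in it needs replay.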


