You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@kudu.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2016/03/26 19:32:25 UTC

[jira] [Commented] (KUDU-1319) Tablet bootstrap fails with: Unexpected new leader in same term!

    [ https://issues.apache.org/jira/browse/KUDU-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15213146#comment-15213146 ] 

Todd Lipcon commented on KUDU-1319:
-----------------------------------

[~dralves] did you end up getting a test/repro for this?

> Tablet bootstrap fails with: Unexpected new leader in same term!
> ----------------------------------------------------------------
>
>                 Key: KUDU-1319
>                 URL: https://issues.apache.org/jira/browse/KUDU-1319
>             Project: Kudu
>          Issue Type: Bug
>          Components: consensus
>    Affects Versions: 0.6.0
>            Reporter: David Alves
>            Assignee: David Alves
>            Priority: Critical
>
> Sometimes when a tablet bootstraps in the same tablet server where it was just deleted, after it starts getting messages from the leader it fails with: Unexpected new leader in same term!
> Here's a relevant log:
> {code}
> I0202 14:03:29.673732  2548 raft_consensus.cc:831] T f39f916a60494836942b2e830f569e66 P 6ecbb8fb26534a55a009c77617ab0706 [term 18 FOLLOWER]: Refusing update from remote peer
> I0202 14:03:30.939676  2543 tablet_peer.cc:217] Initiating TabletPeer shutdown for tablet: f39f916a60494836942b2e830f569e66
> I0202 14:03:30.939692  2543 maintenance_manager.cc:180] Unregistered op CompactRowSetsOp(f39f916a60494836942b2e830f569e66)
> I0202 14:03:30.939699  2543 maintenance_manager.cc:180] Unregistered op MinorDeltaCompactionOp(f39f916a60494836942b2e830f569e66)
> I0202 14:03:30.939707  2543 maintenance_manager.cc:180] Unregistered op MajorDeltaCompactionOp(f39f916a60494836942b2e830f569e66)
> I0202 14:03:30.939713  2543 maintenance_manager.cc:180] Unregistered op FlushMRSOp(f39f916a60494836942b2e830f569e66)
> I0202 14:03:30.939719  2543 maintenance_manager.cc:180] Unregistered op FlushDeltaMemStoresOp(f39f916a60494836942b2e830f569e66)
> I0202 14:03:30.939724  2543 maintenance_manager.cc:180] Unregistered op LogGCOp(f39f916a60494836942b2e830f569e66)
> I0202 14:03:30.939733  2543 raft_consensus.cc:1467] T f39f916a60494836942b2e830f569e66 P 6ecbb8fb26534a55a009c77617ab0706 [term 18 FOLLOWER]: Raft consensus shutting down.
> I0202 14:03:30.939743  2543 raft_consensus.cc:1483] T f39f916a60494836942b2e830f569e66 P 6ecbb8fb26534a55a009c77617ab0706 [term 18 FOLLOWER]: Raft consensus is shut down!
> I0202 14:03:30.940188  2543 ts_tablet_manager.cc:941] T f39f916a60494836942b2e830f569e66 P 6ecbb8fb26534a55a009c77617ab0706: Deleting tablet data with delete state TABLET_DATA_TOMBSTONED
> I0202 14:03:30.970777  2543 ts_tablet_manager.cc:951] T f39f916a60494836942b2e830f569e66 P 6ecbb8fb26534a55a009c77617ab0706: Tablet deleted. Last logged OpId: 18.2
> I0202 14:03:30.971076  2543 ts_tablet_manager.cc:411] T f39f916a60494836942b2e830f569e66 P 6ecbb8fb26534a55a009c77617ab0706: Initiating remote bootstrap from Peer ef1f80adfd1849ac933488787009b162 (node-5.internal:7050)
> I0202 14:03:30.971578  2543 remote_bootstrap_client.cc:157] T f39f916a60494836942b2e830f569e66 P 6ecbb8fb26534a55a009c77617ab0706: Remote bootstrap client: Beginning remote bootstrap session from remote peer at address node-5.internal:7050
> I0202 14:03:30.981222  2543 ts_tablet_manager.cc:738] Registered tablet f39f916a60494836942b2e830f569e66
> I0202 14:03:30.981236  2543 remote_bootstrap_client.cc:379] T f39f916a60494836942b2e830f569e66 P 6ecbb8fb26534a55a009c77617ab0706: Remote bootstrap client: Starting download of 0 data blocks...
> I0202 14:03:30.987156  2543 remote_bootstrap_client.cc:344] T f39f916a60494836942b2e830f569e66 P 6ecbb8fb26534a55a009c77617ab0706: Remote bootstrap client: Starting download of 1 WAL segments...
> I0202 14:03:30.987165  2543 tablet_bootstrap.cc:363] T f39f916a60494836942b2e830f569e66 P 6ecbb8fb26534a55a009c77617ab0706: RemoteBootstrap: Downloading WAL segment with seq. number 1 (1/1)
> I0202 14:03:30.990623  2543 remote_bootstrap_client.cc:265] T f39f916a60494836942b2e830f569e66 P 6ecbb8fb26534a55a009c77617ab0706: Remote bootstrap client: Remote bootstrap complete. Replacing tablet superblock.
> I0202 14:03:30.990633  2543 tablet_bootstrap.cc:363] T f39f916a60494836942b2e830f569e66 P 6ecbb8fb26534a55a009c77617ab0706: RemoteBootstrap: Replacing tablet superblock
> I0202 14:03:31.001113  5927 ts_tablet_manager.cc:610] T f39f916a60494836942b2e830f569e66 P 6ecbb8fb26534a55a009c77617ab0706: Bootstrapping tablet
> I0202 14:03:31.001261  5927 tablet_bootstrap.cc:363] T f39f916a60494836942b2e830f569e66 P 6ecbb8fb26534a55a009c77617ab0706: Bootstrap starting.
> I0202 14:03:31.001332  5927 tablet_bootstrap.cc:524] T f39f916a60494836942b2e830f569e66 P 6ecbb8fb26534a55a009c77617ab0706: Time spent opening tablet: real 0.000s	user 0.000s	sys 0.000s
> I0202 14:03:31.001406  5927 tablet_bootstrap.cc:580] T f39f916a60494836942b2e830f569e66 P 6ecbb8fb26534a55a009c77617ab0706: Will attempt to recover log segment /var/lib/kudu/tserver/wals/f39f916a60494836942b2e830f569e66/wal-000000001 to /var/lib/kudu/tserver/wals/f39f916a60494836942b2e830f569e66.recovery/wal-000000001
> I0202 14:03:31.001413  5927 tablet_bootstrap.cc:588] T f39f916a60494836942b2e830f569e66 P 6ecbb8fb26534a55a009c77617ab0706: Moving log directory /var/lib/kudu/tserver/wals/f39f916a60494836942b2e830f569e66 to recovery directory /var/lib/kudu/tserver/wals/f39f916a60494836942b2e830f569e66.recovery in preparation for log replay
> W0202 14:03:31.001554  5927 log_util.cc:166] Could not read footer for segment: /var/lib/kudu/tserver/wals/f39f916a60494836942b2e830f569e66.recovery/wal-000000001: Not found: Footer not found. Footer magic doesn't match
> W0202 14:03:31.001605  5927 log_reader.cc:151] Log segment /var/lib/kudu/tserver/wals/f39f916a60494836942b2e830f569e66.recovery/wal-000000001 was likely left in-progress after a previous crash. Will try to rebuild footer by scanning data.
> I0202 14:03:31.001651  5927 log_util.cc:220] Successfully rebuilt footer for segment: /var/lib/kudu/tserver/wals/f39f916a60494836942b2e830f569e66.recovery/wal-000000001 (valid entries through byte offset 353)
> I0202 14:03:31.001674  5927 tablet.cc:752] Rewinding schema during bootstrap to Schema [
> 	0:key1[int64 NOT NULL],
> 	1:key2[int64 NOT NULL],
> 	2:prev1[int64 NULLABLE],
> 	3:prev2[int64 NULLABLE],
> 	4:row_id[int64 NOT NULL],
> 	5:client[string NOT NULL],
> 	6:update_count[int32 NOT NULL]
> ]
> I0202 14:03:31.002318  5927 tablet_bootstrap.cc:363] T f39f916a60494836942b2e830f569e66 P 6ecbb8fb26534a55a009c77617ab0706: Bootstrap replayed 1/1 log segments. Stats: ops{read=2 overwritten=0 applied=2} inserts{seen=0 ignored=0} mutations{seen=0 ignored=0} orphaned_commits=0. Pending: 0 replicates
> I0202 14:03:31.002331  5927 tablet_bootstrap.cc:1003] T f39f916a60494836942b2e830f569e66 P 6ecbb8fb26534a55a009c77617ab0706: ReplayState: Previous OpId: 18.2, Committed OpId: 18.2, Pending Replicates: 0, Pending Commits: 0
> I0202 14:03:31.002563  5927 tablet_bootstrap.cc:617] T f39f916a60494836942b2e830f569e66 P 6ecbb8fb26534a55a009c77617ab0706: Preparing to delete log recovery files and directory /var/lib/kudu/tserver/wals/f39f916a60494836942b2e830f569e66.recovery
> I0202 14:03:31.002573  5927 tablet_bootstrap.cc:620] T f39f916a60494836942b2e830f569e66 P 6ecbb8fb26534a55a009c77617ab0706: Renaming log recovery dir from /var/lib/kudu/tserver/wals/f39f916a60494836942b2e830f569e66.recovery to /var/lib/kudu/tserver/wals/f39f916a60494836942b2e830f569e66.recovery-1454450611002572
> I0202 14:03:31.002609  5927 tablet_bootstrap.cc:630] T f39f916a60494836942b2e830f569e66 P 6ecbb8fb26534a55a009c77617ab0706: Deleting all files from renamed log recovery directory /var/lib/kudu/tserver/wals/f39f916a60494836942b2e830f569e66.recovery-1454450611002572
> I0202 14:03:31.002704  5927 tablet_bootstrap.cc:633] T f39f916a60494836942b2e830f569e66 P 6ecbb8fb26534a55a009c77617ab0706: Completed deletion of old log recovery files and directory /var/lib/kudu/tserver/wals/f39f916a60494836942b2e830f569e66.recovery-1454450611002572
> I0202 14:03:31.002712  5927 tablet_bootstrap.cc:363] T f39f916a60494836942b2e830f569e66 P 6ecbb8fb26534a55a009c77617ab0706: Bootstrap complete.
> I0202 14:03:31.002768  5927 ts_tablet_manager.cc:615] T f39f916a60494836942b2e830f569e66 P 6ecbb8fb26534a55a009c77617ab0706: Time spent bootstrapping tablet: real 0.002s	user 0.000s	sys 0.001s
> I0202 14:03:31.002984  5927 raft_consensus.cc:268] T f39f916a60494836942b2e830f569e66 P 6ecbb8fb26534a55a009c77617ab0706 [term 18 FOLLOWER]: Replica starting. Triggering 0 pending transactions. Active config: opid_index: -1 local: false peers { permanent_uuid: "59f9b39fef534f4d8bc2111bbb6bf24c" member_type: VOTER last_known_addr { host: "node-3.internal" port: 7050 } } peers { permanent_uuid: "6ecbb8fb26534a55a009c77617ab0706" member_type: VOTER last_known_addr { host: "node-4.internal" port: 7050 } } peers { permanent_uuid: "ef1f80adfd1849ac933488787009b162" member_type: VOTER last_known_addr { host: "node-5.internal" port: 7050 } }
> I0202 14:03:31.003002  5927 raft_consensus.cc:477] T f39f916a60494836942b2e830f569e66 P 6ecbb8fb26534a55a009c77617ab0706 [term 18 FOLLOWER]: Becoming Follower/Learner. State: Replica: 6ecbb8fb26534a55a009c77617ab0706, State: 1, Role: FOLLOWER
> Watermarks: {Received: term: 18 index: 2 Committed: term: 18 index: 2}
> I0202 14:03:31.003013  5927 consensus_queue.cc:162] T f39f916a60494836942b2e830f569e66 P 6ecbb8fb26534a55a009c77617ab0706 [NON_LEADER]: Queue going to NON_LEADER mode. State: All replicated op: 0.0, Majority replicated op: 0.0, Committed index: 0.0, Last appended: 18.2, Current term: 0, Majority size: -1, State: 1, Mode: NON_LEADER
> F0202 14:03:34.440295  2538 raft_consensus.cc:922] T f39f916a60494836942b2e830f569e66 P 6ecbb8fb26534a55a009c77617ab0706 [term 18 FOLLOWER]: Unexpected new leader in same term! Existing leader UUID: ef1f80adfd1849ac933488787009b162, new leader UUID: af1f80adfd1849ac933488787009b162
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)