You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@kudu.apache.org by al...@apache.org on 2017/10/04 18:31:34 UTC

kudu git commit: [tablet copy] comment on TabletCopyClient lifecycle

Repository: kudu
Updated Branches:
  refs/heads/master 77dea331d -> 3b0ef599b


[tablet copy] comment on TabletCopyClient lifecycle

Added an explanatory comment on the TabletCopyClient instance lifecycle.
That was the result of an attempt to clean-up corresponding code:
  https://gerrit.cloudera.org/#/c/8133/

Change-Id: I7ae41f12fe75f05c84518c26b6875f34efae0172
Reviewed-on: http://gerrit.cloudera.org:8080/8197
Reviewed-by: Mike Percy <mp...@apache.org>
Tested-by: Kudu Jenkins


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/3b0ef599
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/3b0ef599
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/3b0ef599

Branch: refs/heads/master
Commit: 3b0ef599b2fc3fc9b895937548e98c57552b021f
Parents: 77dea33
Author: Alexey Serbin <as...@cloudera.com>
Authored: Mon Oct 2 18:24:45 2017 -0700
Committer: Alexey Serbin <as...@cloudera.com>
Committed: Wed Oct 4 18:30:55 2017 +0000

----------------------------------------------------------------------
 src/kudu/tserver/ts_tablet_manager.cc | 33 ++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/3b0ef599/src/kudu/tserver/ts_tablet_manager.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tserver/ts_tablet_manager.cc b/src/kudu/tserver/ts_tablet_manager.cc
index 5b6f9e7..2967fe7 100644
--- a/src/kudu/tserver/ts_tablet_manager.cc
+++ b/src/kudu/tserver/ts_tablet_manager.cc
@@ -626,6 +626,39 @@ void TSTabletManager::RunTabletCopy(
   LOG(INFO) << init_msg;
   TRACE(init_msg);
 
+  // The TabletCopyClient instance should be kept alive until the tablet
+  // is successfully copied over and opened/started. This is because we want
+  // to maintain the LogAnchor until the replica starts up. Upon destruction,
+  // the TabletCopyClient instance sends an RPC explicitly ending the tablet
+  // copy session. The source replica then destroys the corresponding
+  // TabletCopySourceSession object, releasing its LogAnchor and allowing
+  // the WAL segments being copied to be GCed.
+  //
+  // See below for more details on why anchoring of WAL segments is necessary.
+  //
+  // * Assume there are WAL segments 0-10 when tablet copy starts. Tablet copy
+  //   will anchor 0 until its destroyed, meaning the source replica wont
+  //   delete it.
+  //
+  // * When tablet copy is done the tablet still needs to bootstrap which will
+  //   take some time.
+  //
+  // * When tablet bootstrap is done, the new replica will need to continue
+  //   catching up to the leader, this time through consensus. It needs segment
+  //   11 to be still available. We need the anchor to still be alive at this
+  //   point, otherwise there is nothing preventing the leader from deleting
+  //   segment 11 and thus making the new replica unable to catch up. Yes, it's
+  //   not optimal: we're anchoring 0 and we might only need to anchor 10/11.
+  //   However, having no anchor at all is likely to cause replicas to start
+  //   fail copying.
+  //
+  // NOTE:
+  //   Ideally, we should wait until the leader starts tracking of the target
+  //   replica's log watermark. As for current implementation, the intent is
+  //   to at least try preventing GC of logs before the tablet replica connects
+  //   to the leader.
+  //
+  // TODO(aserbin): make this robust and more optimal than it is now.
   TabletCopyClient tc_client(tablet_id, fs_manager_, cmeta_manager_,
                              server_->messenger(), &tablet_copy_metrics_);