You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@kudu.apache.org by jd...@apache.org on 2017/04/04 21:56:06 UTC

[4/4] kudu git commit: docs: Add documentation for background tasks

docs: Add documentation for background tasks

This patch adds a documentation page for background tasks. My primary
motivation was to document tablet history GC but it seemed that we had a
hole in our documentation concerning the maintenance manager and
background tasks in general, so I tried to document everything at once.

Change-Id: Iaf0911efbb9a5114bb4ae03c1da50335d918eacc
Reviewed-on: http://gerrit.cloudera.org:8080/6501
Reviewed-by: Jean-Daniel Cryans <jd...@apache.org>
Tested-by: Jean-Daniel Cryans <jd...@apache.org>
(cherry picked from commit fcb150c1adcdd4595261433ad81358be7c0cc3a0)
Reviewed-on: http://gerrit.cloudera.org:8080/6548
Tested-by: Kudu Jenkins


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/0fada51c
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/0fada51c
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/0fada51c

Branch: refs/heads/branch-1.3.x
Commit: 0fada51ca26e438544f95b57eb3b7a5ffad51ea1
Parents: e237fc3
Author: Mike Percy <mp...@apache.org>
Authored: Tue Mar 28 12:43:21 2017 -0700
Committer: Jean-Daniel Cryans <jd...@apache.org>
Committed: Tue Apr 4 21:51:35 2017 +0000

----------------------------------------------------------------------
 docs/background_tasks.adoc                      | 103 +++++++++++++++++++
 docs/support/jekyll-templates/document.html.erb |   3 +-
 2 files changed, 105 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/0fada51c/docs/background_tasks.adoc
----------------------------------------------------------------------
diff --git a/docs/background_tasks.adoc b/docs/background_tasks.adoc
new file mode 100644
index 0000000..48faae1
--- /dev/null
+++ b/docs/background_tasks.adoc
@@ -0,0 +1,103 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+[[background_tasks]]
+= Apache Kudu Background Maintenance Tasks
+:author: Kudu Team
+:imagesdir: ./images
+:icons: font
+:toc: left
+:toclevels: 3
+:doctype: book
+:backend: html5
+:sectlinks:
+:experimental:
+
+Kudu relies on running background tasks for many important automatic
+maintenance activities. These tasks include flushing data from memory to disk,
+compacting data to improve performance, freeing up disk space, and more.
+
+== Maintenance manager
+
+The maintenance manager schedules and runs background tasks. At any given point
+in time, the maintenance manager is prioritizing the next task based on the
+improvement needed at that moment, such as relieving memory pressure, improving
+read performance, or freeing up disk space. The number of worker threads
+dedicated to running background tasks can be controlled by setting
+`--maintenance_manager_num_threads`.
+
+== Flushing data to disk
+
+Flushing data from memory to disk relieves memory pressure and can improve read
+performance by switching from a write-optimized, row-oriented in-memory format
+in the `MemRowSet` to a read-optimized, column-oriented format on disk.
+Background tasks that flush data include `FlushMRSOp` and
+`FlushDeltaMemStoresOp`.
+
+The metrics associated with these ops have the prefix `flush_mrs` and
+`flush_dms`, respectively.
+
+== Compacting on-disk data
+
+Kudu constantly performs several types of compaction tasks in order to maintain
+consistent read and write performance over time. A merging compaction, which combines
+multiple `DiskRowSets` together into a single `DiskRowSet`, is run by
+`CompactRowSetsOp`. There are two types of delta store compaction operations
+that may be run as well: `MinorDeltaCompactionOp` and `MajorDeltaCompactionOp`.
+
+For more information on what these different types of compaction operations do,
+please see the
+link:https://github.com/apache/kudu/blob/master/docs/design-docs/tablet.md[Kudu Tablet
+design document].
+
+The metrics associated with these tasks have the prefix `compact_rs`,
+`delta_minor_compact_rs`, and `delta_major_compact_rs`, respectively.
+
+== Write-ahead log GC
+
+Kudu maintains a write-ahead log (WAL) per tablet that is split into discrete
+fixed-size segments. A tablet periodically rolls the WAL to a new log segment
+when the active segment reaches a configured size (controlled by
+`--log_segment_size_mb`). In order to save disk space and decrease startup
+time, a background task called `LogGCOp` attempts to garbage-collect (GC) old
+WAL segments by deleting them from disk once it is determined that they are no
+longer needed by the local node for durability.
+
+The metrics associated with this background task have the prefix `log_gc`.
+
+== Tablet history GC and the ancient history mark
+
+Because Kudu uses a multiversion concurrency control (MVCC) mechanism to
+ensure that snapshot scans can proceeed isolated from new changes to a table,
+periodically old historical data should be garbage-collected (removed) to free
+up disk space. While Kudu never removes rows or data that are visible in the
+latest version of the data, Kudu does remove records of old changes that are no
+longer visible.
+
+The point in time in the past beyond which historical MVCC data becomes
+inaccessible and is free to be deleted is called the _ancient history mark_
+(AHM). The AHM can be configured by setting `--tablet_history_max_age_sec`.
+
+There are two background tasks that GC historical MVCC data older than the AHM:
+the one that runs the merging compaction, called `CompactRowSetsOp` (see
+above), and a separate background task that deletes old undo delta blocks,
+called `UndoDeltaBlockGCOp`. Running `UndoDeltaBlockGCOp` reduces disk space
+usage in all workloads, but particularly in those with a higher volume of
+updates or upserts.
+
+The metrics associated with this background task have the prefix
+`undo_delta_block`.

http://git-wip-us.apache.org/repos/asf/kudu/blob/0fada51c/docs/support/jekyll-templates/document.html.erb
----------------------------------------------------------------------
diff --git a/docs/support/jekyll-templates/document.html.erb b/docs/support/jekyll-templates/document.html.erb
index 7d18c04..ea3aed0 100644
--- a/docs/support/jekyll-templates/document.html.erb
+++ b/docs/support/jekyll-templates/document.html.erb
@@ -95,9 +95,10 @@ end %>
         :developing, "Developing Applications with Kudu",
         :schema_design, "Kudu Schema Design",
         :transaction_semantics, "Kudu Transaction Semantics",
-        :contributing, "Contributing to Kudu",
+        :background_tasks, "Background Maintenance Tasks",
         :configuration_reference, "Kudu Configuration Reference",
         :known_issues, "Known Issues and Limitations",
+        :contributing, "Contributing to Kudu",
         :export_control, "Export Control Notice"
       ]
       toplevels.each_slice(2) do |page, pagename|