Posted to commits@kudu.apache.org by to...@apache.org on 2017/04/19 05:32:00 UTC

[46/51] [partial] kudu git commit: Add 1.3.1 release and docs

http://git-wip-us.apache.org/repos/asf/kudu/blob/2826dff0/docs/administration.html
----------------------------------------------------------------------
diff --git a/docs/administration.html b/docs/administration.html
index 16bda0a..05c3115 100644
--- a/docs/administration.html
+++ b/docs/administration.html
@@ -2,7 +2,7 @@
 title: Apache Kudu Administration
 layout: default
 active_nav: docs
-last_updated: 'Last updated 2017-03-01 12:43:33 PST'
+last_updated: 'Last updated 2017-04-18 21:59:26 PDT'
 ---
 <!--
 
@@ -686,8 +686,8 @@ be listed there with one master in the LEADER role and the others in the FOLLOWE
 contents of /masters on each master should be the same.</p>
 </li>
 <li>
-<p>Run a Kudu system check (ksck) on the cluster using the <code>kudu</code> command line tool. Help for ksck
-can be viewed via <code>kudu cluster ksck --help</code>.</p>
+<p>Run a Kudu system check (ksck) on the cluster using the <code>kudu</code> command line
+tool. See <a href="#ksck">Checking Cluster Health with <code>ksck</code></a> for more details.</p>
 </li>
 </ul>
 </div>
@@ -983,13 +983,86 @@ be listed there with one master in the LEADER role and the others in the FOLLOWE
 contents of /masters on each master should be the same.</p>
 </li>
 <li>
-<p>Run a Kudu system check (ksck) on the cluster using the <code>kudu</code> command line tool. Help for ksck
-can be viewed via <code>kudu cluster ksck --help</code>.</p>
+<p>Run a Kudu system check (ksck) on the cluster using the <code>kudu</code> command line
+tool. See <a href="#ksck">Checking Cluster Health with <code>ksck</code></a> for more details.</p>
 </li>
 </ul>
 </div>
 </div>
 </div>
+<div class="sect2">
+<h3 id="ksck"><a class="link" href="#ksck">Checking Cluster Health with <code>ksck</code></a></h3>
+<div class="paragraph">
+<p>The <code>kudu</code> CLI includes a tool named <code>ksck</code> which can be used for checking
+cluster health and data integrity. <code>ksck</code> will identify issues such as
+under-replicated tablets, unreachable tablet servers, or tablets without a
+leader.</p>
+</div>
+<div class="paragraph">
+<p><code>ksck</code> should be run from the command line, and requires the full list of master
+addresses to be specified:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code class="language-bash" data-lang="bash">$ kudu cluster ksck master-01.example.com,master-02.example.com,master-03.example.com</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>To see a full list of the options available with <code>ksck</code>, use the <code>--help</code> flag.
+If the cluster is healthy, <code>ksck</code> will print a success message, and return a
+zero (success) exit status.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>Connected to the Master
+Fetched info from all 1 Tablet Servers
+Table IntegrationTestBigLinkedList is HEALTHY (1 tablet(s) checked)
+
+The metadata for 1 table(s) is HEALTHY
+OK</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>If the cluster is unhealthy, for instance if a tablet server process has
+stopped, <code>ksck</code> will report the issue(s) and return a non-zero exit status:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>Connected to the Master
+WARNING: Unable to connect to Tablet Server 8a0b66a756014def82760a09946d1fce
+(tserver-01.example.com:7050): Network error: could not send Ping RPC to server: Client connection negotiation failed: client connection to 192.168.0.2:7050: connect: Connection refused (error 61)
+WARNING: Fetched info from 0 Tablet Servers, 1 weren't reachable
+Tablet ce3c2d27010d4253949a989b9d9bf43c of table 'IntegrationTestBigLinkedList'
+is unavailable: 1 replica(s) not RUNNING
+  8a0b66a756014def82760a09946d1fce (tserver-01.example.com:7050): TS unavailable [LEADER]
+
+  Table IntegrationTestBigLinkedList has 1 unavailable tablet(s)
+
+  WARNING: 1 out of 1 table(s) are not in a healthy state
+  ==================
+  Errors:
+  ==================
+  error fetching info from tablet servers: Network error: Not all Tablet Servers are reachable
+  table consistency check error: Corruption: 1 table(s) are bad
+
+  FAILED
+  Runtime error: ksck discovered errors</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>To verify data integrity, the optional <code>--checksum_scan</code> flag can be set, which
+will ensure the cluster has consistent data by scanning each tablet replica and
+comparing results. The <code>--tables</code> or <code>--tablets</code> flags can be used to limit the
+scope of the checksum scan to specific tables or tablets, respectively. For
+example, checking data integrity on the <code>IntegrationTestBigLinkedList</code> table can
+be done with the following command:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code class="language-bash" data-lang="bash">$ kudu cluster ksck --checksum_scan --tables IntegrationTestBigLinkedList master-01.example.com,master-02.example.com,master-03.example.com</code></pre>
+</div>
+</div>
+</div>
 </div>
 </div>
     </div>
@@ -1054,6 +1127,7 @@ can be viewed via <code>kudu cluster ksck --help</code>.</p>
 <li><a href="#_perform_the_recovery">Perform the recovery</a></li>
 </ul>
 </li>
+<li><a href="#ksck">Checking Cluster Health with <code>ksck</code></a></li>
 </ul>
 </li>
 </ul> 
@@ -1076,19 +1150,19 @@ can be viewed via <code>kudu cluster ksck --help</code>.</p>
       </li> 
       <li>
 
-          <a href="contributing.html">Contributing to Kudu</a> 
+          <a href="background_tasks.html">Background Maintenance Tasks</a> 
       </li> 
       <li>
 
-          <a href="style_guide.html">Kudu Documentation Style Guide</a> 
+          <a href="configuration_reference.html">Kudu Configuration Reference</a> 
       </li> 
       <li>
 
-          <a href="configuration_reference.html">Kudu Configuration Reference</a> 
+          <a href="known_issues.html">Known Issues and Limitations</a> 
       </li> 
       <li>
 
-          <a href="known_issues.html">Known Issues and Limitations</a> 
+          <a href="contributing.html">Contributing to Kudu</a> 
       </li> 
       <li>
 

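Note on the ksck section added to administration.html above: the new text says that ksck returns a zero exit status when the cluster is healthy and a non-zero status otherwise, which makes it easy to script. The following is a minimal Python sketch of such a wrapper, not part of the commit. The function names are hypothetical, and joining several table names with commas for --tables is an assumption; the diff only shows a single table name.

```python
import subprocess
from typing import List, Sequence


def build_ksck_command(masters: Sequence[str],
                       checksum_scan: bool = False,
                       tables: Sequence[str] = ()) -> List[str]:
    """Build an argv list for `kudu cluster ksck` (hypothetical helper)."""
    if not masters:
        raise ValueError("ksck requires the full list of master addresses")
    cmd = ["kudu", "cluster", "ksck"]
    if checksum_scan:
        cmd.append("--checksum_scan")
    if tables:
        # Assumption: multiple tables are passed as one comma-separated value.
        cmd += ["--tables", ",".join(tables)]
    # Master addresses are passed as a single comma-separated argument.
    cmd.append(",".join(masters))
    return cmd


def cluster_is_healthy(masters: Sequence[str], run=subprocess.run, **options) -> bool:
    """Return True when ksck exits with status 0, False otherwise.

    `run` is injectable so the logic can be exercised without a cluster.
    """
    result = run(build_ksck_command(masters, **options))
    return result.returncode == 0
```

For example, cluster_is_healthy(["master-01.example.com", "master-02.example.com", "master-03.example.com"]) would invoke the first command shown in the section and report whether it exited with status zero.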
http://git-wip-us.apache.org/repos/asf/kudu/blob/2826dff0/docs/background_tasks.html
----------------------------------------------------------------------
diff --git a/docs/background_tasks.html b/docs/background_tasks.html
new file mode 100644
index 0000000..59cb5c1
--- /dev/null
+++ b/docs/background_tasks.html
@@ -0,0 +1,215 @@
+---
+title: Apache Kudu Background Maintenance Tasks
+layout: default
+active_nav: docs
+last_updated: 'Last updated 2017-04-12 15:28:02 PDT'
+---
+<!--
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+
+<div class="container">
+  <div class="row">
+    <div class="col-md-9">
+
+<h1>Apache Kudu Background Maintenance Tasks</h1>
+      <div id="preamble">
+<div class="sectionbody">
+<div class="paragraph">
+<p>Kudu relies on running background tasks for many important automatic
+maintenance activities. These tasks include flushing data from memory to disk,
+compacting data to improve performance, freeing up disk space, and more.</p>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_maintenance_manager"><a class="link" href="#_maintenance_manager">Maintenance manager</a></h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>The maintenance manager schedules and runs background tasks. At any given point
+in time, the maintenance manager is prioritizing the next task based on the
+improvement needed at that moment, such as relieving memory pressure, improving
+read performance, or freeing up disk space. The number of worker threads
+dedicated to running background tasks can be controlled by setting
+<code>--maintenance_manager_num_threads</code>.</p>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_flushing_data_to_disk"><a class="link" href="#_flushing_data_to_disk">Flushing data to disk</a></h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>Flushing data from memory to disk relieves memory pressure and can improve read
+performance by switching from a write-optimized, row-oriented in-memory format
+in the <code>MemRowSet</code> to a read-optimized, column-oriented format on disk.
+Background tasks that flush data include <code>FlushMRSOp</code> and
+<code>FlushDeltaMemStoresOp</code>.</p>
+</div>
+<div class="paragraph">
+<p>The metrics associated with these ops have the prefixes <code>flush_mrs</code> and
+<code>flush_dms</code>, respectively.</p>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_compacting_on_disk_data"><a class="link" href="#_compacting_on_disk_data">Compacting on-disk data</a></h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>Kudu constantly performs several types of compaction tasks in order to maintain
+consistent read and write performance over time. A merging compaction, which combines
+multiple <code>DiskRowSets</code> together into a single <code>DiskRowSet</code>, is run by
+<code>CompactRowSetsOp</code>. There are two types of delta store compaction operations
+that may be run as well: <code>MinorDeltaCompactionOp</code> and <code>MajorDeltaCompactionOp</code>.</p>
+</div>
+<div class="paragraph">
+<p>For more information on what these different types of compaction operations do,
+please see the
+<a href="https://github.com/apache/kudu/blob/master/docs/design-docs/tablet.md">Kudu Tablet
+design document</a>.</p>
+</div>
+<div class="paragraph">
+<p>The metrics associated with these tasks have the prefixes <code>compact_rs</code>,
+<code>delta_minor_compact_rs</code>, and <code>delta_major_compact_rs</code>, respectively.</p>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_write_ahead_log_gc"><a class="link" href="#_write_ahead_log_gc">Write-ahead log GC</a></h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>Kudu maintains a write-ahead log (WAL) per tablet that is split into discrete
+fixed-size segments. A tablet periodically rolls the WAL to a new log segment
+when the active segment reaches a configured size (controlled by
+<code>--log_segment_size_mb</code>). In order to save disk space and decrease startup
+time, a background task called <code>LogGCOp</code> attempts to garbage-collect (GC) old
+WAL segments by deleting them from disk once it is determined that they are no
+longer needed by the local node for durability.</p>
+</div>
+<div class="paragraph">
+<p>The metrics associated with this background task have the prefix <code>log_gc</code>.</p>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_tablet_history_gc_and_the_ancient_history_mark"><a class="link" href="#_tablet_history_gc_and_the_ancient_history_mark">Tablet history GC and the ancient history mark</a></h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>Because Kudu uses a multiversion concurrency control (MVCC) mechanism to
+ensure that snapshot scans can proceed isolated from new changes to a table,
+old historical data should be garbage-collected (removed) periodically to free
+up disk space. While Kudu never removes rows or data that are visible in the
+latest version of the data, Kudu does remove records of old changes that are no
+longer visible.</p>
+</div>
+<div class="paragraph">
+<p>The point in time in the past beyond which historical MVCC data becomes
+inaccessible and is free to be deleted is called the <em>ancient history mark</em>
+(AHM). The AHM can be configured by setting <code>--tablet_history_max_age_sec</code>.</p>
+</div>
+<div class="paragraph">
+<p>There are two background tasks that GC historical MVCC data older than the AHM:
+the one that runs the merging compaction, called <code>CompactRowSetsOp</code> (see
+above), and a separate background task that deletes old undo delta blocks,
+called <code>UndoDeltaBlockGCOp</code>. Running <code>UndoDeltaBlockGCOp</code> reduces disk space
+usage in all workloads, but particularly in those with a higher volume of
+updates or upserts.</p>
+</div>
+<div class="paragraph">
+<p>The metrics associated with <code>UndoDeltaBlockGCOp</code> have the prefix
+<code>undo_delta_block</code>.</p>
+</div>
+</div>
+</div>
+    </div>
+    <div class="col-md-3">
+
+  <div id="toc" data-spy="affix" data-offset-top="70">
+  <ul>
+
+      <li>
+
+          <a href="index.html">Introducing Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="release_notes.html">Kudu Release Notes</a> 
+      </li> 
+      <li>
+
+          <a href="quickstart.html">Getting Started with Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="installation.html">Installation Guide</a> 
+      </li> 
+      <li>
+
+          <a href="configuration.html">Configuring Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="kudu_impala_integration.html">Using Impala with Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="administration.html">Administering Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="troubleshooting.html">Troubleshooting Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="developing.html">Developing Applications with Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="schema_design.html">Kudu Schema Design</a> 
+      </li> 
+      <li>
+
+          <a href="transaction_semantics.html">Kudu Transaction Semantics</a> 
+      </li> 
+      <li>
+<span class="active-toc">Background Maintenance Tasks</span>
+            <ul class="sectlevel1">
+<li><a href="#_maintenance_manager">Maintenance manager</a></li>
+<li><a href="#_flushing_data_to_disk">Flushing data to disk</a></li>
+<li><a href="#_compacting_on_disk_data">Compacting on-disk data</a></li>
+<li><a href="#_write_ahead_log_gc">Write-ahead log GC</a></li>
+<li><a href="#_tablet_history_gc_and_the_ancient_history_mark">Tablet history GC and the ancient history mark</a></li>
+</ul> 
+      </li> 
+      <li>
+
+          <a href="configuration_reference.html">Kudu Configuration Reference</a> 
+      </li> 
+      <li>
+
+          <a href="known_issues.html">Known Issues and Limitations</a> 
+      </li> 
+      <li>
+
+          <a href="contributing.html">Contributing to Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="export_control.html">Export Control Notice</a> 
+      </li> 
+  </ul>
+  </div>
+    </div>
+  </div>
+</div>
\ No newline at end of file
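
Note on the ancient history mark described in background_tasks.html above: the AHM is a point in the past determined by --tablet_history_max_age_sec, and history older than it may be garbage-collected. The Python sketch below illustrates that relationship only. It assumes plain wall-clock seconds, whereas Kudu's real timestamps are more involved, and treating a snapshot exactly at the AHM as still readable is also an assumption. The function names are hypothetical.

```python
def ancient_history_mark(now_sec: float, tablet_history_max_age_sec: float) -> float:
    """Return the AHM as a wall-clock time in seconds (simplified model)."""
    if tablet_history_max_age_sec < 0:
        raise ValueError("history max age must be non-negative")
    return now_sec - tablet_history_max_age_sec


def snapshot_is_readable(snapshot_sec: float, now_sec: float,
                         tablet_history_max_age_sec: float) -> bool:
    """A snapshot older than the AHM may already have been garbage-collected."""
    return snapshot_sec >= ancient_history_mark(now_sec, tablet_history_max_age_sec)
```

With a max age of 900 seconds and the current time at 1000, the AHM in this model is 100, so a snapshot scan at time 500 is still possible while one at time 50 is not.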

http://git-wip-us.apache.org/repos/asf/kudu/blob/2826dff0/docs/configuration.html
----------------------------------------------------------------------
diff --git a/docs/configuration.html b/docs/configuration.html
index 2d6f9bf..a23182f 100644
--- a/docs/configuration.html
+++ b/docs/configuration.html
@@ -2,7 +2,7 @@
 title: Configuring Apache Kudu
 layout: default
 active_nav: docs
-last_updated: 'Last updated 2017-02-03 13:18:53 PST'
+last_updated: 'Last updated 2017-04-10 16:00:03 PDT'
 ---
 <!--
 
@@ -296,19 +296,19 @@ do not read this flag.</p></td>
       </li> 
       <li>
 
-          <a href="contributing.html">Contributing to Kudu</a> 
+          <a href="background_tasks.html">Background Maintenance Tasks</a> 
       </li> 
       <li>
 
-          <a href="style_guide.html">Kudu Documentation Style Guide</a> 
+          <a href="configuration_reference.html">Kudu Configuration Reference</a> 
       </li> 
       <li>
 
-          <a href="configuration_reference.html">Kudu Configuration Reference</a> 
+          <a href="known_issues.html">Known Issues and Limitations</a> 
       </li> 
       <li>
 
-          <a href="known_issues.html">Known Issues and Limitations</a> 
+          <a href="contributing.html">Contributing to Kudu</a> 
       </li> 
       <li>