Posted to commits@kudu.apache.org by ab...@apache.org on 2018/12/11 21:11:26 UTC

[03/21] kudu git commit: [docs] Update docs with contributing to blog

http://git-wip-us.apache.org/repos/asf/kudu/blob/87b27857/docs/quickstart.html
----------------------------------------------------------------------
diff --git a/docs/quickstart.html b/docs/quickstart.html
new file mode 100644
index 0000000..0c93bc2
--- /dev/null
+++ b/docs/quickstart.html
@@ -0,0 +1,435 @@
+---
+title: Apache Kudu Quickstart
+layout: default
+active_nav: docs
+last_updated: 'Last updated 2018-10-24 23:33:04 CEST'
+---
+<!--
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+
+<div class="container">
+  <div class="row">
+    <div class="col-md-9">
+
+<h1>Apache Kudu Quickstart</h1>
+      <div id="preamble">
+<div class="sectionbody">
+<div class="paragraph">
+<p>Follow these instructions to set up and run the Kudu VM, and get started with Kudu, Kudu_Impala,
+and CDH in minutes.</p>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="quickstart_vm"><a class="link" href="#quickstart_vm">Get The Kudu Quickstart VM</a></h2>
+<div class="sectionbody">
+<div class="sect2">
+<h3 id="_prerequisites"><a class="link" href="#_prerequisites">Prerequisites</a></h3>
+<div class="olist arabic">
+<ol class="arabic">
+<li>
+<p>Install <a href="https://www.virtualbox.org/">Oracle VirtualBox</a>. The VM has been tested to work
+with VirtualBox version 4.3 on Ubuntu 14.04 and VirtualBox version 5 on OSX
+10.9. VirtualBox is also included in most package managers: apt-get, brew, etc.</p>
+</li>
+<li>
+<p>After the installation, make sure that <code>VBoxManage</code> is in your <code>PATH</code> by using the
+<code>which VBoxManage</code> command.</p>
+</li>
+</ol>
+</div>
+</div>
+<div class="sect2">
+<h3 id="_installation"><a class="link" href="#_installation">Installation</a></h3>
+<div class="paragraph">
+<p>To download and start the VM, execute the following command in a terminal window.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code class="language-bash" data-lang="bash">$ curl -s https://raw.githubusercontent.com/cloudera/kudu-examples/master/demo-vm-setup/bootstrap.sh | bash</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>This command downloads a shell script that clones the <code>kudu-examples</code> Git repository and
+then downloads a VM image of about 1.2 GB into the current working
+directory.<sup class="footnote">[<a id="_footnoteref_1" class="footnote" href="#_footnote_1" title="View footnote.">1</a>]</sup> You can examine the script after downloading it by removing
+the <code>| bash</code> component of the command above. Once the setup is complete, you can verify
+that everything works by connecting to the guest via SSH:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code class="language-bash" data-lang="bash">$ ssh demo@quickstart.cloudera</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The username and password for the demo account are both <code>demo</code>. In addition, the <code>demo</code>
+user has password-less <code>sudo</code> privileges so that you can install additional software or
+manage the guest OS. You can also access the <code>kudu-examples</code> as a shared folder in
+<code>/home/demo/kudu-examples/</code> on the guest or from your VirtualBox shared folder location on
+the host. This is a quick way to make scripts or data visible to the guest.</p>
+</div>
+<div class="paragraph">
+<p>You can quickly verify that Kudu and Impala are running by executing the following commands:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code class="language-bash" data-lang="bash">$ ps aux | grep kudu
+$ ps aux | grep impalad</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>If you have issues connecting to the VM or one of the processes is not running, make sure
+to consult the <a href="#trouble">Troubleshooting</a> section.</p>
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_load_data"><a class="link" href="#_load_data">Load Data</a></h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>To practice some typical operations with Kudu and Impala, we&#8217;ll use the
+<a href="https://data.sfgov.org/Transportation/Raw-AVL-GPS-data/5fk7-ivit/data">San Francisco MTA
+GPS dataset</a>. This dataset contains raw location data transmitted periodically from
+sensors installed on the buses in the SF MTA&#8217;s fleet.</p>
+</div>
+<div class="olist arabic">
+<ol class="arabic">
+<li>
+<p>Download the sample data and load it into HDFS</p>
+<div class="paragraph">
+<p>First we&#8217;ll download the sample dataset, prepare it, and upload it into the HDFS
+cluster.</p>
+</div>
+<div class="paragraph">
+<p>The SF MTA&#8217;s site is often a bit slow, so we&#8217;ve mirrored a sample CSV file from the
+dataset at <a href="http://kudu-sample-data.s3.amazonaws.com/sfmtaAVLRawData01012013.csv.gz" class="bare">http://kudu-sample-data.s3.amazonaws.com/sfmtaAVLRawData01012013.csv.gz</a></p>
+</div>
+<div class="paragraph">
+<p>The original dataset uses DOS-style line endings, so we&#8217;ll convert it to
+UNIX-style during the upload process using <code>tr</code>.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code class="language-bash" data-lang="bash">$ wget http://kudu-sample-data.s3.amazonaws.com/sfmtaAVLRawData01012013.csv.gz
+$ hdfs dfs -mkdir /sfmta
+$ zcat sfmtaAVLRawData01012013.csv.gz | tr -d '\r' | hadoop fs -put - /sfmta/data.csv</code></pre>
+</div>
+</div>
+</li>
+<li>
+<p>Create a new external Impala table to access the plain text data. To connect to Impala
+in the virtual machine, issue the following command:</p>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code class="language-bash" data-lang="bash">ssh demo@quickstart.cloudera -t impala-shell</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Now, you can execute the following commands:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code class="language-sql" data-lang="sql">CREATE EXTERNAL TABLE sfmta_raw (
+  revision int,
+  report_time string,
+  vehicle_tag int,
+  longitude float,
+  latitude float,
+  speed float,
+  heading float
+)
+ROW FORMAT DELIMITED
+FIELDS TERMINATED BY ','
+LOCATION '/sfmta/'
+TBLPROPERTIES ('skip.header.line.count'='1');</code></pre>
+</div>
+</div>
+</li>
+<li>
+<p>To validate that the data was actually loaded, run the following command:</p>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code class="language-sql" data-lang="sql">SELECT count(*) FROM sfmta_raw;
+
++----------+
+| count(*) |
++----------+
+| 859086   |
++----------+</code></pre>
+</div>
+</div>
+</li>
+<li>
+<p>Next we&#8217;ll create a Kudu table and load the data. Note that we convert
+the string <code>report_time</code> field into a Unix-style timestamp for more efficient
+storage.</p>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code class="language-sql" data-lang="sql">CREATE TABLE sfmta
+PRIMARY KEY (report_time, vehicle_tag)
+PARTITION BY HASH(report_time) PARTITIONS 8
+STORED AS KUDU
+AS SELECT
+  UNIX_TIMESTAMP(report_time,  'MM/dd/yyyy HH:mm:ss') AS report_time,
+  vehicle_tag,
+  longitude,
+  latitude,
+  speed,
+  heading
+FROM sfmta_raw;
+
++------------------------+
+| summary                |
++------------------------+
+| Inserted 859086 row(s) |
++------------------------+
+Fetched 1 row(s) in 5.75s</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The created table uses a composite primary key. See
+<a href="kudu_impala_integration.html">Kudu Impala Integration</a> for a more detailed
+introduction to the extended SQL syntax for Impala.</p>
+</div>
+</li>
+</ol>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_read_and_modify_data"><a class="link" href="#_read_and_modify_data">Read and Modify Data</a></h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>Now that the data is stored in Kudu, you can run queries against it. The following query
+finds the data point containing the highest recorded vehicle speed.</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code class="language-sql" data-lang="sql">SELECT * FROM sfmta ORDER BY speed DESC LIMIT 1;
+
++-------------+-------------+--------------------+-------------------+-------------------+---------+
+| report_time | vehicle_tag | longitude          | latitude          | speed             | heading |
++-------------+-------------+--------------------+-------------------+-------------------+---------+
+| 1357022342  | 5411        | -122.3968811035156 | 37.76665878295898 | 68.33300018310547 | 82      |
++-------------+-------------+--------------------+-------------------+-------------------+---------+</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>With a quick <a href="https://www.google.com/search?q=122.3968811035156W+37.76665878295898N">Google search</a>
+we can see that this bus was traveling east on 16th Street at 68 MPH.
+At first glance, this seems unlikely to be true. Perhaps we do some research
+and find that this bus&#8217;s sensor equipment was broken and we decide to
+remove the data. With Kudu this is very easy to correct using standard
+SQL:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code class="language-sql" data-lang="sql">DELETE FROM sfmta WHERE vehicle_tag = 5411;
+
+-- Modified 1169 row(s), 0 row error(s) in 0.25s</code></pre>
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_next_steps"><a class="link" href="#_next_steps">Next steps</a></h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>The above example showed how to load, query, and mutate a static dataset with Impala
+and Kudu. The real power of Kudu, however, is the ability to ingest and mutate data
+in a streaming fashion.</p>
+</div>
+<div class="paragraph">
+<p>As an exercise to learn the Kudu programmatic APIs, try implementing a program
+that uses the <a href="http://www.nextbus.com/xmlFeedDocs/NextBusXMLFeed.pdf">SFMTA
+XML data feed</a> to ingest this same dataset in real time into the Kudu table.</p>
+</div>
+<div class="sect2">
+<h3 id="trouble"><a class="link" href="#trouble">Troubleshooting</a></h3>
+<div class="sect3">
+<h4 id="_problems_accessing_the_vm_via_ssh"><a class="link" href="#_problems_accessing_the_vm_via_ssh">Problems accessing the VM via SSH</a></h4>
+<div class="ulist">
+<ul>
+<li>
+<p>Make sure the host has an SSH client installed.</p>
+</li>
+<li>
+<p>Make sure the VM is running, by running the following command and checking for a VM called <code>kudu-demo</code>:</p>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code class="language-bash" data-lang="bash">$ VBoxManage list runningvms</code></pre>
+</div>
+</div>
+</li>
+<li>
+<p>Verify that the VM&#8217;s IP address is included in the host&#8217;s <code>/etc/hosts</code> file. You should
+see a line that includes an IP address followed by the hostname
+<code>quickstart.cloudera</code>. To check the running VM&#8217;s IP address, use the <code>VBoxManage</code>
+command below.</p>
+<div class="listingblock">
+<div class="content">
+<pre class="highlight"><code class="language-bash" data-lang="bash">$ VBoxManage guestproperty get kudu-demo /VirtualBox/GuestInfo/Net/0/V4/IP
+Value: 192.168.56.100</code></pre>
+</div>
+</div>
+</li>
+<li>
+<p>If you&#8217;ve used a Cloudera Quickstart VM before, your <code>.ssh/known_hosts</code> file may
+contain references to the previous VM&#8217;s SSH credentials. Remove any references to
+<code>quickstart.cloudera</code> from this file.</p>
+</li>
+</ul>
+</div>
+</div>
+<div class="sect3">
+<h4 id="_failing_with_lack_of_sse4_2_support_when_running_inside_virtualbox"><a class="link" href="#_failing_with_lack_of_sse4_2_support_when_running_inside_virtualbox">Failing with lack of SSE4.2 support when running inside VirtualBox</a></h4>
+<div class="ulist">
+<ul>
+<li>
+<p>Running Kudu currently requires a CPU that supports SSE4.2 (Nehalem or later for Intel). To pass through SSE4.2 support into the guest VM, refer to the <a href="https://www.virtualbox.org/manual/ch09.html#sse412passthrough">VirtualBox documentation</a>.</p>
+</li>
+</ul>
+</div>
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_next_steps_2"><a class="link" href="#_next_steps_2">Next Steps</a></h2>
+<div class="sectionbody">
+<div class="ulist">
+<ul>
+<li>
+<p><a href="installation.html">Installing Kudu</a></p>
+</li>
+<li>
+<p><a href="configuration.html">Configuring Kudu</a></p>
+</li>
+</ul>
+</div>
+</div>
+</div>
+    </div>
+    <div class="col-md-3">
+
+  <div id="toc" data-spy="affix" data-offset-top="70">
+  <ul>
+
+      <li>
+
+          <a href="index.html">Introducing Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="release_notes.html">Kudu Release Notes</a> 
+      </li> 
+      <li>
+<span class="active-toc">Getting Started with Kudu</span>
+            <ul class="sectlevel1">
+<li><a href="#quickstart_vm">Get The Kudu Quickstart VM</a>
+<ul class="sectlevel2">
+<li><a href="#_prerequisites">Prerequisites</a></li>
+<li><a href="#_installation">Installation</a></li>
+</ul>
+</li>
+<li><a href="#_load_data">Load Data</a></li>
+<li><a href="#_read_and_modify_data">Read and Modify Data</a></li>
+<li><a href="#_next_steps">Next steps</a>
+<ul class="sectlevel2">
+<li><a href="#trouble">Troubleshooting</a></li>
+</ul>
+</li>
+<li><a href="#_next_steps_2">Next Steps</a></li>
+</ul> 
+      </li> 
+      <li>
+
+          <a href="installation.html">Installation Guide</a> 
+      </li> 
+      <li>
+
+          <a href="configuration.html">Configuring Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="kudu_impala_integration.html">Using Impala with Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="administration.html">Administering Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="troubleshooting.html">Troubleshooting Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="developing.html">Developing Applications with Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="schema_design.html">Kudu Schema Design</a> 
+      </li> 
+      <li>
+
+          <a href="scaling_guide.html">Kudu Scaling Guide</a> 
+      </li> 
+      <li>
+
+          <a href="security.html">Kudu Security</a> 
+      </li> 
+      <li>
+
+          <a href="transaction_semantics.html">Kudu Transaction Semantics</a> 
+      </li> 
+      <li>
+
+          <a href="background_tasks.html">Background Maintenance Tasks</a> 
+      </li> 
+      <li>
+
+          <a href="configuration_reference.html">Kudu Configuration Reference</a> 
+      </li> 
+      <li>
+
+          <a href="command_line_tools_reference.html">Kudu Command Line Tools Reference</a> 
+      </li> 
+      <li>
+
+          <a href="known_issues.html">Known Issues and Limitations</a> 
+      </li> 
+      <li>
+
+          <a href="contributing.html">Contributing to Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="export_control.html">Export Control Notice</a> 
+      </li> 
+  </ul>
+  </div>
+    </div>
+  </div>
+</div>
+
+
+  <div id="footnotes">
+  <hr>
+      <div class="footnote" id="_footnote_1">
+      <a href="#_footnoteref_1">1</a>. In addition, the script will create a host-only network between host and guest and setup an entry in the <code>/etc/hosts</code> file with the name <code>quickstart.cloudera</code> and the guest&#8217;s IP address.
+      </div>
+  </div>
\ No newline at end of file

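A minimal sketch of the data preparation described in the Load Data steps of quickstart.html above, written as a local dry run that needs neither HDFS nor Impala. The helper names `strip_cr` and `count_data_rows` are hypothetical and not part of the quickstart. They only mirror the `tr -d '\r'` conversion and the `'skip.header.line.count'='1'` table property, under the assumption that the CSV has exactly one header line.

```shell
# Hypothetical helpers for a local dry run; the names are illustrative only.

# Remove carriage returns from stdin, the same conversion the quickstart
# applies with `tr -d '\r'` before the HDFS upload.
strip_cr() {
  tr -d '\r'
}

# Count the data rows of a CSV stream on stdin, skipping one header line
# (mirrors 'skip.header.line.count'='1' on the external table).
count_data_rows() {
  strip_cr | tail -n +2 | wc -l | tr -d ' '
}

# Usage with the downloaded sample file (assumes it is in the current directory):
#   zcat sfmtaAVLRawData01012013.csv.gz | count_data_rows
```

The number printed for the sample file can be compared with the `count(*)` that Impala reports for `sfmta_raw`. A mismatch would suggest that the upload or the header handling differs from what this sketch assumes.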
http://git-wip-us.apache.org/repos/asf/kudu/blob/87b27857/docs/release_notes.html
----------------------------------------------------------------------
diff --git a/docs/release_notes.html b/docs/release_notes.html
new file mode 100644
index 0000000..0651214
--- /dev/null
+++ b/docs/release_notes.html
@@ -0,0 +1,641 @@
+---
+title: Apache Kudu 1.8.0 Release Notes
+layout: default
+active_nav: docs
+last_updated: 'Last updated 2018-12-07 15:50:19 CET'
+---
+<!--
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+
+<div class="container">
+  <div class="row">
+    <div class="col-md-9">
+
+<h1>Apache Kudu 1.8.0 Release Notes</h1>
+      <div class="sect1">
+<h2 id="rn_1.8.0_upgrade_notes"><a class="link" href="#rn_1.8.0_upgrade_notes">Upgrade Notes</a></h2>
+<div class="sectionbody">
+<div class="ulist">
+<ul>
+<li>
+<p>Upgrading directly from Kudu 1.7.0 is supported and no special upgrade steps are
+required. A rolling upgrade may work; however, it has not been tested. When upgrading
+Kudu, it is recommended to first shut down all Kudu processes across the cluster, then
+upgrade the software on all servers, then restart the Kudu processes on all servers in
+the cluster.</p>
+</li>
+<li>
+<p>Kudu Flume Sink released with Kudu 1.8.0 is compiled against Apache Flume 1.8 and might
+not function with earlier versions of Flume. Note that Flume 1.8 requires Java 1.8 or
+higher.</p>
+</li>
+<li>
+<p>Hadoop 3.0+ requires Java 8 at runtime even though the Kudu Hadoop integration is Java 7
+compatible. Hadoop 3.1 is the default dependency version as of Kudu 1.8.0, used by
+certain features in the Java client.</p>
+</li>
+</ul>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="rn_1.8.0_obsoletions"><a class="link" href="#rn_1.8.0_obsoletions">Obsoletions</a></h2>
+<div class="sectionbody">
+<div class="ulist">
+<ul>
+<li>
+<p>The <code>-table_num_buckets</code> configuration option of the <code>kudu perf loadgen</code> tool is now
+removed in favor of <code>-table_num_hash_partitions</code> and <code>-table_num_range_partitions</code>
+(see <a href="https://issues.apache.org/jira/browse/KUDU-1861">KUDU-1861</a>).</p>
+</li>
+</ul>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="rn_1.8.0_deprecations"><a class="link" href="#rn_1.8.0_deprecations">Deprecations</a></h2>
+<div class="sectionbody">
+<div class="ulist">
+<ul>
+<li>
+<p>Support for Java 7 has been deprecated since Kudu 1.5.0 and may be removed in the next
+major release.</p>
+</li>
+<li>
+<p>The <code>producer.skipMissingColumn</code>, <code>producer.skipBadColumnValue</code>, and
+<code>producer.warnUnmatchedRows</code> Kudu Flume sink configuration parameters have been
+deprecated in favor of <code>producer.missingColumnPolicy</code>, <code>producer.badColumnValuePolicy</code>,
+and <code>producer.unmatchedRowPolicy</code> respectively (see
+<a href="https://issues.apache.org/jira/browse/KUDU-1882">KUDU-1882</a>).</p>
+</li>
+</ul>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="rn_1.8.0_new_features"><a class="link" href="#rn_1.8.0_new_features">New features</a></h2>
+<div class="sectionbody">
+<div class="ulist">
+<ul>
+<li>
+<p>Examples showcasing functionality in C++, Java, and Python, previously
+hosted in a separate repository have been added. They can be found in the
+<code><a href="https://github.com/apache/kudu/tree/master/examples">examples/</a></code>
+top-level subdirectory.</p>
+</li>
+<li>
+<p>Added <code>kudu diagnose parse_stacks</code>, a tool to parse sampled stack traces out of a
+diagnostics log (see <a href="https://issues.apache.org/jira/browse/KUDU-2353">KUDU-2353</a>).</p>
+</li>
+<li>
+<p>Added support for <code>IS NULL</code> and <code>IS NOT NULL</code> predicates to the Kudu Python client (see
+<a href="https://issues.apache.org/jira/browse/KUDU-2399">KUDU-2399</a>).</p>
+</li>
+<li>
+<p>Introduced a <a href="administration.html#rebalancer_tool">manual data rebalancer</a> into the kudu
+CLI tool. The rebalancer can be used to redistribute table replicas among tablet
+servers. The rebalancer can be run via the <code>kudu cluster rebalance</code> sub-command. Using the
+new tool, it&#8217;s possible to rebalance Kudu clusters of version 1.4.0 and newer.</p>
+</li>
+<li>
+<p>Added <code>kudu tserver get_flags</code> and <code>kudu master get_flags</code>, two tools that allow
+superusers to retrieve all the values of command line flags from remote Kudu processes.
+The <code>get_flags</code> tools support filtering the returned flags by tag, and by default will
+return only flags that were explicitly set.</p>
+</li>
+<li>
+<p>Added <code>kudu tablet unsafe_replace_tablet</code>, a tool to replace a tablet with a new one.
+This tool is meant to be used to recover a table when one of its tablets has permanently
+lost all replicas. The data in the tablet that is replaced is lost, so this tool should
+only be used as a last resort (see
+<a href="https://issues.apache.org/jira/browse/KUDU-2290">KUDU-2290</a>).</p>
+</li>
+</ul>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="rn_1.8.0_improvements"><a class="link" href="#rn_1.8.0_improvements">Optimizations and improvements</a></h2>
+<div class="sectionbody">
+<div class="ulist">
+<ul>
+<li>
+<p>There is a new metric for each tablet replica tracking the number of election failures
+since the last successful election attempt and the time since the last heartbeat from
+the leader (see <a href="https://issues.apache.org/jira/browse/KUDU-2287">KUDU-2287</a>).</p>
+</li>
+<li>
+<p>Kudu now supports building and running on Ubuntu 18.04 (“Bionic Beaver”) (see
+<a href="https://issues.apache.org/jira/browse/KUDU-2427">KUDU-2427</a>).</p>
+</li>
+<li>
+<p>Kudu now supports building and running against OpenSSL 1.1 (see
+<a href="https://issues.apache.org/jira/browse/KUDU-1889">KUDU-1889</a>).</p>
+</li>
+<li>
+<p>Added Kerberos support to the Kudu Flume sink (see
+<a href="https://issues.apache.org/jira/browse/KUDU-2012">KUDU-2012</a>).</p>
+</li>
+<li>
+<p>The Kudu Spark connector now supports Spark Streaming DataFrames (see
+<a href="https://issues.apache.org/jira/browse/KUDU-2539">KUDU-2539</a>).</p>
+</li>
+<li>
+<p>Added <code>-tables</code> filtering argument to <code>kudu table list</code> (see
+<a href="https://issues.apache.org/jira/browse/KUDU-2529">KUDU-2529</a>).</p>
+</li>
+<li>
+<p>Clients now support setting a limit on the number of returned rows in scans (see
+<a href="https://issues.apache.org/jira/browse/KUDU-16">KUDU-16</a>).</p>
+</li>
+<li>
+<p>Added Pandas support to the Python client (see
+<a href="https://issues.apache.org/jira/browse/KUDU-1276">KUDU-1276</a>).</p>
+</li>
+<li>
+<p>Enabled configuration of mutation buffer in the Python client (see
+<a href="https://issues.apache.org/jira/browse/KUDU-2441">KUDU-2441</a>).</p>
+</li>
+<li>
+<p>Added a <code>keepAlive</code> API call to the <code>KuduScanner</code> and <code>AsyncKuduScanner</code> in the Java
+client.  This API can be used to keep the scanners alive on the server when processing
+of messages will take longer than the scanner TTL (see
+<a href="https://issues.apache.org/jira/browse/KUDU-2095">KUDU-2095</a>).</p>
+</li>
+<li>
+<p>The Kudu Spark integration now uses the keepAlive API when reading data. By default it
+will call keepAlive on a scanner with a period of 15 seconds. This will ensure that
+Spark jobs with large batch sizes or slow processing times do not fail with scanner not
+found errors (see <a href="https://issues.apache.org/jira/browse/KUDU-2563">KUDU-2563</a>).</p>
+</li>
+<li>
+<p>The number of reactor threads in the C++ client is now configurable (see
+<a href="https://issues.apache.org/jira/browse/KUDU-2368">KUDU-2368</a>).</p>
+</li>
+<li>
+<p>Added an optimization to reduce CPU consumption when performing hot metadata lookups in
+the C++ client (see <a href="https://issues.apache.org/jira/browse/KUDU-1977">KUDU-1977</a>).</p>
+</li>
+<li>
+<p>Added an optimization to avoid bottlenecks on <code>getpwuid_r()</code> in libnss during a Raft
+leader election storm (see
+<a href="https://issues.apache.org/jira/browse/KUDU-2395">KUDU-2395</a>).</p>
+</li>
+<li>
+<p>Improved rowset tree pruning for scans with open-ended intervals on the primary key (see
+<a href="https://issues.apache.org/jira/browse/KUDU-2566">KUDU-2566</a>).</p>
+</li>
+<li>
+<p>The <code>kudu perf loadgen</code> tool now supports generating range-partitioned tables. The
+<code>-table_num_buckets</code> configuration is now removed in favor of
+<code>-table_num_hash_partitions</code> and <code>-table_num_range_partitions</code> (see
+<a href="https://issues.apache.org/jira/browse/KUDU-1861">KUDU-1861</a>).</p>
+</li>
+<li>
+<p>CFile checksum failures will now cause the affected tablet replicas to be failed and
+re-replicated elsewhere (see
+<a href="https://issues.apache.org/jira/browse/KUDU-2469">KUDU-2469</a>).</p>
+</li>
+<li>
+<p>Servers are now able to start up with data directories missing on disk (see
+<a href="https://issues.apache.org/jira/browse/KUDU-2359">KUDU-2359</a>).</p>
+</li>
+<li>
+<p>The <code>kudu perf loadgen</code> tool now creates tables with a period-separated database name,
+for example <code>default.loadgen_auto_abc123</code>. This new behavior does not take effect if the
+<code>--table</code> flag is provided. The database of the table can be changed using a new
+<code>--auto_database</code> flag. This change is made in anticipation of an eventual Kudu/HMS
+integration (see <a href="https://jira.apache.org/jira/browse/KUDU-2191">KUDU-2191</a>).</p>
+</li>
+<li>
+<p>Introduced <code>FAILED_UNRECOVERABLE</code> replica health status. This is to mark replicas which
+are not able to catch up with the leader due to GC-collected segments of WAL and other
+unrecoverable cases like disk failure. With that, the replica management scheme becomes
+hybrid: the system evicts replicas with <code>FAILED_UNRECOVERABLE</code> health status before
+adding a replacement if it anticipates that it can commit the transaction, while in
+other cases it first adds a non-voter replica and removes the failed one only after
+promoting a newly added replica to voter role.</p>
+</li>
+<li>
+<p>Two additional configuration parameters, <code>socketReadTimeoutMs</code> and <code>scanRequestTimeout</code>,
+have been added to the Spark connector to allow better tuning to avoid scan timeouts
+under high load.</p>
+</li>
+<li>
+<p>The <code>kudu table</code> tool now supports two new options to rename tables and columns,
+<code>rename_table</code> and <code>rename_column</code> respectively.</p>
+</li>
+<li>
+<p>Kudu will now wait for the clock to become synchronized at startup, controlled by a new
+flag <code>-ntp_initial_sync_wait_secs</code> (see
+<a href="https://issues.apache.org/jira/browse/KUDU-2242">KUDU-2242</a>).</p>
+</li>
+<li>
+<p>Tablet deletions are now throttled, which will help Kudu clusters remain stable even
+when many tablets are deleted at once. The number of tablets that a tablet server will
+delete at once is controlled by the new flag <code>-num_tablets_to_delete_simultaneously</code>
+(see <a href="https://issues.apache.org/jira/browse/KUDU-2289">KUDU-2289</a>).</p>
+</li>
+<li>
+<p>The <code>kudu cluster ksck</code> tool has been significantly enhanced. It now checks master
+health and consensus status, displays any unsafe or hidden flags set in the cluster, and
+produces a summary of the Kudu versions running on the master and tablet servers. In
+addition, it now supports JSON output, both in pretty-printed and compact form. The
+output format is controlled by the <code>-ksck_format</code> flag.</p>
+</li>
+</ul>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="rn_1.8.0_fixed_issues"><a class="link" href="#rn_1.8.0_fixed_issues">Fixed Issues</a></h2>
+<div class="sectionbody">
+<div class="ulist">
+<ul>
+<li>
+<p>When a tablet server was wiped and recreated with the same RPC address, <code>ksck</code> listed it
+twice, both as healthy, even though only one of them was there. This bug is now fixed by
+verifying the UUID of the server (see
+<a href="https://issues.apache.org/jira/browse/KUDU-2364">KUDU-2364</a>).</p>
+</li>
+<li>
+<p>Fixed an issue preventing Kudu from starting when using Vormetric&#8217;s encrypted filesystem
+(secfs2) on ext4 (see <a href="https://issues.apache.org/jira/browse/KUDU-2406">KUDU-2406</a>).</p>
+</li>
+<li>
+<p>Fixed an issue where Kudu&#8217;s block cache memory tracking (as seen on the <code>/mem-trackers</code>
+web UI page) wasn’t accounting for all of the overhead of the cache itself (see
+<a href="https://issues.apache.org/jira/browse/KUDU-972">KUDU-972</a>).</p>
+</li>
+<li>
+<p>Fixed an issue where the C++ client would fail to reopen an expired scanner; instead,
+the client would retry in a tight loop and eventually time out (see
+<a href="https://issues.apache.org/jira/browse/KUDU-2414">KUDU-2414</a>).</p>
+</li>
+<li>
+<p>When a tablet is deleted, its write-ahead log recovery directory is also deleted, if it
+exists (see <a href="https://issues.apache.org/jira/browse/KUDU-1038">KUDU-1038</a>).</p>
+</li>
+<li>
+<p>Fixed a tablet server crash when a tablet is scanned with two predicates on its primary
+key and the predicates do not overlap (see
+<a href="https://issues.apache.org/jira/browse/KUDU-2447">KUDU-2447</a>).</p>
+</li>
+<li>
+<p>Fixed an issue where the Kudu MapReduce connector&#8217;s <code>KuduTableInputFormat</code> may exhaust
+its scan too early (see
+<a href="https://issues.apache.org/jira/browse/KUDU-2525">KUDU-2525</a>).</p>
+</li>
+<li>
+<p>Fixed an issue with failed tablet copies that would cause subsequent tablet copies to
+crash the tablet server (see
+<a href="https://issues.apache.org/jira/browse/KUDU-2293">KUDU-2293</a>).</p>
+</li>
+<li>
+<p>Fixed a bug in which incorrect results would be returned in scans following a
+server restart (see
+<a href="https://issues.apache.org/jira/browse/KUDU-2463">KUDU-2463</a>).</p>
+</li>
+<li>
+<p>Fixed a bug causing a tablet server crash when a write batch request from a client
+failed coarse-grained authorization (see
+<a href="https://issues.apache.org/jira/browse/KUDU-2540">KUDU-2540</a>).</p>
+</li>
+<li>
+<p>Fixed use-after-free in case of WAL replay error (see
+<a href="https://issues.apache.org/jira/browse/KUDU-2509">KUDU-2509</a>).</p>
+</li>
+<li>
+<p>Fixed authentication token reacquisition in the C++ client (see
+<a href="https://issues.apache.org/jira/browse/KUDU-2580">KUDU-2580</a>).</p>
+</li>
+<li>
+<p>Fixed a bug where the leader logged excessively when the followers fell behind (see
+<a href="https://issues.apache.org/jira/browse/KUDU-2322">KUDU-2322</a>).</p>
+</li>
+<li>
+<p>Fixed reporting of leader health during lifecycle transitions (see
+<a href="https://issues.apache.org/jira/browse/KUDU-2335">KUDU-2335</a>).</p>
+</li>
+<li>
+<p>Fixed moving single-replica tablets (see
+<a href="https://issues.apache.org/jira/browse/KUDU-2443">KUDU-2443</a>).</p>
+</li>
+<li>
+<p>Fixed an error that would cause the kudu CLI tool to unexpectedly exit when the
+connection to the master or tserver was abruptly closed.</p>
+</li>
+<li>
+<p>Fixed a rare issue where system failure could leave unexpected null bytes at the end of
+metadata files, causing Kudu to be unable to restart (see
+<a href="https://issues.apache.org/jira/browse/KUDU-2260">KUDU-2260</a>).</p>
+</li>
+<li>
+<p>Fixed an issue where <code>kudu cluster ksck</code> running a snapshot checksum scan would use a
+single snapshot timestamp for all tablets. This caused the checksum to fail when the
+process took a long time and the number of tablets was sufficiently large.
+The tool should now be able to checksum tables even if the process takes many hours (see
+<a href="https://issues.apache.org/jira/browse/KUDU-2179">KUDU-2179</a>).</p>
+</li>
+</ul>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="rn_1.8.0_wire_compatibility"><a class="link" href="#rn_1.8.0_wire_compatibility">Wire Protocol compatibility</a></h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>Kudu 1.8.0 is wire-compatible with previous versions of Kudu:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>Kudu 1.8 clients may connect to servers running Kudu 1.0 or later. If the client uses
+features that are not available on the target server, an error will be returned.</p>
+</li>
+<li>
+<p>Kudu 1.0 clients may connect to servers running Kudu 1.8, subject to the restrictions
+on secure clusters described below.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>The authentication features introduced in Kudu 1.3 place the following limitations on wire
+compatibility between Kudu 1.8 and versions earlier than 1.3:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>If a Kudu 1.8 cluster is configured with authentication or encryption set to "required",
+clients older than Kudu 1.3 will be unable to connect.</p>
+</li>
+<li>
+<p>If a Kudu 1.8 cluster is configured with authentication and encryption set to "optional"
+or "disabled", older clients will still be able to connect.</p>
+</li>
+</ul>
+</div>
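+<div class="paragraph">
+<p>The two rules above can be summarized as a small decision function. The following is a minimal
+Python sketch for illustration only, not part of Kudu: the function name, the version tuples, and
+the setting strings are hypothetical, and it models only the authentication and encryption
+restriction, not errors caused by features missing on the server.</p>
+</div>

```python
def client_can_connect(client_version, authentication, encryption):
    """Sketch of the secure-cluster rule for a Kudu 1.8 cluster.

    client_version is a hypothetical (major, minor) tuple; authentication and
    encryption are each "required", "optional", or "disabled".
    """
    # Clients at 1.3 or later support the authentication features.
    if client_version >= (1, 3):
        return True
    # Older clients can connect only if neither setting is "required".
    return authentication != "required" and encryption != "required"


print(client_can_connect((1, 2), "required", "optional"))
print(client_can_connect((1, 2), "optional", "disabled"))
print(client_can_connect((1, 7), "required", "required"))
```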
+</div>
+</div>
+<div class="sect1">
+<h2 id="rn_1.8.0_incompatible_changes"><a class="link" href="#rn_1.8.0_incompatible_changes">Incompatible Changes in Kudu 1.8.0</a></h2>
+<div class="sectionbody">
+<div class="sect2">
+<h3 id="rn_1.8.0_client_compatibility"><a class="link" href="#rn_1.8.0_client_compatibility">Client Library Compatibility</a></h3>
+<div class="ulist">
+<ul>
+<li>
+<p>The Kudu 1.8 Java client library is API- and ABI-compatible with Kudu 1.7. Applications
+written against Kudu 1.7 will compile and run against the Kudu 1.8 client library and
+vice versa.</p>
+</li>
+<li>
+<p>The Kudu 1.8 C++ client is API- and ABI-forward-compatible with Kudu 1.7.
+Applications written and compiled against the Kudu 1.7 client library will run without
+modification against the Kudu 1.8 client library. Applications written and compiled
+against the Kudu 1.8 client library will run without modification against the Kudu 1.7
+client library.</p>
+</li>
+<li>
+<p>The Kudu 1.8 Python client is API-compatible with Kudu 1.7. Applications written against
+Kudu 1.7 will continue to run against the Kudu 1.8 client and vice versa.</p>
+</li>
+</ul>
+</div>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="rn_1.8.0_known_issues"><a class="link" href="#rn_1.8.0_known_issues">Known Issues and Limitations</a></h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>Please refer to the <a href="known_issues.html">Known Issues and Limitations</a> section of the
+documentation.</p>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="rn_1.8.0_contributors"><a class="link" href="#rn_1.8.0_contributors">Contributors</a></h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>Kudu 1.8 includes contributions from 40 people, including 15 first-time contributors:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>Anupama Gupta</p>
+</li>
+<li>
+<p>Attila Piros</p>
+</li>
+<li>
+<p>Brian McDevitt</p>
+</li>
+<li>
+<p>Fengling Wang</p>
+</li>
+<li>
+<p>Ferenc Szabó</p>
+</li>
+<li>
+<p>Greg Solovyev</p>
+</li>
+<li>
+<p>Kiyoshi Mizumaru</p>
+</li>
+<li>
+<p>Shriya Gupta</p>
+</li>
+<li>
+<p>Thomas Tauber-Marshall</p>
+</li>
+<li>
+<p>Tigerquoll</p>
+</li>
+<li>
+<p>Yao Xu</p>
+</li>
+<li>
+<p>ZhangYao</p>
+</li>
+<li>
+<p>helifu</p>
+</li>
+<li>
+<p>jinxing64</p>
+</li>
+<li>
+<p>qqchang2nd</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>Thank you for helping to make Kudu even better!</p>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="resources_and_next_steps"><a class="link" href="#resources_and_next_steps">Resources</a></h2>
+<div class="sectionbody">
+<div class="ulist">
+<ul>
+<li>
+<p><a href="http://kudu.apache.org">Kudu Website</a></p>
+</li>
+<li>
+<p><a href="http://github.com/apache/kudu">Kudu GitHub Repository</a></p>
+</li>
+<li>
+<p><a href="index.html">Kudu Documentation</a></p>
+</li>
+<li>
+<p><a href="prior_release_notes.html">Release notes for older releases</a></p>
+</li>
+</ul>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_installation_options"><a class="link" href="#_installation_options">Installation Options</a></h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>For full installation details, see <a href="installation.html">Kudu Installation</a>.</p>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_next_steps"><a class="link" href="#_next_steps">Next Steps</a></h2>
+<div class="sectionbody">
+<div class="ulist">
+<ul>
+<li>
+<p><a href="quickstart.html">Kudu Quickstart</a></p>
+</li>
+<li>
+<p><a href="installation.html">Installing Kudu</a></p>
+</li>
+<li>
+<p><a href="configuration.html">Configuring Kudu</a></p>
+</li>
+</ul>
+</div>
+</div>
+</div>
+    </div>
+    <div class="col-md-3">
+
+  <div id="toc" data-spy="affix" data-offset-top="70">
+  <ul>
+
+      <li>
+
+          <a href="index.html">Introducing Kudu</a> 
+      </li> 
+      <li>
+<span class="active-toc">Kudu Release Notes</span>
+            <ul class="sectlevel1">
+<li><a href="#rn_1.8.0_upgrade_notes">Upgrade Notes</a></li>
+<li><a href="#rn_1.8.0_obsoletions">Obsoletions</a></li>
+<li><a href="#rn_1.8.0_deprecations">Deprecations</a></li>
+<li><a href="#rn_1.8.0_new_features">New features</a></li>
+<li><a href="#rn_1.8.0_improvements">Optimizations and improvements</a></li>
+<li><a href="#rn_1.8.0_fixed_issues">Fixed Issues</a></li>
+<li><a href="#rn_1.8.0_wire_compatibility">Wire Protocol compatibility</a></li>
+<li><a href="#rn_1.8.0_incompatible_changes">Incompatible Changes in Kudu 1.8.0</a>
+<ul class="sectlevel2">
+<li><a href="#rn_1.8.0_client_compatibility">Client Library Compatibility</a></li>
+</ul>
+</li>
+<li><a href="#rn_1.8.0_known_issues">Known Issues and Limitations</a></li>
+<li><a href="#rn_1.8.0_contributors">Contributors</a></li>
+<li><a href="#resources_and_next_steps">Resources</a></li>
+<li><a href="#_installation_options">Installation Options</a></li>
+<li><a href="#_next_steps">Next Steps</a></li>
+</ul> 
+      </li> 
+      <li>
+
+          <a href="quickstart.html">Getting Started with Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="installation.html">Installation Guide</a> 
+      </li> 
+      <li>
+
+          <a href="configuration.html">Configuring Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="kudu_impala_integration.html">Using Impala with Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="administration.html">Administering Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="troubleshooting.html">Troubleshooting Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="developing.html">Developing Applications with Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="schema_design.html">Kudu Schema Design</a> 
+      </li> 
+      <li>
+
+          <a href="scaling_guide.html">Kudu Scaling Guide</a> 
+      </li> 
+      <li>
+
+          <a href="security.html">Kudu Security</a> 
+      </li> 
+      <li>
+
+          <a href="transaction_semantics.html">Kudu Transaction Semantics</a> 
+      </li> 
+      <li>
+
+          <a href="background_tasks.html">Background Maintenance Tasks</a> 
+      </li> 
+      <li>
+
+          <a href="configuration_reference.html">Kudu Configuration Reference</a> 
+      </li> 
+      <li>
+
+          <a href="command_line_tools_reference.html">Kudu Command Line Tools Reference</a> 
+      </li> 
+      <li>
+
+          <a href="known_issues.html">Known Issues and Limitations</a> 
+      </li> 
+      <li>
+
+          <a href="contributing.html">Contributing to Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="export_control.html">Export Control Notice</a> 
+      </li> 
+  </ul>
+  </div>
+    </div>
+  </div>
+</div>
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/kudu/blob/87b27857/docs/scaling_guide.html
----------------------------------------------------------------------
diff --git a/docs/scaling_guide.html b/docs/scaling_guide.html
new file mode 100644
index 0000000..79ba414
--- /dev/null
+++ b/docs/scaling_guide.html
@@ -0,0 +1,469 @@
+---
+title: Apache Kudu Scaling Guide
+layout: default
+active_nav: docs
+last_updated: 'Last updated 2018-10-24 23:33:04 CEST'
+---
+<!--
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+
+<div class="container">
+  <div class="row">
+    <div class="col-md-9">
+
+<h1>Apache Kudu Scaling Guide</h1>
+      <div id="preamble">
+<div class="sectionbody">
+<div class="paragraph">
+<p>This document describes in detail how Kudu scales with respect to various system resources,
+including memory, file descriptors, and threads. See the
+<a href="known_issues.html#_scale">scaling limits</a> for the maximum recommended parameters of a Kudu
+cluster. These limits can be used to roughly estimate the number of servers required for a given
+quantity of data.</p>
+</div>
+<div class="admonitionblock warning">
+<table>
+<tr>
+<td class="icon">
+<i class="fa icon-warning" title="Warning"></i>
+</td>
+<td class="content">
+The recommendations and conclusions here are only approximations. Appropriate numbers
+depend on use case. There is no substitute for measurement and monitoring of resources used during a
+representative workload.
+</td>
+</tr>
+</table>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_terms"><a class="link" href="#_terms">Terms</a></h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>We will use the following terms:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><strong>hot replica</strong>: A tablet replica that is continuously receiving writes. For example, in a time
+series use case, tablet replicas for the most recent range partition on a time column would be
+continuously receiving the latest data, and would be hot replicas.</p>
+</li>
+<li>
+<p><strong>cold replica</strong>: A tablet replica that is not hot, i.e. a replica that receives writes
+infrequently, for example once every few minutes. A cold replica may still be read from. For example, in a time
+series use case, tablet replicas for previous range partitions on a time column would not receive
+writes at all, or only occasionally receive late updates or additions, but may be constantly read.</p>
+</li>
+<li>
+<p><strong>data on disk</strong>: The total amount of data stored on a tablet server across all disks,
+post-replication, post-compression, and post-encoding.</p>
+</li>
+</ul>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_example_workload"><a class="link" href="#_example_workload">Example Workload</a></h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>The sections below perform sample calculations using the following parameters:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>200 hot replicas per tablet server</p>
+</li>
+<li>
+<p>1600 cold replicas per tablet server</p>
+</li>
+<li>
+<p>8TB of data on disk per tablet server (about 4.5GB/replica)</p>
+</li>
+<li>
+<p>512MB block cache</p>
+</li>
+<li>
+<p>40 cores per server</p>
+</li>
+<li>
+<p>limit of 32000 file descriptors per server</p>
+</li>
+<li>
+<p>a read workload with 1 frequently-scanned table with 40 columns</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>This workload resembles a time series use case, where the hot replicas correspond to the most recent
+range partition on time.</p>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="memory"><a class="link" href="#memory">Memory</a></h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>The flag <code>--memory_limit_hard_bytes</code> determines the maximum amount of memory that a Kudu tablet
+server may use. The amount of memory used by a tablet server scales with data size, write workload,
+and read concurrency. The following table provides numbers that can be used to compute a rough
+estimate of memory usage.</p>
+</div>
+<table class="tableblock frame-all grid-all spread">
+<caption class="title">Table 1. Tablet Server Memory Usage</caption>
+<colgroup>
+<col style="width: 33.3333%;">
+<col style="width: 33.3333%;">
+<col style="width: 33.3334%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top">Type</th>
+<th class="tableblock halign-left valign-top">Multiplier</th>
+<th class="tableblock halign-left valign-top">Description</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Memory required per TB of data on disk</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">1.5GB per 1TB data on disk</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Amount of memory per unit of data on disk required for
+basic operation of the tablet server.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Hot Replicas' MemRowSets and DeltaMemStores</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">minimum 128MB per hot replica</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Minimum amount of data
+to flush per MemRowSet flush. For most use cases, updates should be rare compared to inserts, so the
+DeltaMemStores should be very small.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Scans</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">256KB per column per core for read-heavy tables</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Amount of memory used by scanners. This memory
+is needed continuously for tables that are constantly read.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Block Cache</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Fixed by <code>--block_cache_capacity_mb</code> (default 512MB)</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Amount of memory reserved for use by the
+block cache.</p></td>
+</tr>
+</tbody>
+</table>
+<div class="paragraph">
+<p>Using this information for the example workload gives the following breakdown of memory usage:</p>
+</div>
+<table class="tableblock frame-all grid-all spread">
+<caption class="title">Table 2. Example Tablet Server Memory Usage</caption>
+<colgroup>
+<col style="width: 50%;">
+<col style="width: 50%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top">Type</th>
+<th class="tableblock halign-left valign-top">Amount</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">8TB data on disk</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">8TB * 1.5GB / 1TB = 12GB</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">200 hot replicas</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">200 * 128MB = 25.6GB</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">1 frequently-scanned table with 40 columns</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">40 columns * 40 cores * 256KB = 409.6MB</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Block Cache</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>--block_cache_capacity_mb=512</code> = 512MB</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Expected memory usage</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">38.5GB</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Recommended hard limit</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">52GB</p></td>
+</tr>
+</tbody>
+</table>
+<div class="paragraph">
+<p>Using this as a rough estimate of Kudu&#8217;s memory usage, select a memory limit so that the expected
+memory usage of Kudu is around 50-75% of the hard limit.</p>
+</div>
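+<div class="paragraph">
+<p>The arithmetic in Table 2 can be reproduced with a short script. The following is a minimal
+Python sketch, not part of Kudu: the function and parameter names are hypothetical, and it assumes
+decimal units (1GB = 1000MB), which is what the figures in the table imply.</p>
+</div>

```python
def estimate_memory_gb(data_on_disk_tb, hot_replicas, scanned_columns, cores,
                       block_cache_mb=512):
    """Rough tablet server memory estimate using the multipliers from Table 1."""
    base_gb = data_on_disk_tb * 1.5                        # 1.5GB per 1TB on disk
    hot_gb = hot_replicas * 128 / 1000                     # 128MB per hot replica
    scan_gb = scanned_columns * cores * 256 / 1_000_000    # 256KB per column per core
    cache_gb = block_cache_mb / 1000                       # block cache capacity
    return base_gb + hot_gb + scan_gb + cache_gb


def hard_limit_range_gb(expected_gb, low=0.50, high=0.75):
    """Hard limits for which expected usage is 50-75% of the limit."""
    return expected_gb / high, expected_gb / low


expected = estimate_memory_gb(data_on_disk_tb=8, hot_replicas=200,
                              scanned_columns=40, cores=40)
smallest, largest = hard_limit_range_gb(expected)
print(f"expected usage: {expected:.1f}GB")
print(f"hard limit between {smallest:.1f}GB and {largest:.1f}GB")
```

+<div class="paragraph">
+<p>For the example workload, the 52GB hard limit from Table 2 sits near the low end of the
+resulting range, where expected usage is close to 75% of the limit.</p>
+</div>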
+<div class="sect2">
+<h3 id="_verifying_if_a_memory_limit_is_sufficient"><a class="link" href="#_verifying_if_a_memory_limit_is_sufficient">Verifying if a Memory Limit is sufficient</a></h3>
+<div class="paragraph">
+<p>After configuring an appropriate memory limit with <code>--memory_limit_hard_bytes</code>, run a workload and
+monitor the Kudu tablet server process&#8217;s RAM usage. The memory usage should stay around 50-75% of
+the hard limit, with occasional spikes above 75% but below 100%. If the tablet server runs above 75%
+consistently, the memory limit should be increased.</p>
+</div>
+<div class="paragraph">
+<p>It is also useful to monitor the logs for memory rejections, which look like:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>Service unavailable: Soft memory limit exceeded (at 96.35% of capacity)</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Also watch the memory rejection metrics:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><code>leader_memory_pressure_rejections</code></p>
+</li>
+<li>
+<p><code>follower_memory_pressure_rejections</code></p>
+</li>
+<li>
+<p><code>transaction_memory_pressure_rejections</code></p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>Occasional rejections due to memory pressure are fine and act as backpressure to clients. Clients
+will transparently retry operations. However, no operations should time out.</p>
+</div>
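+<div class="paragraph">
+<p>One way to track memory rejections in the logs is to extract the reported capacity percentage
+from each rejection message. The following is a minimal Python sketch, not a Kudu tool: the helper
+name is hypothetical, the pattern is based only on the sample message shown above, and the second
+sample line is made up for illustration.</p>
+</div>

```python
import re

# Pattern derived from the sample rejection message in this section.
REJECTION = re.compile(
    r"Soft memory limit exceeded \(at (\d+(?:\.\d+)?)% of capacity\)")


def memory_rejections(log_lines):
    """Return the capacity percentages reported by memory rejection messages."""
    percents = []
    for line in log_lines:
        match = REJECTION.search(line)
        if match:
            percents.append(float(match.group(1)))
    return percents


sample = [
    "Service unavailable: Soft memory limit exceeded (at 96.35% of capacity)",
    "an unrelated log line",  # hypothetical line, ignored by the pattern
]
print(memory_rejections(sample))
```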
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="file_descriptors"><a class="link" href="#file_descriptors">File Descriptors</a></h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>Processes are allotted a maximum number of open file descriptors (also referred to as fds). If a
+tablet server attempts to open too many fds, it will typically crash with a message such as
+"too many open files". The following table summarizes the sources of file descriptor usage in a
+Kudu tablet server process:</p>
+</div>
+<table class="tableblock frame-all grid-all spread">
+<caption class="title">Table 3. Tablet Server File Descriptor Usage</caption>
+<colgroup>
+<col style="width: 33.3333%;">
+<col style="width: 33.3333%;">
+<col style="width: 33.3334%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top">Type</th>
+<th class="tableblock halign-left valign-top">Multiplier</th>
+<th class="tableblock halign-left valign-top">Description</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">File cache</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Fixed by <code>--block_manager_max_open_files</code> (default 40% of process maximum)</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Maximum allowed open fds reserved for use by
+the file cache.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Hot replicas</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">2 per WAL segment, 1 per WAL index</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Number of fds used by hot replicas. See below
+for more explanation.</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Cold replicas</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">3 per cold replica</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Number of fds used per cold replica: 2 for the single WAL
+segment and 1 for the single WAL index.</p></td>
+</tr>
+</tbody>
+</table>
+<div class="paragraph">
+<p>Every replica has at least one WAL segment and at least one WAL index, and should have the same
+number of segments and indices; however, the number of segments and indices can be greater for a
+replica if one of its peer replicas is falling behind. WAL segment and index fds are closed as WALs
+are garbage collected.</p>
+</div>
+<div class="paragraph">
+<p>Using this information for the example workload gives the following breakdown of file descriptor
+usage, under the assumption that some replicas are lagging and using 10 WAL segments:</p>
+</div>
+<table class="tableblock frame-all grid-all spread">
+<caption class="title">Table 4. Example Tablet Server File Descriptor Usage</caption>
+<colgroup>
+<col style="width: 50%;">
+<col style="width: 50%;">
+</colgroup>
+<thead>
+<tr>
+<th class="tableblock halign-left valign-top">Type</th>
+<th class="tableblock halign-left valign-top">Amount</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">file cache</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">40% * 32000 fds = 12800 fds</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">1600 cold replicas</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">1600 cold replicas * 3 fds / cold replica = 4800 fds</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">200 hot replicas</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">(2 fds / segment * 10 segments / hot replica * 200 hot replicas) + (1 fd / index * 10 indices / hot replica * 200 hot replicas) = 6000 fds</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">Total</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">23600 fds</p></td>
+</tr>
+</tbody>
+</table>
+<div class="paragraph">
+<p>So for this example, the tablet server process has about 32000 - 23600 = 8400 fds to spare.</p>
+</div>
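+<div class="paragraph">
+<p>The breakdown in Table 4 can be reproduced with a short script. The following is a minimal
+Python sketch, not part of Kudu: the function and parameter names are hypothetical, and it assumes
+every hot replica holds the same number of WAL segments and indices.</p>
+</div>

```python
def estimate_fds(fd_limit, hot_replicas, cold_replicas,
                 wal_segments_per_hot_replica, file_cache_percent=40):
    """Rough tablet server fd estimate using the multipliers from Table 3."""
    file_cache = fd_limit * file_cache_percent // 100   # default 40% of the limit
    cold = cold_replicas * 3                            # 2 per segment + 1 per index
    # Each WAL segment uses 2 fds and each matching WAL index uses 1 fd.
    hot = hot_replicas * wal_segments_per_hot_replica * (2 + 1)
    total = file_cache + cold + hot
    return {"file_cache": file_cache, "cold": cold, "hot": hot,
            "total": total, "spare": fd_limit - total}


result = estimate_fds(fd_limit=32000, hot_replicas=200, cold_replicas=1600,
                      wal_segments_per_hot_replica=10)
print(result)
```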
+<div class="paragraph">
+<p>There is typically no downside to configuring a higher file descriptor limit if the tablet
+server is approaching the currently configured limit.</p>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="threads"><a class="link" href="#threads">Threads</a></h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>Processes are allotted a maximum number of threads by the operating system, and this limit is
+typically difficult or impossible to change. Therefore, this section is more informational than
+advisory.</p>
+</div>
+<div class="paragraph">
+<p>If a Kudu tablet server&#8217;s thread count exceeds the OS limit, it will crash, usually with a message
+in the logs like "pthread_create failed: Resource temporarily unavailable". If the system thread
+count limit is exceeded, other processes on the same node may also crash.</p>
+</div>
+<div class="paragraph">
+<p>Threads and threadpools are used throughout Kudu for various purposes, but the number of threads
+in nearly all of these does not scale with load or data/tablet size; instead, the number of threads
+is either a hardcoded constant, a constant defined by a configuration parameter, or based on a
+static dimension (such as the number of CPU cores).</p>
+</div>
+<div class="paragraph">
+<p>The only exception to this is the WAL append thread, one of which exists for every "hot" replica.
+Note that all replicas may be considered hot at startup, so tablet servers' thread usage will
+generally peak when started and settle down thereafter.</p>
+</div>
+</div>
+</div>
+    </div>
+    <div class="col-md-3">
+
+  <div id="toc" data-spy="affix" data-offset-top="70">
+  <ul>
+
+      <li>
+
+          <a href="index.html">Introducing Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="release_notes.html">Kudu Release Notes</a> 
+      </li> 
+      <li>
+
+          <a href="quickstart.html">Getting Started with Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="installation.html">Installation Guide</a> 
+      </li> 
+      <li>
+
+          <a href="configuration.html">Configuring Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="kudu_impala_integration.html">Using Impala with Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="administration.html">Administering Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="troubleshooting.html">Troubleshooting Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="developing.html">Developing Applications with Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="schema_design.html">Kudu Schema Design</a> 
+      </li> 
+      <li>
+<span class="active-toc">Kudu Scaling Guide</span>
+            <ul class="sectlevel1">
+<li><a href="#_terms">Terms</a></li>
+<li><a href="#_example_workload">Example Workload</a></li>
+<li><a href="#memory">Memory</a>
+<ul class="sectlevel2">
+<li><a href="#_verifying_if_a_memory_limit_is_sufficient">Verifying if a Memory Limit is sufficient</a></li>
+</ul>
+</li>
+<li><a href="#file_descriptors">File Descriptors</a></li>
+<li><a href="#threads">Threads</a></li>
+</ul> 
+      </li> 
+      <li>
+
+          <a href="security.html">Kudu Security</a> 
+      </li> 
+      <li>
+
+          <a href="transaction_semantics.html">Kudu Transaction Semantics</a> 
+      </li> 
+      <li>
+
+          <a href="background_tasks.html">Background Maintenance Tasks</a> 
+      </li> 
+      <li>
+
+          <a href="configuration_reference.html">Kudu Configuration Reference</a> 
+      </li> 
+      <li>
+
+          <a href="command_line_tools_reference.html">Kudu Command Line Tools Reference</a> 
+      </li> 
+      <li>
+
+          <a href="known_issues.html">Known Issues and Limitations</a> 
+      </li> 
+      <li>
+
+          <a href="contributing.html">Contributing to Kudu</a> 
+      </li> 
+      <li>
+
+          <a href="export_control.html">Export Control Notice</a> 
+      </li> 
+  </ul>
+  </div>
+    </div>
+  </div>
+</div>
\ No newline at end of file