Posted to commits@spark.apache.org by yh...@apache.org on 2016/12/28 22:35:12 UTC

[01/25] spark-website git commit: Update 2.1.0 docs to include https://github.com/apache/spark/pull/16294

Repository: spark-website
Updated Branches:
  refs/heads/asf-site ecf94f284 -> d2bcf1854


http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/submitting-applications.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/submitting-applications.html b/site/docs/2.1.0/submitting-applications.html
index fc18fa9..0c91739 100644
--- a/site/docs/2.1.0/submitting-applications.html
+++ b/site/docs/2.1.0/submitting-applications.html
@@ -151,14 +151,14 @@ packaging them into a <code>.zip</code> or <code>.egg</code>.</p>
 This script takes care of setting up the classpath with Spark and its
 dependencies, and can support different cluster managers and deploy modes that Spark supports:</p>
 
-<div class="highlight"><pre><code class="language-bash" data-lang="bash">./bin/spark-submit <span class="se">\</span>
+<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>./bin/spark-submit <span class="se">\</span>
   --class &lt;main-class&gt; <span class="se">\</span>
   --master &lt;master-url&gt; <span class="se">\</span>
   --deploy-mode &lt;deploy-mode&gt; <span class="se">\</span>
   --conf &lt;key&gt;<span class="o">=</span>&lt;value&gt; <span class="se">\</span>
-  ... <span class="c"># other options</span>
+  ... <span class="c1"># other options</span>
   &lt;application-jar&gt; <span class="se">\</span>
-  <span class="o">[</span>application-arguments<span class="o">]</span></code></pre></div>
+  <span class="o">[</span>application-arguments<span class="o">]</span></code></pre></figure>
 
 <p>Some of the commonly used options are:</p>
 
@@ -194,23 +194,23 @@ you can also specify <code>--supervise</code> to make sure that the driver is au
 fails with non-zero exit code. To enumerate all such options available to <code>spark-submit</code>,
 run it with <code>--help</code>. Here are a few examples of common options:</p>
 
-<div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># Run application locally on 8 cores</span>
+<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span><span class="c1"># Run application locally on 8 cores</span>
 ./bin/spark-submit <span class="se">\</span>
   --class org.apache.spark.examples.SparkPi <span class="se">\</span>
-  --master <span class="nb">local</span><span class="o">[</span>8<span class="o">]</span> <span class="se">\</span>
+  --master local<span class="o">[</span><span class="m">8</span><span class="o">]</span> <span class="se">\</span>
   /path/to/examples.jar <span class="se">\</span>
-  100
+  <span class="m">100</span>
 
-<span class="c"># Run on a Spark standalone cluster in client deploy mode</span>
+<span class="c1"># Run on a Spark standalone cluster in client deploy mode</span>
 ./bin/spark-submit <span class="se">\</span>
   --class org.apache.spark.examples.SparkPi <span class="se">\</span>
   --master spark://207.184.161.138:7077 <span class="se">\</span>
   --executor-memory 20G <span class="se">\</span>
   --total-executor-cores <span class="m">100</span> <span class="se">\</span>
   /path/to/examples.jar <span class="se">\</span>
-  1000
+  <span class="m">1000</span>
 
-<span class="c"># Run on a Spark standalone cluster in cluster deploy mode with supervise</span>
+<span class="c1"># Run on a Spark standalone cluster in cluster deploy mode with supervise</span>
 ./bin/spark-submit <span class="se">\</span>
   --class org.apache.spark.examples.SparkPi <span class="se">\</span>
   --master spark://207.184.161.138:7077 <span class="se">\</span>
@@ -219,26 +219,26 @@ run it with <code>--help</code>. Here are a few examples of common options:</p>
   --executor-memory 20G <span class="se">\</span>
   --total-executor-cores <span class="m">100</span> <span class="se">\</span>
   /path/to/examples.jar <span class="se">\</span>
-  1000
+  <span class="m">1000</span>
 
-<span class="c"># Run on a YARN cluster</span>
-<span class="nb">export </span><span class="nv">HADOOP_CONF_DIR</span><span class="o">=</span>XXX
+<span class="c1"># Run on a YARN cluster</span>
+<span class="nb">export</span> <span class="nv">HADOOP_CONF_DIR</span><span class="o">=</span>XXX
 ./bin/spark-submit <span class="se">\</span>
   --class org.apache.spark.examples.SparkPi <span class="se">\</span>
   --master yarn <span class="se">\</span>
-  --deploy-mode cluster <span class="se">\ </span> <span class="c"># can be client for client mode</span>
+  --deploy-mode cluster <span class="se">\ </span> <span class="c1"># can be client for client mode</span>
   --executor-memory 20G <span class="se">\</span>
   --num-executors <span class="m">50</span> <span class="se">\</span>
   /path/to/examples.jar <span class="se">\</span>
-  1000
+  <span class="m">1000</span>
 
-<span class="c"># Run a Python application on a Spark standalone cluster</span>
+<span class="c1"># Run a Python application on a Spark standalone cluster</span>
 ./bin/spark-submit <span class="se">\</span>
   --master spark://207.184.161.138:7077 <span class="se">\</span>
   examples/src/main/python/pi.py <span class="se">\</span>
-  1000
+  <span class="m">1000</span>
 
-<span class="c"># Run on a Mesos cluster in cluster deploy mode with supervise</span>
+<span class="c1"># Run on a Mesos cluster in cluster deploy mode with supervise</span>
 ./bin/spark-submit <span class="se">\</span>
   --class org.apache.spark.examples.SparkPi <span class="se">\</span>
   --master mesos://207.184.161.138:7077 <span class="se">\</span>
@@ -247,7 +247,7 @@ run it with <code>--help</code>. Here are a few examples of common options:</p>
   --executor-memory 20G <span class="se">\</span>
   --total-executor-cores <span class="m">100</span> <span class="se">\</span>
   http://path/to/examples.jar <span class="se">\</span>
-  1000</code></pre></div>
+  <span class="m">1000</span></code></pre></figure>
 
 <h1 id="master-urls">Master URLs</h1>
 

http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/tuning.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/tuning.html b/site/docs/2.1.0/tuning.html
index ca4ad9f..33a6316 100644
--- a/site/docs/2.1.0/tuning.html
+++ b/site/docs/2.1.0/tuning.html
@@ -129,23 +129,23 @@
                     
 
                     <ul id="markdown-toc">
-  <li><a href="#data-serialization" id="markdown-toc-data-serialization">Data Serialization</a></li>
-  <li><a href="#memory-tuning" id="markdown-toc-memory-tuning">Memory Tuning</a>    <ul>
-      <li><a href="#memory-management-overview" id="markdown-toc-memory-management-overview">Memory Management Overview</a></li>
-      <li><a href="#determining-memory-consumption" id="markdown-toc-determining-memory-consumption">Determining Memory Consumption</a></li>
-      <li><a href="#tuning-data-structures" id="markdown-toc-tuning-data-structures">Tuning Data Structures</a></li>
-      <li><a href="#serialized-rdd-storage" id="markdown-toc-serialized-rdd-storage">Serialized RDD Storage</a></li>
-      <li><a href="#garbage-collection-tuning" id="markdown-toc-garbage-collection-tuning">Garbage Collection Tuning</a></li>
+  <li><a href="#data-serialization">Data Serialization</a></li>
+  <li><a href="#memory-tuning">Memory Tuning</a>    <ul>
+      <li><a href="#memory-management-overview">Memory Management Overview</a></li>
+      <li><a href="#determining-memory-consumption">Determining Memory Consumption</a></li>
+      <li><a href="#tuning-data-structures">Tuning Data Structures</a></li>
+      <li><a href="#serialized-rdd-storage">Serialized RDD Storage</a></li>
+      <li><a href="#garbage-collection-tuning">Garbage Collection Tuning</a></li>
     </ul>
   </li>
-  <li><a href="#other-considerations" id="markdown-toc-other-considerations">Other Considerations</a>    <ul>
-      <li><a href="#level-of-parallelism" id="markdown-toc-level-of-parallelism">Level of Parallelism</a></li>
-      <li><a href="#memory-usage-of-reduce-tasks" id="markdown-toc-memory-usage-of-reduce-tasks">Memory Usage of Reduce Tasks</a></li>
-      <li><a href="#broadcasting-large-variables" id="markdown-toc-broadcasting-large-variables">Broadcasting Large Variables</a></li>
-      <li><a href="#data-locality" id="markdown-toc-data-locality">Data Locality</a></li>
+  <li><a href="#other-considerations">Other Considerations</a>    <ul>
+      <li><a href="#level-of-parallelism">Level of Parallelism</a></li>
+      <li><a href="#memory-usage-of-reduce-tasks">Memory Usage of Reduce Tasks</a></li>
+      <li><a href="#broadcasting-large-variables">Broadcasting Large Variables</a></li>
+      <li><a href="#data-locality">Data Locality</a></li>
     </ul>
   </li>
-  <li><a href="#summary" id="markdown-toc-summary">Summary</a></li>
+  <li><a href="#summary">Summary</a></li>
 </ul>
 
 <p>Because of the in-memory nature of most Spark computations, Spark programs can be bottlenecked
@@ -194,9 +194,9 @@ in the AllScalaRegistrar from the <a href="https://github.com/twitter/chill">Twi
 
 <p>To register your own custom classes with Kryo, use the <code>registerKryoClasses</code> method.</p>
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">conf</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkConf</span><span class="o">().</span><span class="n">setMaster</span><span class="o">(...).</span><span class="n">setAppName</span><span class="o">(...)</span>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">val</span> <span class="n">conf</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkConf</span><span class="o">().</span><span class="n">setMaster</span><span class="o">(...).</span><span class="n">setAppName</span><span class="o">(...)</span>
 <span class="n">conf</span><span class="o">.</span><span class="n">registerKryoClasses</span><span class="o">(</span><span class="nc">Array</span><span class="o">(</span><span class="n">classOf</span><span class="o">[</span><span class="kt">MyClass1</span><span class="o">],</span> <span class="n">classOf</span><span class="o">[</span><span class="kt">MyClass2</span><span class="o">]))</span>
-<span class="k">val</span> <span class="n">sc</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkContext</span><span class="o">(</span><span class="n">conf</span><span class="o">)</span></code></pre></div>
+<span class="k">val</span> <span class="n">sc</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkContext</span><span class="o">(</span><span class="n">conf</span><span class="o">)</span></code></pre></figure>
 
 <p>The <a href="https://github.com/EsotericSoftware/kryo">Kryo documentation</a> describes more advanced
 registration options, such as adding custom serialization code.</p>
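
As a sketch of what such custom registration can look like (the Point class, PointSerializer, and MyRegistrator names below are hypothetical illustrations, not part of the docs being diffed; the spark.kryo.registrator setting and Kryo's Serializer interface are the real hooks):

    import com.esotericsoftware.kryo.{Kryo, Serializer}
    import com.esotericsoftware.kryo.io.{Input, Output}
    import org.apache.spark.SparkConf
    import org.apache.spark.serializer.KryoRegistrator

    // Hypothetical example class to serialize.
    case class Point(x: Double, y: Double)

    // Hand-written serializer: writes the two fields directly, no reflection.
    class PointSerializer extends Serializer[Point] {
      override def write(kryo: Kryo, output: Output, p: Point): Unit = {
        output.writeDouble(p.x)
        output.writeDouble(p.y)
      }
      override def read(kryo: Kryo, input: Input, t: Class[Point]): Point =
        Point(input.readDouble(), input.readDouble())
    }

    // Registrator wired in through the spark.kryo.registrator setting.
    class MyRegistrator extends KryoRegistrator {
      override def registerClasses(kryo: Kryo): Unit =
        kryo.register(classOf[Point], new PointSerializer)
    }

    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", classOf[MyRegistrator].getName)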




[24/25] spark-website git commit: Update 2.1.0 docs to include https://github.com/apache/spark/pull/16294

Posted by yh...@apache.org.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/graphx-programming-guide.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/graphx-programming-guide.html b/site/docs/2.1.0/graphx-programming-guide.html
index 780d1ab..08b3380 100644
--- a/site/docs/2.1.0/graphx-programming-guide.html
+++ b/site/docs/2.1.0/graphx-programming-guide.html
@@ -129,42 +129,42 @@
                     
 
                     <ul id="markdown-toc">
-  <li><a href="#overview" id="markdown-toc-overview">Overview</a></li>
-  <li><a href="#getting-started" id="markdown-toc-getting-started">Getting Started</a></li>
-  <li><a href="#the-property-graph" id="markdown-toc-the-property-graph">The Property Graph</a>    <ul>
-      <li><a href="#example-property-graph" id="markdown-toc-example-property-graph">Example Property Graph</a></li>
+  <li><a href="#overview">Overview</a></li>
+  <li><a href="#getting-started">Getting Started</a></li>
+  <li><a href="#the-property-graph">The Property Graph</a>    <ul>
+      <li><a href="#example-property-graph">Example Property Graph</a></li>
     </ul>
   </li>
-  <li><a href="#graph-operators" id="markdown-toc-graph-operators">Graph Operators</a>    <ul>
-      <li><a href="#summary-list-of-operators" id="markdown-toc-summary-list-of-operators">Summary List of Operators</a></li>
-      <li><a href="#property-operators" id="markdown-toc-property-operators">Property Operators</a></li>
-      <li><a href="#structural-operators" id="markdown-toc-structural-operators">Structural Operators</a></li>
-      <li><a href="#join-operators" id="markdown-toc-join-operators">Join Operators</a></li>
-      <li><a href="#neighborhood-aggregation" id="markdown-toc-neighborhood-aggregation">Neighborhood Aggregation</a>        <ul>
-          <li><a href="#aggregate-messages-aggregatemessages" id="markdown-toc-aggregate-messages-aggregatemessages">Aggregate Messages (aggregateMessages)</a></li>
-          <li><a href="#map-reduce-triplets-transition-guide-legacy" id="markdown-toc-map-reduce-triplets-transition-guide-legacy">Map Reduce Triplets Transition Guide (Legacy)</a></li>
-          <li><a href="#computing-degree-information" id="markdown-toc-computing-degree-information">Computing Degree Information</a></li>
-          <li><a href="#collecting-neighbors" id="markdown-toc-collecting-neighbors">Collecting Neighbors</a></li>
+  <li><a href="#graph-operators">Graph Operators</a>    <ul>
+      <li><a href="#summary-list-of-operators">Summary List of Operators</a></li>
+      <li><a href="#property-operators">Property Operators</a></li>
+      <li><a href="#structural-operators">Structural Operators</a></li>
+      <li><a href="#join-operators">Join Operators</a></li>
+      <li><a href="#neighborhood-aggregation">Neighborhood Aggregation</a>        <ul>
+          <li><a href="#aggregate-messages-aggregatemessages">Aggregate Messages (aggregateMessages)</a></li>
+          <li><a href="#map-reduce-triplets-transition-guide-legacy">Map Reduce Triplets Transition Guide (Legacy)</a></li>
+          <li><a href="#computing-degree-information">Computing Degree Information</a></li>
+          <li><a href="#collecting-neighbors">Collecting Neighbors</a></li>
         </ul>
       </li>
-      <li><a href="#caching-and-uncaching" id="markdown-toc-caching-and-uncaching">Caching and Uncaching</a></li>
+      <li><a href="#caching-and-uncaching">Caching and Uncaching</a></li>
     </ul>
   </li>
-  <li><a href="#pregel-api" id="markdown-toc-pregel-api">Pregel API</a></li>
-  <li><a href="#graph-builders" id="markdown-toc-graph-builders">Graph Builders</a></li>
-  <li><a href="#vertex-and-edge-rdds" id="markdown-toc-vertex-and-edge-rdds">Vertex and Edge RDDs</a>    <ul>
-      <li><a href="#vertexrdds" id="markdown-toc-vertexrdds">VertexRDDs</a></li>
-      <li><a href="#edgerdds" id="markdown-toc-edgerdds">EdgeRDDs</a></li>
+  <li><a href="#pregel-api">Pregel API</a></li>
+  <li><a href="#graph-builders">Graph Builders</a></li>
+  <li><a href="#vertex-and-edge-rdds">Vertex and Edge RDDs</a>    <ul>
+      <li><a href="#vertexrdds">VertexRDDs</a></li>
+      <li><a href="#edgerdds">EdgeRDDs</a></li>
     </ul>
   </li>
-  <li><a href="#optimized-representation" id="markdown-toc-optimized-representation">Optimized Representation</a></li>
-  <li><a href="#graph-algorithms" id="markdown-toc-graph-algorithms">Graph Algorithms</a>    <ul>
-      <li><a href="#pagerank" id="markdown-toc-pagerank">PageRank</a></li>
-      <li><a href="#connected-components" id="markdown-toc-connected-components">Connected Components</a></li>
-      <li><a href="#triangle-counting" id="markdown-toc-triangle-counting">Triangle Counting</a></li>
+  <li><a href="#optimized-representation">Optimized Representation</a></li>
+  <li><a href="#graph-algorithms">Graph Algorithms</a>    <ul>
+      <li><a href="#pagerank">PageRank</a></li>
+      <li><a href="#connected-components">Connected Components</a></li>
+      <li><a href="#triangle-counting">Triangle Counting</a></li>
     </ul>
   </li>
-  <li><a href="#examples" id="markdown-toc-examples">Examples</a></li>
+  <li><a href="#examples">Examples</a></li>
 </ul>
 
 <!-- All the documentation links  -->
@@ -188,10 +188,10 @@ operators (e.g., <a href="#structural_operators">subgraph</a>, <a href="#join_op
 
 <p>To get started you first need to import Spark and GraphX into your project, as follows:</p>
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">import</span> <span class="nn">org.apache.spark._</span>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">import</span> <span class="nn">org.apache.spark._</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.graphx._</span>
 <span class="c1">// To make some of the examples work we will also need RDD</span>
-<span class="k">import</span> <span class="nn">org.apache.spark.rdd.RDD</span></code></pre></div>
+<span class="k">import</span> <span class="nn">org.apache.spark.rdd.RDD</span></code></pre></figure>
 
 <p>If you are not using the Spark shell you will also need a <code>SparkContext</code>.  To learn more about
 getting started with Spark refer to the <a href="quick-start.html">Spark Quick Start Guide</a>.</p>
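
A minimal sketch of creating that SparkContext yourself (the app name and the local[*] master below are placeholder choices, not values mandated by the guide):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.graphx._
    import org.apache.spark.rdd.RDD

    // Placeholder configuration; any valid master URL works here.
    val conf = new SparkConf().setAppName("graphx-example").setMaster("local[*]")
    val sc = new SparkContext(conf)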
@@ -222,11 +222,11 @@ arrays.</p>
 This can be accomplished through inheritance.  For example to model users and products as a
 bipartite graph we might do the following:</p>
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">class</span> <span class="nc">VertexProperty</span><span class="o">()</span>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">class</span> <span class="nc">VertexProperty</span><span class="o">()</span>
 <span class="k">case</span> <span class="k">class</span> <span class="nc">UserProperty</span><span class="o">(</span><span class="k">val</span> <span class="n">name</span><span class="k">:</span> <span class="kt">String</span><span class="o">)</span> <span class="k">extends</span> <span class="nc">VertexProperty</span>
 <span class="k">case</span> <span class="k">class</span> <span class="nc">ProductProperty</span><span class="o">(</span><span class="k">val</span> <span class="n">name</span><span class="k">:</span> <span class="kt">String</span><span class="o">,</span> <span class="k">val</span> <span class="n">price</span><span class="k">:</span> <span class="kt">Double</span><span class="o">)</span> <span class="k">extends</span> <span class="nc">VertexProperty</span>
 <span class="c1">// The graph might then have the type:</span>
-<span class="k">var</span> <span class="n">graph</span><span class="k">:</span> <span class="kt">Graph</span><span class="o">[</span><span class="kt">VertexProperty</span>, <span class="kt">String</span><span class="o">]</span> <span class="k">=</span> <span class="kc">null</span></code></pre></div>
+<span class="k">var</span> <span class="n">graph</span><span class="k">:</span> <span class="kt">Graph</span><span class="o">[</span><span class="kt">VertexProperty</span>, <span class="kt">String</span><span class="o">]</span> <span class="k">=</span> <span class="kc">null</span></code></pre></figure>
 
 <p>Like RDDs, property graphs are immutable, distributed, and fault-tolerant.  Changes to the values or
 structure of the graph are accomplished by producing a new graph with the desired changes.  Note
@@ -239,10 +239,10 @@ RDDs, each partition of the graph can be recreated on a different machine in the
 properties for each vertex and edge.  As a consequence, the graph class contains members to access
 the vertices and edges of the graph:</p>
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">class</span> <span class="nc">Graph</span><span class="o">[</span><span class="kt">VD</span>, <span class="kt">ED</span><span class="o">]</span> <span class="o">{</span>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">class</span> <span class="nc">Graph</span><span class="o">[</span><span class="kt">VD</span>, <span class="kt">ED</span><span class="o">]</span> <span class="o">{</span>
   <span class="k">val</span> <span class="n">vertices</span><span class="k">:</span> <span class="kt">VertexRDD</span><span class="o">[</span><span class="kt">VD</span><span class="o">]</span>
   <span class="k">val</span> <span class="n">edges</span><span class="k">:</span> <span class="kt">EdgeRDD</span><span class="o">[</span><span class="kt">ED</span><span class="o">]</span>
-<span class="o">}</span></code></pre></div>
+<span class="o">}</span></code></pre></figure>
 
 <p>The classes <code>VertexRDD[VD]</code> and <code>EdgeRDD[ED]</code> extend and are optimized versions of <code>RDD[(VertexId,
 VD)]</code> and <code>RDD[Edge[ED]]</code> respectively.  Both <code>VertexRDD[VD]</code> and <code>EdgeRDD[ED]</code> provide  additional
@@ -264,7 +264,7 @@ with a string describing the relationships between collaborators:</p>
 
 <p>The resulting graph would have the type signature:</p>
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">userGraph</span><span class="k">:</span> <span class="kt">Graph</span><span class="o">[(</span><span class="kt">String</span>, <span class="kt">String</span><span class="o">)</span>, <span class="kt">String</span><span class="o">]</span></code></pre></div>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">val</span> <span class="n">userGraph</span><span class="k">:</span> <span class="kt">Graph</span><span class="o">[(</span><span class="kt">String</span>, <span class="kt">String</span><span class="o">)</span>, <span class="kt">String</span><span class="o">]</span></code></pre></figure>
 
 <p>There are numerous ways to construct a property graph from raw files, RDDs, and even synthetic
 generators and these are discussed in more detail in the section on
@@ -272,7 +272,7 @@ generators and these are discussed in more detail in the section on
 <a href="api/scala/index.html#org.apache.spark.graphx.Graph$">Graph object</a>.  For example the following
 code constructs a graph from a collection of RDDs:</p>
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="c1">// Assume the SparkContext has already been constructed</span>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="c1">// Assume the SparkContext has already been constructed</span>
 <span class="k">val</span> <span class="n">sc</span><span class="k">:</span> <span class="kt">SparkContext</span>
 <span class="c1">// Create an RDD for the vertices</span>
 <span class="k">val</span> <span class="n">users</span><span class="k">:</span> <span class="kt">RDD</span><span class="o">[(</span><span class="kt">VertexId</span>, <span class="o">(</span><span class="kt">String</span>, <span class="kt">String</span><span class="o">))]</span> <span class="k">=</span>
@@ -285,7 +285,7 @@ code constructs a graph from a collection of RDDs:</p>
 <span class="c1">// Define a default user in case there are relationship with missing user</span>
 <span class="k">val</span> <span class="n">defaultUser</span> <span class="k">=</span> <span class="o">(</span><span class="s">&quot;John Doe&quot;</span><span class="o">,</span> <span class="s">&quot;Missing&quot;</span><span class="o">)</span>
 <span class="c1">// Build the initial Graph</span>
-<span class="k">val</span> <span class="n">graph</span> <span class="k">=</span> <span class="nc">Graph</span><span class="o">(</span><span class="n">users</span><span class="o">,</span> <span class="n">relationships</span><span class="o">,</span> <span class="n">defaultUser</span><span class="o">)</span></code></pre></div>
+<span class="k">val</span> <span class="n">graph</span> <span class="k">=</span> <span class="nc">Graph</span><span class="o">(</span><span class="n">users</span><span class="o">,</span> <span class="n">relationships</span><span class="o">,</span> <span class="n">defaultUser</span><span class="o">)</span></code></pre></figure>
 
 <p>In the above example we make use of the <a href="api/scala/index.html#org.apache.spark.graphx.Edge"><code>Edge</code></a> case class. Edges have a <code>srcId</code> and a
 <code>dstId</code> corresponding to the source and destination vertex identifiers. In addition, the <code>Edge</code>
@@ -294,11 +294,11 @@ class has an <code>attr</code> member which stores the edge property.</p>
 <p>We can deconstruct a graph into the respective vertex and edge views by using the <code>graph.vertices</code>
 and <code>graph.edges</code> members respectively.</p>
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">graph</span><span class="k">:</span> <span class="kt">Graph</span><span class="o">[(</span><span class="kt">String</span>, <span class="kt">String</span><span class="o">)</span>, <span class="kt">String</span><span class="o">]</span> <span class="c1">// Constructed from above</span>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">val</span> <span class="n">graph</span><span class="k">:</span> <span class="kt">Graph</span><span class="o">[(</span><span class="kt">String</span>, <span class="kt">String</span><span class="o">)</span>, <span class="kt">String</span><span class="o">]</span> <span class="c1">// Constructed from above</span>
 <span class="c1">// Count all users which are postdocs</span>
 <span class="n">graph</span><span class="o">.</span><span class="n">vertices</span><span class="o">.</span><span class="n">filter</span> <span class="o">{</span> <span class="k">case</span> <span class="o">(</span><span class="n">id</span><span class="o">,</span> <span class="o">(</span><span class="n">name</span><span class="o">,</span> <span class="n">pos</span><span class="o">))</span> <span class="k">=&gt;</span> <span class="n">pos</span> <span class="o">==</span> <span class="s">&quot;postdoc&quot;</span> <span class="o">}.</span><span class="n">count</span>
 <span class="c1">// Count all the edges where src &gt; dst</span>
-<span class="n">graph</span><span class="o">.</span><span class="n">edges</span><span class="o">.</span><span class="n">filter</span><span class="o">(</span><span class="n">e</span> <span class="k">=&gt;</span> <span class="n">e</span><span class="o">.</span><span class="n">srcId</span> <span class="o">&gt;</span> <span class="n">e</span><span class="o">.</span><span class="n">dstId</span><span class="o">).</span><span class="n">count</span></code></pre></div>
+<span class="n">graph</span><span class="o">.</span><span class="n">edges</span><span class="o">.</span><span class="n">filter</span><span class="o">(</span><span class="n">e</span> <span class="k">=&gt;</span> <span class="n">e</span><span class="o">.</span><span class="n">srcId</span> <span class="o">&gt;</span> <span class="n">e</span><span class="o">.</span><span class="n">dstId</span><span class="o">).</span><span class="n">count</span></code></pre></figure>
 
 <blockquote>
   <p>Note that <code>graph.vertices</code> returns an <code>VertexRDD[(String, String)]</code> which extends
@@ -306,17 +306,17 @@ and <code>graph.edges</code> members respectively.</p>
 tuple.  On the other hand, <code>graph.edges</code> returns an <code>EdgeRDD</code> containing <code>Edge[String]</code> objects.
 We could have also used the case class type constructor as in the following:</p>
 
+  <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="n">graph</span><span class="o">.</span><span class="n">edges</span><span class="o">.</span><span class="n">filter</span> <span class="o">{</span> <span class="k">case</span> <span class="nc">Edge</span><span class="o">(</span><span class="n">src</span><span class="o">,</span> <span class="n">dst</span><span class="o">,</span> <span class="n">prop</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="n">src</span> <span class="o">&gt;</span> <span class="n">dst</span> <span class="o">}.</span><span class="n">count</span></code></pre></figure>
 </blockquote>
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">graph</span><span class="o">.</span><span class="n">edges</span><span class="o">.</span><span class="n">filter</span> <span class="o">{</span> <span class="k">case</span> <span class="nc">Edge</span><span class="o">(</span><span class="n">src</span><span class="o">,</span> <span class="n">dst</span><span class="o">,</span> <span class="n">prop</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="n">src</span> <span class="o">&gt;</span> <span class="n">dst</span> <span class="o">}.</span><span class="n">count</span></code></pre></div>
 
 <p>In addition to the vertex and edge views of the property graph, GraphX also exposes a triplet view.
 The triplet view logically joins the vertex and edge properties yielding an
 <code>RDD[EdgeTriplet[VD, ED]]</code> containing instances of the <a href="api/scala/index.html#org.apache.spark.graphx.EdgeTriplet"><code>EdgeTriplet</code></a> class. This
 <em>join</em> can be expressed in the following SQL expression:</p>
 
-<div class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="k">SELECT</span> <span class="n">src</span><span class="p">.</span><span class="n">id</span><span class="p">,</span> <span class="n">dst</span><span class="p">.</span><span class="n">id</span><span class="p">,</span> <span class="n">src</span><span class="p">.</span><span class="n">attr</span><span class="p">,</span> <span class="n">e</span><span class="p">.</span><span class="n">attr</span><span class="p">,</span> <span class="n">dst</span><span class="p">.</span><span class="n">attr</span>
+<figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span></span><span class="k">SELECT</span> <span class="n">src</span><span class="p">.</span><span class="n">id</span><span class="p">,</span> <span class="n">dst</span><span class="p">.</span><span class="n">id</span><span class="p">,</span> <span class="n">src</span><span class="p">.</span><span class="n">attr</span><span class="p">,</span> <span class="n">e</span><span class="p">.</span><span class="n">attr</span><span class="p">,</span> <span class="n">dst</span><span class="p">.</span><span class="n">attr</span>
 <span class="k">FROM</span> <span class="n">edges</span> <span class="k">AS</span> <span class="n">e</span> <span class="k">LEFT</span> <span class="k">JOIN</span> <span class="n">vertices</span> <span class="k">AS</span> <span class="n">src</span><span class="p">,</span> <span class="n">vertices</span> <span class="k">AS</span> <span class="n">dst</span>
-<span class="k">ON</span> <span class="n">e</span><span class="p">.</span><span class="n">srcId</span> <span class="o">=</span> <span class="n">src</span><span class="p">.</span><span class="n">Id</span> <span class="k">AND</span> <span class="n">e</span><span class="p">.</span><span class="n">dstId</span> <span class="o">=</span> <span class="n">dst</span><span class="p">.</span><span class="n">Id</span></code></pre></div>
+<span class="k">ON</span> <span class="n">e</span><span class="p">.</span><span class="n">srcId</span> <span class="o">=</span> <span class="n">src</span><span class="p">.</span><span class="n">Id</span> <span class="k">AND</span> <span class="n">e</span><span class="p">.</span><span class="n">dstId</span> <span class="o">=</span> <span class="n">dst</span><span class="p">.</span><span class="n">Id</span></code></pre></figure>
 
 <p>or graphically as:</p>
 
@@ -329,12 +329,12 @@ The triplet view logically joins the vertex and edge properties yielding an
 <code>dstAttr</code> members which contain the source and destination properties respectively. We can use the
 triplet view of a graph to render a collection of strings describing relationships between users.</p>
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">graph</span><span class="k">:</span> <span class="kt">Graph</span><span class="o">[(</span><span class="kt">String</span>, <span class="kt">String</span><span class="o">)</span>, <span class="kt">String</span><span class="o">]</span> <span class="c1">// Constructed from above</span>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">val</span> <span class="n">graph</span><span class="k">:</span> <span class="kt">Graph</span><span class="o">[(</span><span class="kt">String</span>, <span class="kt">String</span><span class="o">)</span>, <span class="kt">String</span><span class="o">]</span> <span class="c1">// Constructed from above</span>
 <span class="c1">// Use the triplets view to create an RDD of facts.</span>
 <span class="k">val</span> <span class="n">facts</span><span class="k">:</span> <span class="kt">RDD</span><span class="o">[</span><span class="kt">String</span><span class="o">]</span> <span class="k">=</span>
   <span class="n">graph</span><span class="o">.</span><span class="n">triplets</span><span class="o">.</span><span class="n">map</span><span class="o">(</span><span class="n">triplet</span> <span class="k">=&gt;</span>
     <span class="n">triplet</span><span class="o">.</span><span class="n">srcAttr</span><span class="o">.</span><span class="n">_1</span> <span class="o">+</span> <span class="s">&quot; is the &quot;</span> <span class="o">+</span> <span class="n">triplet</span><span class="o">.</span><span class="n">attr</span> <span class="o">+</span> <span class="s">&quot; of &quot;</span> <span class="o">+</span> <span class="n">triplet</span><span class="o">.</span><span class="n">dstAttr</span><span class="o">.</span><span class="n">_1</span><span class="o">)</span>
-<span class="n">facts</span><span class="o">.</span><span class="n">collect</span><span class="o">.</span><span class="n">foreach</span><span class="o">(</span><span class="n">println</span><span class="o">(</span><span class="k">_</span><span class="o">))</span></code></pre></div>
+<span class="n">facts</span><span class="o">.</span><span class="n">collect</span><span class="o">.</span><span class="n">foreach</span><span class="o">(</span><span class="n">println</span><span class="o">(</span><span class="k">_</span><span class="o">))</span></code></pre></figure>
 
 <h1 id="graph-operators">Graph Operators</h1>
 
@@ -346,9 +346,9 @@ core operators are defined in <a href="api/scala/index.html#org.apache.spark.gra
 operators in <code>GraphOps</code> are automatically available as members of <code>Graph</code>.  For example, we can
 compute the in-degree of each vertex (defined in <code>GraphOps</code>) by the following:</p>
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">graph</span><span class="k">:</span> <span class="kt">Graph</span><span class="o">[(</span><span class="kt">String</span>, <span class="kt">String</span><span class="o">)</span>, <span class="kt">String</span><span class="o">]</span>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">val</span> <span class="n">graph</span><span class="k">:</span> <span class="kt">Graph</span><span class="o">[(</span><span class="kt">String</span>, <span class="kt">String</span><span class="o">)</span>, <span class="kt">String</span><span class="o">]</span>
 <span class="c1">// Use the implicit GraphOps.inDegrees operator</span>
-<span class="k">val</span> <span class="n">inDegrees</span><span class="k">:</span> <span class="kt">VertexRDD</span><span class="o">[</span><span class="kt">Int</span><span class="o">]</span> <span class="k">=</span> <span class="n">graph</span><span class="o">.</span><span class="n">inDegrees</span></code></pre></div>
+<span class="k">val</span> <span class="n">inDegrees</span><span class="k">:</span> <span class="kt">VertexRDD</span><span class="o">[</span><span class="kt">Int</span><span class="o">]</span> <span class="k">=</span> <span class="n">graph</span><span class="o">.</span><span class="n">inDegrees</span></code></pre></figure>
 
 <p>The reason for differentiating between core graph operations and <a href="api/scala/index.html#org.apache.spark.graphx.GraphOps"><code>GraphOps</code></a> is to be
 able to support different graph representations in the future.  Each graph representation must
@@ -362,7 +362,7 @@ signatures have been simplified (e.g., default arguments and type constraints re
 advanced functionality has been removed so please consult the API docs for the official list of
 operations.</p>
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="cm">/** Summary of the functionality in the property graph */</span>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="cm">/** Summary of the functionality in the property graph */</span>
 <span class="k">class</span> <span class="nc">Graph</span><span class="o">[</span><span class="kt">VD</span>, <span class="kt">ED</span><span class="o">]</span> <span class="o">{</span>
   <span class="c1">// Information about the Graph ===================================================================</span>
   <span class="k">val</span> <span class="n">numEdges</span><span class="k">:</span> <span class="kt">Long</span>
@@ -419,17 +419,17 @@ operations.</p>
   <span class="k">def</span> <span class="n">connectedComponents</span><span class="o">()</span><span class="k">:</span> <span class="kt">Graph</span><span class="o">[</span><span class="kt">VertexId</span>, <span class="kt">ED</span><span class="o">]</span>
   <span class="k">def</span> <span class="n">triangleCount</span><span class="o">()</span><span class="k">:</span> <span class="kt">Graph</span><span class="o">[</span><span class="kt">Int</span>, <span class="kt">ED</span><span class="o">]</span>
   <span class="k">def</span> <span class="n">stronglyConnectedComponents</span><span class="o">(</span><span class="n">numIter</span><span class="k">:</span> <span class="kt">Int</span><span class="o">)</span><span class="k">:</span> <span class="kt">Graph</span><span class="o">[</span><span class="kt">VertexId</span>, <span class="kt">ED</span><span class="o">]</span>
-<span class="o">}</span></code></pre></div>
+<span class="o">}</span></code></pre></figure>
 
 <h2 id="property-operators">Property Operators</h2>
 
 <p>Like the RDD <code>map</code> operator, the property graph contains the following:</p>
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">class</span> <span class="nc">Graph</span><span class="o">[</span><span class="kt">VD</span>, <span class="kt">ED</span><span class="o">]</span> <span class="o">{</span>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">class</span> <span class="nc">Graph</span><span class="o">[</span><span class="kt">VD</span>, <span class="kt">ED</span><span class="o">]</span> <span class="o">{</span>
   <span class="k">def</span> <span class="n">mapVertices</span><span class="o">[</span><span class="kt">VD2</span><span class="o">](</span><span class="n">map</span><span class="k">:</span> <span class="o">(</span><span class="kt">VertexId</span><span class="o">,</span> <span class="kt">VD</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="nc">VD2</span><span class="o">)</span><span class="k">:</span> <span class="kt">Graph</span><span class="o">[</span><span class="kt">VD2</span>, <span class="kt">ED</span><span class="o">]</span>
   <span class="k">def</span> <span class="n">mapEdges</span><span class="o">[</span><span class="kt">ED2</span><span class="o">](</span><span class="n">map</span><span class="k">:</span> <span class="kt">Edge</span><span class="o">[</span><span class="kt">ED</span><span class="o">]</span> <span class="k">=&gt;</span> <span class="nc">ED2</span><span class="o">)</span><span class="k">:</span> <span class="kt">Graph</span><span class="o">[</span><span class="kt">VD</span>, <span class="kt">ED2</span><span class="o">]</span>
   <span class="k">def</span> <span class="n">mapTriplets</span><span class="o">[</span><span class="kt">ED2</span><span class="o">](</span><span class="n">map</span><span class="k">:</span> <span class="kt">EdgeTriplet</span><span class="o">[</span><span class="kt">VD</span>, <span class="kt">ED</span><span class="o">]</span> <span class="k">=&gt;</span> <span class="nc">ED2</span><span class="o">)</span><span class="k">:</span> <span class="kt">Graph</span><span class="o">[</span><span class="kt">VD</span>, <span class="kt">ED2</span><span class="o">]</span>
-<span class="o">}</span></code></pre></div>
+<span class="o">}</span></code></pre></figure>
 
 <p>Each of these operators yields a new graph with the vertex or edge properties modified by the user
 defined <code>map</code> function.</p>
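
A small usage sketch, assuming graph has the Graph[(String, String), String] type used earlier in the guide:

    // Upper-case every edge property; the graph structure is unchanged.
    val shoutingEdges: Graph[(String, String), String] =
      graph.mapEdges(e => e.attr.toUpperCase)
    // Fold the vertex id into each vertex property, changing the vertex type.
    val tagged: Graph[(VertexId, (String, String)), String] =
      graph.mapVertices((id, attr) => (id, attr))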
@@ -440,27 +440,27 @@ which allows the resulting graph to reuse the structural indices of the original
 following snippets are logically equivalent, but the first one does not preserve the structural
 indices and would not benefit from the GraphX system optimizations:</p>
 
+  <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">val</span> <span class="n">newVertices</span> <span class="k">=</span> <span class="n">graph</span><span class="o">.</span><span class="n">vertices</span><span class="o">.</span><span class="n">map</span> <span class="o">{</span> <span class="k">case</span> <span class="o">(</span><span class="n">id</span><span class="o">,</span> <span class="n">attr</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="o">(</span><span class="n">id</span><span class="o">,</span> <span class="n">mapUdf</span><span class="o">(</span><span class="n">id</span><span class="o">,</span> <span class="n">attr</span><span class="o">))</span> <span class="o">}</span>
+<span class="k">val</span> <span class="n">newGraph</span> <span class="k">=</span> <span class="nc">Graph</span><span class="o">(</span><span class="n">newVertices</span><span class="o">,</span> <span class="n">graph</span><span class="o">.</span><span class="n">edges</span><span class="o">)</span></code></pre></figure>
 </blockquote>
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">newVertices</span> <span class="k">=</span> <span class="n">graph</span><span class="o">.</span><span class="n">vertices</span><span class="o">.</span><span class="n">map</span> <span class="o">{</span> <span class="k">case</span> <span class="o">(</span><span class="n">id</span><span class="o">,</span> <span class="n">attr</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="o">(</span><span class="n">id</span><span class="o">,</span> <span class="n">mapUdf</span><span class="o">(</span><span class="n">id</span><span class="o">,</span> <span class="n">attr</span><span class="o">))</span> <span class="o">}</span>
-<span class="k">val</span> <span class="n">newGraph</span> <span class="k">=</span> <span class="nc">Graph</span><span class="o">(</span><span class="n">newVertices</span><span class="o">,</span> <span class="n">graph</span><span class="o">.</span><span class="n">edges</span><span class="o">)</span></code></pre></div>
 
 <blockquote>
   <p>Instead, use <a href="api/scala/index.html#org.apache.spark.graphx.Graph@mapVertices[VD2]((VertexId,VD)\u21d2VD2)(ClassTag[VD2]):Graph[VD2,ED]"><code>mapVertices</code></a> to preserve the indices:</p>
 
+  <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">val</span> <span class="n">newGraph</span> <span class="k">=</span> <span class="n">graph</span><span class="o">.</span><span class="n">mapVertices</span><span class="o">((</span><span class="n">id</span><span class="o">,</span> <span class="n">attr</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="n">mapUdf</span><span class="o">(</span><span class="n">id</span><span class="o">,</span> <span class="n">attr</span><span class="o">))</span></code></pre></figure>
 </blockquote>
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">newGraph</span> <span class="k">=</span> <span class="n">graph</span><span class="o">.</span><span class="n">mapVertices</span><span class="o">((</span><span class="n">id</span><span class="o">,</span> <span class="n">attr</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="n">mapUdf</span><span class="o">(</span><span class="n">id</span><span class="o">,</span> <span class="n">attr</span><span class="o">))</span></code></pre></div>
 
 <p>These operators are often used to initialize the graph for a particular computation or project away
 unnecessary properties.  For example, given a graph with the out degrees as the vertex properties
 (we describe how to construct such a graph later), we initialize it for PageRank:</p>
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="c1">// Given a graph where the vertex property is the out degree</span>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="c1">// Given a graph where the vertex property is the out degree</span>
 <span class="k">val</span> <span class="n">inputGraph</span><span class="k">:</span> <span class="kt">Graph</span><span class="o">[</span><span class="kt">Int</span>, <span class="kt">String</span><span class="o">]</span> <span class="k">=</span>
   <span class="n">graph</span><span class="o">.</span><span class="n">outerJoinVertices</span><span class="o">(</span><span class="n">graph</span><span class="o">.</span><span class="n">outDegrees</span><span class="o">)((</span><span class="n">vid</span><span class="o">,</span> <span class="k">_</span><span class="o">,</span> <span class="n">degOpt</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="n">degOpt</span><span class="o">.</span><span class="n">getOrElse</span><span class="o">(</span><span class="mi">0</span><span class="o">))</span>
 <span class="c1">// Construct a graph where each edge contains the weight</span>
 <span class="c1">// and each vertex is the initial PageRank</span>
 <span class="k">val</span> <span class="n">outputGraph</span><span class="k">:</span> <span class="kt">Graph</span><span class="o">[</span><span class="kt">Double</span>, <span class="kt">Double</span><span class="o">]</span> <span class="k">=</span>
-  <span class="n">inputGraph</span><span class="o">.</span><span class="n">mapTriplets</span><span class="o">(</span><span class="n">triplet</span> <span class="k">=&gt;</span> <span class="mf">1.0</span> <span class="o">/</span> <span class="n">triplet</span><span class="o">.</span><span class="n">srcAttr</span><span class="o">).</span><span class="n">mapVertices</span><span class="o">((</span><span class="n">id</span><span class="o">,</span> <span class="k">_</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="mf">1.0</span><span class="o">)</span></code></pre></div>
+  <span class="n">inputGraph</span><span class="o">.</span><span class="n">mapTriplets</span><span class="o">(</span><span class="n">triplet</span> <span class="k">=&gt;</span> <span class="mf">1.0</span> <span class="o">/</span> <span class="n">triplet</span><span class="o">.</span><span class="n">srcAttr</span><span class="o">).</span><span class="n">mapVertices</span><span class="o">((</span><span class="n">id</span><span class="o">,</span> <span class="k">_</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="mf">1.0</span><span class="o">)</span></code></pre></figure>
 
 <p><a name="structural_operators"></a></p>
 
@@ -469,13 +469,13 @@ unnecessary properties.  For example, given a graph with the out degrees as the
 <p>Currently GraphX supports only a simple set of commonly used structural operators and we expect to
 add more in the future.  The following is a list of the basic structural operators.</p>
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">class</span> <span class="nc">Graph</span><span class="o">[</span><span class="kt">VD</span>, <span class="kt">ED</span><span class="o">]</span> <span class="o">{</span>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">class</span> <span class="nc">Graph</span><span class="o">[</span><span class="kt">VD</span>, <span class="kt">ED</span><span class="o">]</span> <span class="o">{</span>
   <span class="k">def</span> <span class="n">reverse</span><span class="k">:</span> <span class="kt">Graph</span><span class="o">[</span><span class="kt">VD</span>, <span class="kt">ED</span><span class="o">]</span>
   <span class="k">def</span> <span class="n">subgraph</span><span class="o">(</span><span class="n">epred</span><span class="k">:</span> <span class="kt">EdgeTriplet</span><span class="o">[</span><span class="kt">VD</span>,<span class="kt">ED</span><span class="o">]</span> <span class="k">=&gt;</span> <span class="nc">Boolean</span><span class="o">,</span>
                <span class="n">vpred</span><span class="k">:</span> <span class="o">(</span><span class="kt">VertexId</span><span class="o">,</span> <span class="kt">VD</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="nc">Boolean</span><span class="o">)</span><span class="k">:</span> <span class="kt">Graph</span><span class="o">[</span><span class="kt">VD</span>, <span class="kt">ED</span><span class="o">]</span>
   <span class="k">def</span> <span class="n">mask</span><span class="o">[</span><span class="kt">VD2</span>, <span class="kt">ED2</span><span class="o">](</span><span class="n">other</span><span class="k">:</span> <span class="kt">Graph</span><span class="o">[</span><span class="kt">VD2</span>, <span class="kt">ED2</span><span class="o">])</span><span class="k">:</span> <span class="kt">Graph</span><span class="o">[</span><span class="kt">VD</span>, <span class="kt">ED</span><span class="o">]</span>
   <span class="k">def</span> <span class="n">groupEdges</span><span class="o">(</span><span class="n">merge</span><span class="k">:</span> <span class="o">(</span><span class="kt">ED</span><span class="o">,</span> <span class="kt">ED</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="nc">ED</span><span class="o">)</span><span class="k">:</span> <span class="kt">Graph</span><span class="o">[</span><span class="kt">VD</span>,<span class="kt">ED</span><span class="o">]</span>
-<span class="o">}</span></code></pre></div>
+<span class="o">}</span></code></pre></figure>
 
 <p>The <a href="api/scala/index.html#org.apache.spark.graphx.Graph@reverse:Graph[VD,ED]"><code>reverse</code></a> operator returns a new graph with all the edge directions reversed.
 This can be useful when, for example, trying to compute the inverse PageRank.  Because the reverse
@@ -488,7 +488,7 @@ satisfy the edge predicate <em>and connect vertices that satisfy the vertex pred
 operator can be used in number of situations to restrict the graph to the vertices and edges of
 interest or eliminate broken links. For example in the following code we remove broken links:</p>
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="c1">// Create an RDD for the vertices</span>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="c1">// Create an RDD for the vertices</span>
 <span class="k">val</span> <span class="n">users</span><span class="k">:</span> <span class="kt">RDD</span><span class="o">[(</span><span class="kt">VertexId</span>, <span class="o">(</span><span class="kt">String</span>, <span class="kt">String</span><span class="o">))]</span> <span class="k">=</span>
   <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="o">(</span><span class="nc">Array</span><span class="o">((</span><span class="mi">3L</span><span class="o">,</span> <span class="o">(</span><span class="s">&quot;rxin&quot;</span><span class="o">,</span> <span class="s">&quot;student&quot;</span><span class="o">)),</span> <span class="o">(</span><span class="mi">7L</span><span class="o">,</span> <span class="o">(</span><span class="s">&quot;jgonzal&quot;</span><span class="o">,</span> <span class="s">&quot;postdoc&quot;</span><span class="o">)),</span>
                        <span class="o">(</span><span class="mi">5L</span><span class="o">,</span> <span class="o">(</span><span class="s">&quot;franklin&quot;</span><span class="o">,</span> <span class="s">&quot;prof&quot;</span><span class="o">)),</span> <span class="o">(</span><span class="mi">2L</span><span class="o">,</span> <span class="o">(</span><span class="s">&quot;istoica&quot;</span><span class="o">,</span> <span class="s">&quot;prof&quot;</span><span class="o">)),</span>
@@ -513,7 +513,7 @@ interest or eliminate broken links. For example in the following code we remove
 <span class="n">validGraph</span><span class="o">.</span><span class="n">vertices</span><span class="o">.</span><span class="n">collect</span><span class="o">.</span><span class="n">foreach</span><span class="o">(</span><span class="n">println</span><span class="o">(</span><span class="k">_</span><span class="o">))</span>
 <span class="n">validGraph</span><span class="o">.</span><span class="n">triplets</span><span class="o">.</span><span class="n">map</span><span class="o">(</span>
   <span class="n">triplet</span> <span class="k">=&gt;</span> <span class="n">triplet</span><span class="o">.</span><span class="n">srcAttr</span><span class="o">.</span><span class="n">_1</span> <span class="o">+</span> <span class="s">&quot; is the &quot;</span> <span class="o">+</span> <span class="n">triplet</span><span class="o">.</span><span class="n">attr</span> <span class="o">+</span> <span class="s">&quot; of &quot;</span> <span class="o">+</span> <span class="n">triplet</span><span class="o">.</span><span class="n">dstAttr</span><span class="o">.</span><span class="n">_1</span>
-<span class="o">).</span><span class="n">collect</span><span class="o">.</span><span class="n">foreach</span><span class="o">(</span><span class="n">println</span><span class="o">(</span><span class="k">_</span><span class="o">))</span></code></pre></div>
+<span class="o">).</span><span class="n">collect</span><span class="o">.</span><span class="n">foreach</span><span class="o">(</span><span class="n">println</span><span class="o">(</span><span class="k">_</span><span class="o">))</span></code></pre></figure>
 
 <blockquote>
   <p>Note in the above example only the vertex predicate is provided.  The <code>subgraph</code> operator defaults
@@ -526,12 +526,12 @@ vertices and edges that are also found in the input graph.  This can be used in
 example, we might run connected components using the graph with missing vertices and then restrict
 the answer to the valid subgraph.</p>
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="c1">// Run Connected Components</span>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="c1">// Run Connected Components</span>
 <span class="k">val</span> <span class="n">ccGraph</span> <span class="k">=</span> <span class="n">graph</span><span class="o">.</span><span class="n">connectedComponents</span><span class="o">()</span> <span class="c1">// No longer contains missing field</span>
 <span class="c1">// Remove missing vertices as well as the edges to connected to them</span>
 <span class="k">val</span> <span class="n">validGraph</span> <span class="k">=</span> <span class="n">graph</span><span class="o">.</span><span class="n">subgraph</span><span class="o">(</span><span class="n">vpred</span> <span class="k">=</span> <span class="o">(</span><span class="n">id</span><span class="o">,</span> <span class="n">attr</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="n">attr</span><span class="o">.</span><span class="n">_2</span> <span class="o">!=</span> <span class="s">&quot;Missing&quot;</span><span class="o">)</span>
 <span class="c1">// Restrict the answer to the valid subgraph</span>
-<span class="k">val</span> <span class="n">validCCGraph</span> <span class="k">=</span> <span class="n">ccGraph</span><span class="o">.</span><span class="n">mask</span><span class="o">(</span><span class="n">validGraph</span><span class="o">)</span></code></pre></div>
+<span class="k">val</span> <span class="n">validCCGraph</span> <span class="k">=</span> <span class="n">ccGraph</span><span class="o">.</span><span class="n">mask</span><span class="o">(</span><span class="n">validGraph</span><span class="o">)</span></code></pre></figure>
 
 <p>The <a href="api/scala/index.html#org.apache.spark.graphx.Graph@groupEdges((ED,ED)\u21d2ED):Graph[VD,ED]"><code>groupEdges</code></a> operator merges parallel edges (i.e., duplicate edges between
 pairs of vertices) in the multigraph.  In many numerical applications, parallel edges can be <em>added</em>
@@ -546,12 +546,12 @@ example, we might have extra user properties that we want to merge with an exist
 might want to pull vertex properties from one graph into another.  These tasks can be accomplished
 using the <em>join</em> operators. Below we list the key join operators:</p>
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">class</span> <span class="nc">Graph</span><span class="o">[</span><span class="kt">VD</span>, <span class="kt">ED</span><span class="o">]</span> <span class="o">{</span>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">class</span> <span class="nc">Graph</span><span class="o">[</span><span class="kt">VD</span>, <span class="kt">ED</span><span class="o">]</span> <span class="o">{</span>
   <span class="k">def</span> <span class="n">joinVertices</span><span class="o">[</span><span class="kt">U</span><span class="o">](</span><span class="n">table</span><span class="k">:</span> <span class="kt">RDD</span><span class="o">[(</span><span class="kt">VertexId</span>, <span class="kt">U</span><span class="o">)])(</span><span class="n">map</span><span class="k">:</span> <span class="o">(</span><span class="kt">VertexId</span><span class="o">,</span> <span class="kt">VD</span><span class="o">,</span> <span class="n">U</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="nc">VD</span><span class="o">)</span>
     <span class="k">:</span> <span class="kt">Graph</span><span class="o">[</span><span class="kt">VD</span>, <span class="kt">ED</span><span class="o">]</span>
   <span class="k">def</span> <span class="n">outerJoinVertices</span><span class="o">[</span><span class="kt">U</span>, <span class="kt">VD2</span><span class="o">](</span><span class="n">table</span><span class="k">:</span> <span class="kt">RDD</span><span class="o">[(</span><span class="kt">VertexId</span>, <span class="kt">U</span><span class="o">)])(</span><span class="n">map</span><span class="k">:</span> <span class="o">(</span><span class="kt">VertexId</span><span class="o">,</span> <span class="kt">VD</span><span class="o">,</span> <span class="nc">Option</span><span class="o">[</span><span class="kt">U</span><span class="o">])</span> <span class="k">=&gt;</span> <span class="nc">VD2</span><span class="o">)</span>
     <span class="k">:</span> <span class="kt">Graph</span><span class="o">[</span><span class="kt">VD2</span>, <span class="kt">ED</span><span class="o">]</span>
-<span class="o">}</span></code></pre></div>
+<span class="o">}</span></code></pre></figure>
 
 <p>The <a href="api/scala/index.html#org.apache.spark.graphx.GraphOps@joinVertices[U](RDD[(VertexId,U)])((VertexId,VD,U)\u21d2VD)(ClassTag[U]):Graph[VD,ED]"><code>joinVertices</code></a> operator joins the vertices with the input RDD and
 returns a new graph with the vertex properties obtained by applying the user defined <code>map</code> function
@@ -563,12 +563,12 @@ original value.</p>
 is therefore recommended that the input RDD be made unique using the following which will
 also <em>pre-index</em> the resulting values to substantially accelerate the subsequent join.</p>
 
-</blockquote>
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">nonUniqueCosts</span><span class="k">:</span> <span class="kt">RDD</span><span class="o">[(</span><span class="kt">VertexId</span>, <span class="kt">Double</span><span class="o">)]</span>
+  <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">val</span> <span class="n">nonUniqueCosts</span><span class="k">:</span> <span class="kt">RDD</span><span class="o">[(</span><span class="kt">VertexId</span>, <span class="kt">Double</span><span class="o">)]</span>
 <span class="k">val</span> <span class="n">uniqueCosts</span><span class="k">:</span> <span class="kt">VertexRDD</span><span class="o">[</span><span class="kt">Double</span><span class="o">]</span> <span class="k">=</span>
   <span class="n">graph</span><span class="o">.</span><span class="n">vertices</span><span class="o">.</span><span class="n">aggregateUsingIndex</span><span class="o">(</span><span class="n">nonUnique</span><span class="o">,</span> <span class="o">(</span><span class="n">a</span><span class="o">,</span><span class="n">b</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span><span class="o">)</span>
 <span class="k">val</span> <span class="n">joinedGraph</span> <span class="k">=</span> <span class="n">graph</span><span class="o">.</span><span class="n">joinVertices</span><span class="o">(</span><span class="n">uniqueCosts</span><span class="o">)(</span>
-  <span class="o">(</span><span class="n">id</span><span class="o">,</span> <span class="n">oldCost</span><span class="o">,</span> <span class="n">extraCost</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="n">oldCost</span> <span class="o">+</span> <span class="n">extraCost</span><span class="o">)</span></code></pre></div>
+  <span class="o">(</span><span class="n">id</span><span class="o">,</span> <span class="n">oldCost</span><span class="o">,</span> <span class="n">extraCost</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="n">oldCost</span> <span class="o">+</span> <span class="n">extraCost</span><span class="o">)</span></code></pre></figure>
+</blockquote>
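<p>The same pattern works with any commutative reduce function. A hedged sketch, reusing the assumed <code>nonUniqueCosts</code> RDD from above, that keeps the minimum cost per vertex instead of the sum:</p>

<figure class="highlight"><pre><code class="language-scala">// Reuse the graph's vertex index, but resolve duplicates by taking the minimum
val minCosts: VertexRDD[Double] =
  graph.vertices.aggregateUsingIndex(nonUniqueCosts, (a: Double, b: Double) =&gt; math.min(a, b))</code></pre></figure>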
 
 <p>The more general <a href="api/scala/index.html#org.apache.spark.graphx.Graph@outerJoinVertices[U,VD2](RDD[(VertexId,U)])((VertexId,VD,Option[U])\u21d2VD2)(ClassTag[U],ClassTag[VD2]):Graph[VD2,ED]"><code>outerJoinVertices</code></a> behaves similarly to <code>joinVertices</code>
 except that the user defined <code>map</code> function is applied to all vertices and can change the vertex
@@ -576,13 +576,13 @@ property type.  Because not all vertices may have a matching value in the input
 function takes an <code>Option</code> type.  For example, we can set up a graph for PageRank by initializing
 vertex properties with their <code>outDegree</code>.</p>
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">outDegrees</span><span class="k">:</span> <span class="kt">VertexRDD</span><span class="o">[</span><span class="kt">Int</span><span class="o">]</span> <span class="k">=</span> <span class="n">graph</span><span class="o">.</span><span class="n">outDegrees</span>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">val</span> <span class="n">outDegrees</span><span class="k">:</span> <span class="kt">VertexRDD</span><span class="o">[</span><span class="kt">Int</span><span class="o">]</span> <span class="k">=</span> <span class="n">graph</span><span class="o">.</span><span class="n">outDegrees</span>
 <span class="k">val</span> <span class="n">degreeGraph</span> <span class="k">=</span> <span class="n">graph</span><span class="o">.</span><span class="n">outerJoinVertices</span><span class="o">(</span><span class="n">outDegrees</span><span class="o">)</span> <span class="o">{</span> <span class="o">(</span><span class="n">id</span><span class="o">,</span> <span class="n">oldAttr</span><span class="o">,</span> <span class="n">outDegOpt</span><span class="o">)</span> <span class="k">=&gt;</span>
   <span class="n">outDegOpt</span> <span class="k">match</span> <span class="o">{</span>
     <span class="k">case</span> <span class="nc">Some</span><span class="o">(</span><span class="n">outDeg</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="n">outDeg</span>
     <span class="k">case</span> <span class="nc">None</span> <span class="k">=&gt;</span> <span class="mi">0</span> <span class="c1">// No outDegree means zero outDegree</span>
   <span class="o">}</span>
-<span class="o">}</span></code></pre></div>
+<span class="o">}</span></code></pre></figure>
 
 <blockquote>
   <p>You may have noticed the multiple parameter lists (e.g., <code>f(a)(b)</code>) curried function pattern used
@@ -590,9 +590,9 @@ in the above examples.  While we could have equally written <code>f(a)(b)</code>
 that type inference on <code>b</code> would not depend on <code>a</code>.  As a consequence, the user would need to
 provide a type annotation for the user defined function:</p>
 
+  <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">val</span> <span class="n">joinedGraph</span> <span class="k">=</span> <span class="n">graph</span><span class="o">.</span><span class="n">joinVertices</span><span class="o">(</span><span class="n">uniqueCosts</span><span class="o">,</span>
+  <span class="o">(</span><span class="n">id</span><span class="k">:</span> <span class="kt">VertexId</span><span class="o">,</span> <span class="n">oldCost</span><span class="k">:</span> <span class="kt">Double</span><span class="o">,</span> <span class="n">extraCost</span><span class="k">:</span> <span class="kt">Double</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="n">oldCost</span> <span class="o">+</span> <span class="n">extraCost</span><span class="o">)</span></code></pre></figure>
 </blockquote>
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">joinedGraph</span> <span class="k">=</span> <span class="n">graph</span><span class="o">.</span><span class="n">joinVertices</span><span class="o">(</span><span class="n">uniqueCosts</span><span class="o">,</span>
-  <span class="o">(</span><span class="n">id</span><span class="k">:</span> <span class="kt">VertexId</span><span class="o">,</span> <span class="n">oldCost</span><span class="k">:</span> <span class="kt">Double</span><span class="o">,</span> <span class="n">extraCost</span><span class="k">:</span> <span class="kt">Double</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="n">oldCost</span> <span class="o">+</span> <span class="n">extraCost</span><span class="o">)</span></code></pre></div>
 
 <blockquote>
 
@@ -623,13 +623,13 @@ relatively small, we provide a transition guide below.</p>
 This operator applies a user defined <code>sendMsg</code> function to each <i>edge triplet</i> in the graph
 and then uses the <code>mergeMsg</code> function to aggregate those messages at their destination vertex.</p>
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">class</span> <span class="nc">Graph</span><span class="o">[</span><span class="kt">VD</span>, <span class="kt">ED</span><span class="o">]</span> <span class="o">{</span>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">class</span> <span class="nc">Graph</span><span class="o">[</span><span class="kt">VD</span>, <span class="kt">ED</span><span class="o">]</span> <span class="o">{</span>
   <span class="k">def</span> <span class="n">aggregateMessages</span><span class="o">[</span><span class="kt">Msg:</span> <span class="kt">ClassTag</span><span class="o">](</span>
       <span class="n">sendMsg</span><span class="k">:</span> <span class="kt">EdgeContext</span><span class="o">[</span><span class="kt">VD</span>, <span class="kt">ED</span>, <span class="kt">Msg</span><span class="o">]</span> <span class="k">=&gt;</span> <span class="nc">Unit</span><span class="o">,</span>
       <span class="n">mergeMsg</span><span class="k">:</span> <span class="o">(</span><span class="kt">Msg</span><span class="o">,</span> <span class="kt">Msg</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="nc">Msg</span><span class="o">,</span>
       <span class="n">tripletFields</span><span class="k">:</span> <span class="kt">TripletFields</span> <span class="o">=</span> <span class="nc">TripletFields</span><span class="o">.</span><span class="nc">All</span><span class="o">)</span>
     <span class="k">:</span> <span class="kt">VertexRDD</span><span class="o">[</span><span class="kt">Msg</span><span class="o">]</span>
-<span class="o">}</span></code></pre></div>
+<span class="o">}</span></code></pre></figure>
 
 <p>The user defined <code>sendMsg</code> function takes an <a href="api/scala/index.html#org.apache.spark.graphx.EdgeContext"><code>EdgeContext</code></a>, which exposes the
 source and destination attributes along with the edge attribute and functions
@@ -669,7 +669,7 @@ slightly unreliable and instead opted for more explicit user control.</p>
 <p>In the following example we use the <a href="api/scala/index.html#org.apache.spark.graphx.Graph@aggregateMessages[A]((EdgeContext[VD,ED,A])\u21d2Unit,(A,A)\u21d2A,TripletFields)(ClassTag[A]):VertexRDD[A]"><code>aggregateMessages</code></a> operator to
 compute the average age of the more senior followers of each user.</p>
 
-<div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.graphx.</span><span class="o">{</span><span class="nc">Graph</span><span class="o">,</span> <span class="nc">VertexRDD</span><span class="o">}</span>
+<div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.graphx.</span><span class="o">{</span><span class="nc">Graph</span><span class="o">,</span> <span class="nc">VertexRDD</span><span class="o">}</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.graphx.util.GraphGenerators</span>
 
 <span class="c1">// Create a graph with &quot;age&quot; as the vertex property.</span>
@@ -708,12 +708,12 @@ messages) are constant sized (e.g., floats and addition instead of lists and con
 <p>In earlier versions of GraphX neighborhood aggregation was accomplished using the
 <code>mapReduceTriplets</code> operator:</p>
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">class</span> <span class="nc">Graph</span><span class="o">[</span><span class="kt">VD</span>, <span class="kt">ED</span><span class="o">]</span> <span class="o">{</span>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">class</span> <span class="nc">Graph</span><span class="o">[</span><span class="kt">VD</span>, <span class="kt">ED</span><span class="o">]</span> <span class="o">{</span>
   <span class="k">def</span> <span class="n">mapReduceTriplets</span><span class="o">[</span><span class="kt">Msg</span><span class="o">](</span>
       <span class="n">map</span><span class="k">:</span> <span class="kt">EdgeTriplet</span><span class="o">[</span><span class="kt">VD</span>, <span class="kt">ED</span><span class="o">]</span> <span class="k">=&gt;</span> <span class="nc">Iterator</span><span class="o">[(</span><span class="kt">VertexId</span>, <span class="kt">Msg</span><span class="o">)],</span>
       <span class="n">reduce</span><span class="k">:</span> <span class="o">(</span><span class="kt">Msg</span><span class="o">,</span> <span class="kt">Msg</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="nc">Msg</span><span class="o">)</span>
     <span class="k">:</span> <span class="kt">VertexRDD</span><span class="o">[</span><span class="kt">Msg</span><span class="o">]</span>
-<span class="o">}</span></code></pre></div>
+<span class="o">}</span></code></pre></figure>
 
 <p>The <code>mapReduceTriplets</code> operator takes a user defined map function which
 is applied to each triplet and can yield <em>messages</em> which are aggregated using the user defined
@@ -727,21 +727,21 @@ in the triplet are actually required.</p>
 
 <p>The following code block using <code>mapReduceTriplets</code>:</p>
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">graph</span><span class="k">:</span> <span class="kt">Graph</span><span class="o">[</span><span class="kt">Int</span>, <span class="kt">Float</span><span class="o">]</span> <span class="k">=</span> <span class="o">...</span>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">val</span> <span class="n">graph</span><span class="k">:</span> <span class="kt">Graph</span><span class="o">[</span><span class="kt">Int</span>, <span class="kt">Float</span><span class="o">]</span> <span class="k">=</span> <span class="o">...</span>
 <span class="k">def</span> <span class="n">msgFun</span><span class="o">(</span><span class="n">triplet</span><span class="k">:</span> <span class="kt">Triplet</span><span class="o">[</span><span class="kt">Int</span>, <span class="kt">Float</span><span class="o">])</span><span class="k">:</span> <span class="kt">Iterator</span><span class="o">[(</span><span class="kt">Int</span>, <span class="kt">String</span><span class="o">)]</span> <span class="k">=</span> <span class="o">{</span>
   <span class="nc">Iterator</span><span class="o">((</span><span class="n">triplet</span><span class="o">.</span><span class="n">dstId</span><span class="o">,</span> <span class="s">&quot;Hi&quot;</span><span class="o">))</span>
 <span class="o">}</span>
 <span class="k">def</span> <span class="n">reduceFun</span><span class="o">(</span><span class="n">a</span><span class="k">:</span> <span class="kt">String</span><span class="o">,</span> <span class="n">b</span><span class="k">:</span> <span class="kt">String</span><span class="o">)</span><span class="k">:</span> <span class="kt">String</span> <span class="o">=</span> <span class="n">a</span> <span class="o">+</span> <span class="s">&quot; &quot;</span> <span class="o">+</span> <span class="n">b</span>
-<span class="k">val</span> <span class="n">result</span> <span class="k">=</span> <span class="n">graph</span><span class="o">.</span><span class="n">mapReduceTriplets</span><span class="o">[</span><span class="kt">String</span><span class="o">](</span><span class="n">msgFun</span><span class="o">,</span> <span class="n">reduceFun</span><span class="o">)</span></code></pre></div>
+<span class="k">val</span> <span class="n">result</span> <span class="k">=</span> <span class="n">graph</span><span class="o">.</span><span class="n">mapReduceTriplets</span><span class="o">[</span><span class="kt">String</span><span class="o">](</span><span class="n">msgFun</span><span class="o">,</span> <span class="n">reduceFun</span><span class="o">)</span></code></pre></figure>
 
 <p>can be rewritten using <code>aggregateMessages</code> as:</p>
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">graph</span><span class="k">:</span> <span class="kt">Graph</span><span class="o">[</span><span class="kt">Int</span>, <span class="kt">Float</span><span class="o">]</span> <span class="k">=</span> <span class="o">...</span>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">val</span> <span class="n">graph</span><span class="k">:</span> <span class="kt">Graph</span><span class="o">[</span><span class="kt">Int</span>, <span class="kt">Float</span><span class="o">]</span> <span class="k">=</span> <span class="o">...</span>
 <span class="k">def</span> <span class="n">msgFun</span><span class="o">(</span><span class="n">triplet</span><span class="k">:</span> <span class="kt">EdgeContext</span><span class="o">[</span><span class="kt">Int</span>, <span class="kt">Float</span>, <span class="kt">String</span><span class="o">])</span> <span class="o">{</span>
   <span class="n">triplet</span><span class="o">.</span><span class="n">sendToDst</span><span class="o">(</span><span class="s">&quot;Hi&quot;</span><span class="o">)</span>
 <span class="o">}</span>
 <span class="k">def</span> <span class="n">reduceFun</span><span class="o">(</span><span class="n">a</span><span class="k">:</span> <span class="kt">String</span><span class="o">,</span> <span class="n">b</span><span class="k">:</span> <span class="kt">String</span><span class="o">)</span><span class="k">:</span> <span class="kt">String</span> <span class="o">=</span> <span class="n">a</span> <span class="o">+</span> <span class="s">&quot; &quot;</span> <span class="o">+</span> <span class="n">b</span>
-<span class="k">val</span> <span class="n">result</span> <span class="k">=</span> <span class="n">graph</span><span class="o">.</span><span class="n">aggregateMessages</span><span class="o">[</span><span class="kt">String</span><span class="o">](</span><span class="n">msgFun</span><span class="o">,</span> <span class="n">reduceFun</span><span class="o">)</span></code></pre></div>
+<span class="k">val</span> <span class="n">result</span> <span class="k">=</span> <span class="n">graph</span><span class="o">.</span><span class="n">aggregateMessages</span><span class="o">[</span><span class="kt">String</span><span class="o">](</span><span class="n">msgFun</span><span class="o">,</span> <span class="n">reduceFun</span><span class="o">)</span></code></pre></figure>
 
 <h3 id="computing-degree-information">Computing Degree Information</h3>
 
@@ -751,14 +751,14 @@ out-degree, and the total degree of each vertex.  The  <a href="api/scala/index.
 collection of operators to compute the degrees of each vertex.  For example, in the following we
 compute the max in, out, and total degrees:</p>
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="c1">// Define a reduce operation to compute the highest degree vertex</span>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="c1">// Define a reduce operation to compute the highest degree vertex</span>
 <span class="k">def</span> <span class="n">max</span><span class="o">(</span><span class="n">a</span><span class="k">:</span> <span class="o">(</span><span class="kt">VertexId</span><span class="o">,</span> <span class="kt">Int</span><span class="o">),</span> <span class="n">b</span><span class="k">:</span> <span class="o">(</span><span class="kt">VertexId</span><span class="o">,</span> <span class="kt">Int</span><span class="o">))</span><span class="k">:</span> <span class="o">(</span><span class="kt">VertexId</span><span class="o">,</span> <span class="kt">Int</span><span class="o">)</span> <span class="k">=</span> <span class="o">{</span>
   <span class="k">if</span> <span class="o">(</span><span class="n">a</span><span class="o">.</span><span class="n">_2</span> <span class="o">&gt;</span> <span class="n">b</span><span class="o">.</span><span class="n">_2</span><span class="o">)</span> <span class="n">a</span> <span class="k">else</span> <span class="n">b</span>
 <span class="o">}</span>
 <span class="c1">// Compute the max degrees</span>
 <span class="k">val</span> <span class="n">maxInDegree</span><span class="k">:</span> <span class="o">(</span><span class="kt">VertexId</span><span class="o">,</span> <span class="kt">Int</span><span class="o">)</span>  <span class="k">=</span> <span class="n">graph</span><span class="o">.</span><span class="n">inDegrees</span><span class="o">.</span><span class="n">reduce</span><span class="o">(</span><span class="n">max</span><span class="o">)</span>
 <span class="k">val</span> <span class="n">maxOutDegree</span><span class="k">:</span> <span class="o">(</span><span class="kt">VertexId</span><span class="o">,</span> <span class="kt">Int</span><span class="o">)</span> <span class="k">=</span> <span class="n">graph</span><span class="o">.</span><span class="n">outDegrees</span><span class="o">.</span><span class="n">reduce</span><span class="o">(</span><span class="n">max</span><span class="o">)</span>
-<span class="k">val</span> <span class="n">maxDegrees</span><span class="k">:</span> <span class="o">(</span><span class="kt">VertexId</span><span class="o">,</span> <span class="kt">Int</span><span class="o">)</span>   <span class="k">=</span> <span class="n">graph</span><span class="o">.</span><span class="n">degrees</span><span class="o">.</span><span class="n">reduce</span><span class="o">(</span><span class="n">max</span><span class="o">)</span></code></pre></div>
+<span class="k">val</span> <span class="n">maxDegrees</span><span class="k">:</span> <span class="o">(</span><span class="kt">VertexId</span><span class="o">,</span> <span class="kt">Int</span><span class="o">)</span>   <span class="k">=</span> <span class="n">graph</span><span class="o">.</span><span class="n">degrees</span><span class="o">.</span><span class="n">reduce</span><span class="o">(</span><span class="n">max</span><span class="o">)</span></code></pre></figure>
 
 <h3 id="collecting-neighbors">Collecting Neighbors</h3>
 
@@ -767,10 +767,10 @@ attributes at each vertex. This can be easily accomplished using the
 <a href="api/scala/index.html#org.apache.spark.graphx.GraphOps@collectNeighborIds(EdgeDirection):VertexRDD[Array[VertexId]]"><code>collectNeighborIds</code></a> and the
 <a href="api/scala/index.html#org.apache.spark.graphx.GraphOps@collectNeighbors(EdgeDirection):VertexRDD[Array[(VertexId,VD)]]"><code>collectNeighbors</code></a> operators.</p>
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">class</span> <span class="nc">GraphOps</span><span class="o">[</span><span class="kt">VD</span>, <span class="kt">ED</span><span class="o">]</span> <span class="o">{</span>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">class</span> <span class="nc">GraphOps</span><span class="o">[</span><span class="kt">VD</span>, <span class="kt">ED</span><span class="o">]</span> <span class="o">{</span>
   <span class="k">def</span> <span class="n">collectNeighborIds</span><span class="o">(</span><span class="n">edgeDirection</span><span class="k">:</span> <span class="kt">EdgeDirection</span><span class="o">)</span><span class="k">:</span> <span class="kt">VertexRDD</span><span class="o">[</span><span class="kt">Array</span><span class="o">[</span><span class="kt">VertexId</span><span class="o">]]</span>
   <span class="k">def</span> <span class="n">collectNeighbors</span><span class="o">(</span><span class="n">edgeDirection</span><span class="k">:</span> <span class="kt">EdgeDirection</span><span class="o">)</span><span class="k">:</span> <span class="kt">VertexRDD</span><span class="o">[</span> <span class="kt">Array</span><span class="o">[(</span><span class="kt">VertexId</span>, <span class="kt">VD</span><span class="o">)]</span> <span class="o">]</span>
-<span class="o">}</span></code></pre></div>
+<span class="o">}</span></code></pre></figure>
 
 <blockquote>
   <p>These operators can be quite costly as they duplicate information and require
@@ -813,7 +813,7 @@ messaging function.  These constraints allow additional optimization within Grap
 <p>The following is the type signature of the <a href="api/scala/index.html#org.apache.spark.graphx.GraphOps@pregel[A](A,Int,EdgeDirection)((VertexId,VD,A)\u21d2VD,(EdgeTriplet[VD,ED])\u21d2Iterator[(VertexId,A)],(A,A)\u21d2A)(ClassTag[A]):Graph[VD,ED]">Pregel operator</a> as well as a <em>sketch</em>
 of its implementation (note calls to graph.cache have been removed):</p>
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">class</span> <span class="nc">GraphOps</span><span class="o">[</span><span class="kt">VD</span>, <span class="kt">ED</span><span class="o">]</span> <span class="o">{</span>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">class</span> <span class="nc">GraphOps</span><span class="o">[</span><span class="kt">VD</span>, <span class="kt">ED</span><span class="o">]</span> <span class="o">{</span>
   <span class="k">def</span> <span class="n">pregel</span><span class="o">[</span><span class="kt">A</span><span class="o">]</span>
       <span class="o">(</span><span class="n">initialMsg</span><span class="k">:</span> <span class="kt">A</span><span class="o">,</span>
        <span class="n">maxIter</span><span class="k">:</span> <span class="kt">Int</span> <span class="o">=</span> <span class="nc">Int</span><span class="o">.</span><span class="nc">MaxValue</span><span class="o">,</span>
@@ -843,7 +843,7 @@ of its implementation (note calls to graph.cache have been removed):</p>
     <span class="o">}</span>
     <span class="n">g</span>
   <span class="o">}</span>
-<span class="o">}</span></code></pre></div>
+<span class="o">}</span></code></pre></figure>
 
 <p>Notice that Pregel takes two argument lists (i.e., <code>graph.pregel(list1)(list2)</code>).  The first
 argument list contains configuration parameters including the initial message, the maximum number of
@@ -854,7 +854,7 @@ second argument list contains the user defined functions for receiving messages
 <p>We can use the Pregel operator to express computation such as single source
 shortest path in the following example.</p>
 
-<div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.graphx.</span><span class="o">{</span><span class="nc">Graph</span><span class="o">,</span> <span class="nc">VertexId</span><span class="o">}</span>
+<div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.graphx.</span><span class="o">{</span><span class="nc">Graph</span><span class="o">,</span> <span class="nc">VertexId</span><span class="o">}</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.graphx.util.GraphGenerators</span>
 
 <span class="c1">// A graph with edge attributes containing distances</span>
@@ -885,14 +885,14 @@ shortest path in the following example.</p>
 
 <p>GraphX provides several ways of building a graph from a collection of vertices and edges in an RDD or on disk. None of the graph builders repartitions the graph&#8217;s edges by default; instead, edges are left in their default partitions (such as their original blocks in HDFS). <a href="api/scala/index.html#org.apache.spark.graphx.Graph@groupEdges((ED,ED)\u21d2ED):Graph[VD,ED]"><code>Graph.groupEdges</code></a> requires the graph to be repartitioned because it assumes identical edges will be colocated on the same partition, so you must call <a href="api/scala/index.html#org.apache.spark.graphx.Graph@partitionBy(PartitionStrategy):Graph[VD,ED]"><code>Graph.partitionBy</code></a> before calling <code>groupEdges</code>.</p>
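<p>A hedged sketch of that requirement (the edge list path is assumed): repartition first so identical edges are colocated, then merge them:</p>

<figure class="highlight"><pre><code class="language-scala">import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}

val graph = GraphLoader.edgeListFile(sc, &quot;data/graphx/followers.txt&quot;)
val merged = graph
  .partitionBy(PartitionStrategy.RandomVertexCut) // colocate identical edges
  .groupEdges((a, b) =&gt; a + b)                    // now merging parallel edges is safe</code></pre></figure>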
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">object</span> <span class="nc">GraphLoader</span> <span class="o">{</span>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">object</span> <span class="nc">GraphLoader</span> <span class="o">{</span>
   <span class="k">def</span> <span class="n">edgeListFile</span><span class="o">(</span>
       <span class="n">sc</span><span class="k">:</span> <span class="kt">SparkContext</span><span class="o">,</span>
       <span class="n">path</span><span class="k">:</span> <span class="kt">String</span><span class="o">,</span>
       <span class="n">canonicalOrientation</span><span class="k">:</span> <span class="kt">Boolean</span> <span class="o">=</span> <span class="kc">false</span><span class="o">,</span>
       <span class="n">minEdgePartitions</span><span class="k">:</span> <span class="kt">Int</span> <span class="o">=</span> <span class="mi">1</span><span class="o">)</span>
     <span class="k">:</span> <span class="kt">Graph</span><span class="o">[</span><span class="kt">Int</span>, <span class="kt">Int</span><span class="o">]</span>
-<span class="o">}</span></code></pre></div>
+<span class="o">}</span></code></pre></figure>
 
 <p><a href="api/scala/index.html#org.apache.spark.graphx.GraphLoader$@edgeListFile(SparkContext,String,Boolean,Int):Graph[Int,Int]"><code>GraphLoader.edgeListFile</code></a> provides a way to load a graph from a list of edges on disk. It parses an adjacency list of (source vertex ID, destination vertex ID) pairs of the following form, skipping comment lines that begin with <code>#</code>:</p>
 
@@ -904,7 +904,7 @@ shortest path in the following example.</p>
 
 <p>It creates a <code>Graph</code> from the specified edges, automatically creating any vertices mentioned by edges. All vertex and edge attributes default to 1. The <code>canonicalOrientation</code> argument allows reorienting edges in the positive direction (<code>srcId &lt; dstId</code>), which is required by the <a href="api/scala/index.html#org.apache.spark.graphx.lib.ConnectedComponents$">connected components</a> algorithm. The <code>minEdgePartitions</code> argument specifies the minimum number of edge partitions to generate; there may be more edge partitions than specified if, for example, the HDFS file has more blocks.</p>
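<p>For example, a sketch (path assumed) that loads an edge list reoriented for connected components, with a floor of four edge partitions:</p>

<figure class="highlight"><pre><code class="language-scala">val graph = GraphLoader.edgeListFile(sc, &quot;data/graphx/followers.txt&quot;,
  canonicalOrientation = true, minEdgePartitions = 4)
val cc = graph.connectedComponents().vertices</code></pre></figure>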
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">object</span> <span class="nc">Graph</span> <span class="o">{</span>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">object</span> <span class="nc">Graph</span> <span class="o">{</span>
   <span class="k">def</span> <span class="n">apply</span><span class="o">[</span><span class="kt">VD</span>, <span class="kt">ED</span><span class="o">](</span>
       <span class="n">vertices</span><span class="k">:</span> <span class="kt">RDD</span><span class="o">[(</span><span class="kt">VertexId</span>, <span class="kt">VD</span><span class="o">)],</span>
       <span class="n">edges</span><span class="k">:</span> <span class="kt">RDD</span><span class="o">[</span><span class="kt">Edge</span><span class="o">[</span><span class="kt">ED</span><span class="o">]],</span>
@@ -920,7 +920,7 @@ shortest path in the following example.</p>
       <span class="n">defaultValue</span><span class="k">:</span> <span class="kt">VD</span><span class="o">,</span>
       <span class="n">uniqueEdges</span><span class="k">:</span> <span class="kt">Option</span><span class="o">[</span><span class="kt">PartitionStrategy</span><span class="o">]</span> <span class="k">=</span> <span class="nc">None</span><span class="o">)</span><span class="k">:</span> <span class="kt">Graph</span><span class="o">[</span><span class="kt">VD</span>, <span class="kt">Int</span><span class="o">]</span>
 
-<span class="o">}</span></code></pre></div>
+<span class="o">}</span></code></pre></figure>
 
 <p><a href="api/scala/index.html#org.apache.spark.graphx.Graph$@apply[VD,ED](RDD[(VertexId,VD)],RDD[Edge[ED]],VD)(ClassTag[VD],ClassTag[ED]):Graph[VD,ED]"><code>Graph.apply</code></a> allows creating a graph from RDDs of vertices and edges. Duplicate vertices are picked arbitrarily and vertices found in the edge RDD but not the vertex RDD are assigned the default attribute.</p>
 
@@ -936,7 +936,7 @@ shortest path in the following example.</p>
 GraphX maintains the vertices and edges in optimized data structures, and these data structures
 provide additional functionality; the vertices and edges are returned as <a href="api/scala/index.html#org.apache.spark.graphx.VertexRDD"><code>VertexRDD</code></a> and <a href="api/scala/index.html#org.apache.spark.graphx.EdgeRDD"><code>EdgeRDD</code></a>
 respectively.  In this section we review some of the additional useful functionality in these types.
-Note that this is just an incomplete list, please refer to the API docs for the official list of operations.</p>
+Note that this is just an incomplete list, please refer to the API docs for the official list of operations. </p>
 
 <h2 id="vertexrdds">VertexRDDs</h2>
 
@@ -948,7 +948,7 @@ hash-map data-structure.  As a consequence if two <code>VertexRDD</code>s are de
 evaluations. To leverage this indexed data structure, the <a href="api/scala/index.html#org.apache.spark.graphx.VertexRDD"><code>VertexRDD</code></a> exposes the following
 additional functionality:</p>
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">class</span> <span class="nc">VertexRDD</span><span class="o">[</span><span class="kt">VD</span><span class="o">]</span> <span class="nc">extends</span> <span class="nc">RDD</span><span class="o">[(</span><span class="kt">VertexId</span>, <span class="kt">VD</span><span class="o">)]</span> <span class="o">{</span>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">class</span> <span class="nc">VertexRDD</span><span class="o">[</span><span class="kt">VD</span><span class="o">]</span> <span class="nc">extends</span> <span class="nc">RDD</span><span class="o">[(</span><span class="kt">VertexId</span>, <span class="kt">VD</span><span class="o">)]</span> <span class="o">{</span>
   <span class="c1">// Filter the vertex set but preserves the internal index</span>
   <span class="k">def</span> <span class="n">filter</span><span class="o">(</span><span class="n">pred</span><span class="k">:</span> <span class="kt">Tuple2</span><span class="o">[</span><span class="kt">VertexId</span>, <span class="kt">VD</span><span class="o">]</span> <span class="k">=&gt;</span> <span class="nc">Boolean</span><span class="o">)</span><span class="k">:</span> <span class="kt">VertexRDD</span><span class="o">[</span><span class="kt">VD</span><span class="o">]</span>
   <span class="c1">// Transform the values without changing the ids (preserves the internal index)</span>
@@ -963,7 +963,7 @@ additional functionality:</p>
   <span class="k">def</span> <span class="n">innerJoin</span><span class="o">[</span><span class="kt">U</span>, <span class="kt">VD2</span><span class="o">](</span><span class="n">other</span><span class="k">:</span> <span class="kt">RDD</span><span class="o">[(</span><span class="kt">VertexId</span>, <span class="kt">U</span><span class="o">)])(</span><span class="n">f</span><span class="k">:</span> <span class="o">(</span><span class="kt">VertexId</span><span class="o">,</span> <span class="kt">VD</span><span class="o">,</span> <span class="n">U</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="nc">VD2</span><span class="o">)</span><span class="k">:</span> <span class="kt">VertexRDD</span><span class="o">[</span><span class="kt">VD2</span><span class="o">]</span>
   <span class="c1">// Use the index on this RDD to accelerate a `reduceByKey` operation on the input RDD.</span>
   <span class="k">def</span> <span class="n">aggregateUsingIndex</span><span class="o">[</span><span class="kt">VD2</span><span class="o">](</span><span class="n">other</span><span class="k">:</span> <span class="kt">RDD</span><span class="o">[(</span><span class="kt">VertexId</span>, <span class="kt">VD2</span><span class="o">)],</span> <span class="n">reduceFunc</span><span class="k">:</span> <span class="o">(</span><span class="kt">VD2</span><span class="o">,</span> <span class="kt">VD2</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="nc">VD2</span><span class="o">)</span><span class="k">:</span> <span class="kt">VertexRDD</span><span class="o">[</span><span class="kt">VD2</span><span class="o">]</span>
-<span class="o">}</span></code></pre></div>
+<span class="o">}</span></code></pre></figure>
 
 <p>Notice, for example, how the <code>filter</code> operator returns a <a href="api/scala/index.html#org.apache.spark.graphx.VertexRDD"><code>VertexRDD</code></a>.  Filter is actually
 implemented using a <code>BitSet</code>, thereby reusing the index and preserving the ability to do fast joins
@@ -977,7 +977,7 @@ change the <code>VertexId</code> thereby enabling the same <code>HashMap</code>
 <em>which is a super-set</em> of the vertices in some <code>RDD[(VertexId, A)]</code> then I can reuse the index to
 both aggregate and then subsequently index the <code>RDD[(VertexId, A)]</code>.  For example:</p>
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">setA</span><span class="k">:</span> <span class="kt">VertexRDD</span><span class="o">[</span><span class="kt">Int</span><span class="o">]</span> <span class="k">=</span> <span class="nc">VertexRDD</span><span class="o">(</span><span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="o">(</span><span class="mi">0L</span> <span class="n">until</span> <span class="mi">100L</span><span class="o">).</span><span class="n">map</span><span class="o">(</span><span class="n">id</span> <span class="k">=&gt;</span> <span class="o">(</span><span class="n">id</span><span class="o">,</span> <span class="mi">1</span><span class="o">)))</span>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">val</span> <span class="n">setA</span><span class="k">:</span> <span class="kt">VertexRDD</span><span class="o">[</span><span class="kt">Int</span><span class="o">]</span> <span class="k">=</span> <span class="nc">VertexRDD</span><span class="o">(</span><span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="o">(</span><span class="mi">0L</span> <span class="n">until</span> <span class="mi">100L</span><span class="o">).</span><span class="n">map</span><span class="o">(</span><span class="n">id</span> <span class="k">=&gt;</span> <span class="o">(</span><span class="n">id</span><span class="o">,</span> <span class="mi">1</span><span class="o">)))</span>
 <span class="k">val</span> <span class="n">rddB</span><span class="k">:</span> <span class="kt">RDD</span><span class="o">[(</span><span class="kt">VertexId</span>, <span class="kt">Double</span><span class="o">)]</span> <span class="k">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="o">(</span><span class="mi">0L</span> <span class="n">until</span> <span class="mi">100L</span><span class="o">).</span><span class="n">flatMap</span><span class="o">(</span><span class="n">id</span> <span class="k">=&gt;</span> <span class="nc">List</span><span class="o">((</span><span class="n">id</span><span class="o">,</span> <span class="mf">1.0</span><span class="o">),</span> <span class="o">(</span><span class="n">id</span><span class="o">,</span> <span class="mf">2.0</span><span class="o">)))</span>
 <span class="c1">// There should be 200 entries in rddB</span>
 <span class="n">rddB</span><span class="o">.</span><span class="n">count</span>
@@ -985

<TRUNCATED>



[22/25] spark-website git commit: Update 2.1.0 docs to include https://github.com/apache/spark/pull/16294

Posted by yh...@apache.org.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/ml-classification-regression.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/ml-classification-regression.html b/site/docs/2.1.0/ml-classification-regression.html
index 1e0665b..0b264bb 100644
--- a/site/docs/2.1.0/ml-classification-regression.html
+++ b/site/docs/2.1.0/ml-classification-regression.html
@@ -329,58 +329,58 @@ discussing specific classes of algorithms, such as linear methods, trees, and en
 <p><strong>Table of Contents</strong></p>
 
 <ul id="markdown-toc">
-  <li><a href="#classification" id="markdown-toc-classification">Classification</a>    <ul>
-      <li><a href="#logistic-regression" id="markdown-toc-logistic-regression">Logistic regression</a>        <ul>
-          <li><a href="#binomial-logistic-regression" id="markdown-toc-binomial-logistic-regression">Binomial logistic regression</a></li>
-          <li><a href="#multinomial-logistic-regression" id="markdown-toc-multinomial-logistic-regression">Multinomial logistic regression</a></li>
+  <li><a href="#classification">Classification</a>    <ul>
+      <li><a href="#logistic-regression">Logistic regression</a>        <ul>
+          <li><a href="#binomial-logistic-regression">Binomial logistic regression</a></li>
+          <li><a href="#multinomial-logistic-regression">Multinomial logistic regression</a></li>
         </ul>
       </li>
-      <li><a href="#decision-tree-classifier" id="markdown-toc-decision-tree-classifier">Decision tree classifier</a></li>
-      <li><a href="#random-forest-classifier" id="markdown-toc-random-forest-classifier">Random forest classifier</a></li>
-      <li><a href="#gradient-boosted-tree-classifier" id="markdown-toc-gradient-boosted-tree-classifier">Gradient-boosted tree classifier</a></li>
-      <li><a href="#multilayer-perceptron-classifier" id="markdown-toc-multilayer-perceptron-classifier">Multilayer perceptron classifier</a></li>
-      <li><a href="#one-vs-rest-classifier-aka-one-vs-all" id="markdown-toc-one-vs-rest-classifier-aka-one-vs-all">One-vs-Rest classifier (a.k.a. One-vs-All)</a></li>
-      <li><a href="#naive-bayes" id="markdown-toc-naive-bayes">Naive Bayes</a></li>
+      <li><a href="#decision-tree-classifier">Decision tree classifier</a></li>
+      <li><a href="#random-forest-classifier">Random forest classifier</a></li>
+      <li><a href="#gradient-boosted-tree-classifier">Gradient-boosted tree classifier</a></li>
+      <li><a href="#multilayer-perceptron-classifier">Multilayer perceptron classifier</a></li>
+      <li><a href="#one-vs-rest-classifier-aka-one-vs-all">One-vs-Rest classifier (a.k.a. One-vs-All)</a></li>
+      <li><a href="#naive-bayes">Naive Bayes</a></li>
     </ul>
   </li>
-  <li><a href="#regression" id="markdown-toc-regression">Regression</a>    <ul>
-      <li><a href="#linear-regression" id="markdown-toc-linear-regression">Linear regression</a></li>
-      <li><a href="#generalized-linear-regression" id="markdown-toc-generalized-linear-regression">Generalized linear regression</a>        <ul>
-          <li><a href="#available-families" id="markdown-toc-available-families">Available families</a></li>
+  <li><a href="#regression">Regression</a>    <ul>
+      <li><a href="#linear-regression">Linear regression</a></li>
+      <li><a href="#generalized-linear-regression">Generalized linear regression</a>        <ul>
+          <li><a href="#available-families">Available families</a></li>
         </ul>
       </li>
-      <li><a href="#decision-tree-regression" id="markdown-toc-decision-tree-regression">Decision tree regression</a></li>
-      <li><a href="#random-forest-regression" id="markdown-toc-random-forest-regression">Random forest regression</a></li>
-      <li><a href="#gradient-boosted-tree-regression" id="markdown-toc-gradient-boosted-tree-regression">Gradient-boosted tree regression</a></li>
-      <li><a href="#survival-regression" id="markdown-toc-survival-regression">Survival regression</a></li>
-      <li><a href="#isotonic-regression" id="markdown-toc-isotonic-regression">Isotonic regression</a>        <ul>
-          <li><a href="#examples" id="markdown-toc-examples">Examples</a></li>
+      <li><a href="#decision-tree-regression">Decision tree regression</a></li>
+      <li><a href="#random-forest-regression">Random forest regression</a></li>
+      <li><a href="#gradient-boosted-tree-regression">Gradient-boosted tree regression</a></li>
+      <li><a href="#survival-regression">Survival regression</a></li>
+      <li><a href="#isotonic-regression">Isotonic regression</a>        <ul>
+          <li><a href="#examples">Examples</a></li>
         </ul>
       </li>
     </ul>
   </li>
-  <li><a href="#linear-methods" id="markdown-toc-linear-methods">Linear methods</a></li>
-  <li><a href="#decision-trees" id="markdown-toc-decision-trees">Decision trees</a>    <ul>
-      <li><a href="#inputs-and-outputs" id="markdown-toc-inputs-and-outputs">Inputs and Outputs</a>        <ul>
-          <li><a href="#input-columns" id="markdown-toc-input-columns">Input Columns</a></li>
-          <li><a href="#output-columns" id="markdown-toc-output-columns">Output Columns</a></li>
+  <li><a href="#linear-methods">Linear methods</a></li>
+  <li><a href="#decision-trees">Decision trees</a>    <ul>
+      <li><a href="#inputs-and-outputs">Inputs and Outputs</a>        <ul>
+          <li><a href="#input-columns">Input Columns</a></li>
+          <li><a href="#output-columns">Output Columns</a></li>
         </ul>
       </li>
     </ul>
   </li>
-  <li><a href="#tree-ensembles" id="markdown-toc-tree-ensembles">Tree Ensembles</a>    <ul>
-      <li><a href="#random-forests" id="markdown-toc-random-forests">Random Forests</a>        <ul>
-          <li><a href="#inputs-and-outputs-1" id="markdown-toc-inputs-and-outputs-1">Inputs and Outputs</a>            <ul>
-              <li><a href="#input-columns-1" id="markdown-toc-input-columns-1">Input Columns</a></li>
-              <li><a href="#output-columns-predictions" id="markdown-toc-output-columns-predictions">Output Columns (Predictions)</a></li>
+  <li><a href="#tree-ensembles">Tree Ensembles</a>    <ul>
+      <li><a href="#random-forests">Random Forests</a>        <ul>
+          <li><a href="#inputs-and-outputs-1">Inputs and Outputs</a>            <ul>
+              <li><a href="#input-columns-1">Input Columns</a></li>
+              <li><a href="#output-columns-predictions">Output Columns (Predictions)</a></li>
             </ul>
           </li>
         </ul>
       </li>
-      <li><a href="#gradient-boosted-trees-gbts" id="markdown-toc-gradient-boosted-trees-gbts">Gradient-Boosted Trees (GBTs)</a>        <ul>
-          <li><a href="#inputs-and-outputs-2" id="markdown-toc-inputs-and-outputs-2">Inputs and Outputs</a>            <ul>
-              <li><a href="#input-columns-2" id="markdown-toc-input-columns-2">Input Columns</a></li>
-              <li><a href="#output-columns-predictions-1" id="markdown-toc-output-columns-predictions-1">Output Columns (Predictions)</a></li>
+      <li><a href="#gradient-boosted-trees-gbts">Gradient-Boosted Trees (GBTs)</a>        <ul>
+          <li><a href="#inputs-and-outputs-2">Inputs and Outputs</a>            <ul>
+              <li><a href="#input-columns-2">Input Columns</a></li>
+              <li><a href="#output-columns-predictions-1">Output Columns (Predictions)</a></li>
             </ul>
           </li>
         </ul>
@@ -407,7 +407,7 @@ parameter to select between these two algorithms, or leave it unset and Spark wi
 
 <h3 id="binomial-logistic-regression">Binomial logistic regression</h3>
 
-<p>For more background and more details about the implementation of binomial logistic regression, refer to the documentation of <a href="mllib-linear-methods.html#logistic-regression">logistic regression in <code>spark.mllib</code></a>.</p>
+<p>For more background and more details about the implementation of binomial logistic regression, refer to the documentation of <a href="mllib-linear-methods.html#logistic-regression">logistic regression in <code>spark.mllib</code></a>. </p>
 
 <p><strong>Example</strong></p>
 
@@ -421,7 +421,7 @@ $\alpha$ and <code>regParam</code> corresponds to $\lambda$.</p>
 
     <p>More details on parameters can be found in the <a href="api/scala/index.html#org.apache.spark.ml.classification.LogisticRegression">Scala API documentation</a>.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.ml.classification.LogisticRegression</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.ml.classification.LogisticRegression</span>
 
 <span class="c1">// Load training data</span>
 <span class="k">val</span> <span class="n">training</span> <span class="k">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">format</span><span class="o">(</span><span class="s">&quot;libsvm&quot;</span><span class="o">).</span><span class="n">load</span><span class="o">(</span><span class="s">&quot;data/mllib/sample_libsvm_data.txt&quot;</span><span class="o">)</span>
@@ -435,7 +435,7 @@ $\alpha$ and <code>regParam</code> corresponds to $\lambda$.</p>
 <span class="k">val</span> <span class="n">lrModel</span> <span class="k">=</span> <span class="n">lr</span><span class="o">.</span><span class="n">fit</span><span class="o">(</span><span class="n">training</span><span class="o">)</span>
 
 <span class="c1">// Print the coefficients and intercept for logistic regression</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;Coefficients: </span><span class="si">${</span><span class="n">lrModel</span><span class="o">.</span><span class="n">coefficients</span><span class="si">}</span><span class="s"> Intercept: </span><span class="si">${</span><span class="n">lrModel</span><span class="o">.</span><span class="n">intercept</span><span class="si">}</span><span class="s">&quot;</span><span class="o">)</span>
 
 <span class="c1">// We can also use the multinomial family for binary classification</span>
 <span class="k">val</span> <span class="n">mlr</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">LogisticRegression</span><span class="o">()</span>
@@ -447,8 +447,8 @@ $\alpha$ and <code>regParam</code> corresponds to $\lambda$.</p>
 <span class="k">val</span> <span class="n">mlrModel</span> <span class="k">=</span> <span class="n">mlr</span><span class="o">.</span><span class="n">fit</span><span class="o">(</span><span class="n">training</span><span class="o">)</span>
 
 <span class="c1">// Print the coefficients and intercepts for logistic regression with multinomial family</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Multinomial coefficients: ${mlrModel.coefficientMatrix}&quot;</span><span class="o">)</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Multinomial intercepts: ${mlrModel.interceptVector}&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;Multinomial coefficients: </span><span class="si">${</span><span class="n">mlrModel</span><span class="o">.</span><span class="n">coefficientMatrix</span><span class="si">}</span><span class="s">&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;Multinomial intercepts: </span><span class="si">${</span><span class="n">mlrModel</span><span class="o">.</span><span class="n">interceptVector</span><span class="si">}</span><span class="s">&quot;</span><span class="o">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/scala/org/apache/spark/examples/ml/LogisticRegressionWithElasticNetExample.scala" in the Spark repo.</small></div>
   </div>
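
For reference, the hunk context above ("$\alpha$ and regParam corresponds to $\lambda$") refers to the elastic-net regularizer used in these examples,

$$R(\mathbf{w}) = \alpha \lambda \|\mathbf{w}\|_1 + (1 - \alpha) \frac{\lambda}{2} \|\mathbf{w}\|_2^2,$$

so setRegParam(0.3) with setElasticNetParam(0.8) corresponds to lambda = 0.3 and alpha = 0.8, i.e. a penalty that is mostly L1 with some L2 smoothing.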
@@ -457,7 +457,7 @@ $\alpha$ and <code>regParam</code> corresponds to $\lambda$.</p>
 
     <p>More details on parameters can be found in the <a href="api/java/org/apache/spark/ml/classification/LogisticRegression.html">Java API documentation</a>.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">org.apache.spark.ml.classification.LogisticRegression</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.ml.classification.LogisticRegression</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.ml.classification.LogisticRegressionModel</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.sql.Dataset</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.sql.Row</span><span class="o">;</span>
@@ -467,7 +467,7 @@ $\alpha$ and <code>regParam</code> corresponds to $\lambda$.</p>
 <span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">training</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="na">read</span><span class="o">().</span><span class="na">format</span><span class="o">(</span><span class="s">&quot;libsvm&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">load</span><span class="o">(</span><span class="s">&quot;data/mllib/sample_libsvm_data.txt&quot;</span><span class="o">);</span>
 
-<span class="n">LogisticRegression</span> <span class="n">lr</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">LogisticRegression</span><span class="o">()</span>
+<span class="n">LogisticRegression</span> <span class="n">lr</span> <span class="o">=</span> <span class="k">new</span> <span class="n">LogisticRegression</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setMaxIter</span><span class="o">(</span><span class="mi">10</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setRegParam</span><span class="o">(</span><span class="mf">0.3</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setElasticNetParam</span><span class="o">(</span><span class="mf">0.8</span><span class="o">);</span>
@@ -480,7 +480,7 @@ $\alpha$ and <code>regParam</code> corresponds to $\lambda$.</p>
   <span class="o">+</span> <span class="n">lrModel</span><span class="o">.</span><span class="na">coefficients</span><span class="o">()</span> <span class="o">+</span> <span class="s">&quot; Intercept: &quot;</span> <span class="o">+</span> <span class="n">lrModel</span><span class="o">.</span><span class="na">intercept</span><span class="o">());</span>
 
 <span class="c1">// We can also use the multinomial family for binary classification</span>
-<span class="n">LogisticRegression</span> <span class="n">mlr</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">LogisticRegression</span><span class="o">()</span>
+<span class="n">LogisticRegression</span> <span class="n">mlr</span> <span class="o">=</span> <span class="k">new</span> <span class="n">LogisticRegression</span><span class="o">()</span>
         <span class="o">.</span><span class="na">setMaxIter</span><span class="o">(</span><span class="mi">10</span><span class="o">)</span>
         <span class="o">.</span><span class="na">setRegParam</span><span class="o">(</span><span class="mf">0.3</span><span class="o">)</span>
         <span class="o">.</span><span class="na">setElasticNetParam</span><span class="o">(</span><span class="mf">0.8</span><span class="o">)</span>
@@ -500,29 +500,29 @@ $\alpha$ and <code>regParam</code> corresponds to $\lambda$.</p>
 
     <p>More details on parameters can be found in the <a href="api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression">Python API documentation</a>.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.ml.classification</span> <span class="kn">import</span> <span class="n">LogisticRegression</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.ml.classification</span> <span class="kn">import</span> <span class="n">LogisticRegression</span>
 
-<span class="c"># Load training data</span>
-<span class="n">training</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s">&quot;libsvm&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s">&quot;data/mllib/sample_libsvm_data.txt&quot;</span><span class="p">)</span>
+<span class="c1"># Load training data</span>
+<span class="n">training</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s2">&quot;libsvm&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">&quot;data/mllib/sample_libsvm_data.txt&quot;</span><span class="p">)</span>
 
 <span class="n">lr</span> <span class="o">=</span> <span class="n">LogisticRegression</span><span class="p">(</span><span class="n">maxIter</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">regParam</span><span class="o">=</span><span class="mf">0.3</span><span class="p">,</span> <span class="n">elasticNetParam</span><span class="o">=</span><span class="mf">0.8</span><span class="p">)</span>
 
-<span class="c"># Fit the model</span>
+<span class="c1"># Fit the model</span>
 <span class="n">lrModel</span> <span class="o">=</span> <span class="n">lr</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">training</span><span class="p">)</span>
 
-<span class="c"># Print the coefficients and intercept for logistic regression</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Coefficients: &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">lrModel</span><span class="o">.</span><span class="n">coefficients</span><span class="p">))</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Intercept: &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">lrModel</span><span class="o">.</span><span class="n">intercept</span><span class="p">))</span>
+<span class="c1"># Print the coefficients and intercept for logistic regression</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Coefficients: &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">lrModel</span><span class="o">.</span><span class="n">coefficients</span><span class="p">))</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Intercept: &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">lrModel</span><span class="o">.</span><span class="n">intercept</span><span class="p">))</span>
 
-<span class="c"># We can also use the multinomial family for binary classification</span>
-<span class="n">mlr</span> <span class="o">=</span> <span class="n">LogisticRegression</span><span class="p">(</span><span class="n">maxIter</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">regParam</span><span class="o">=</span><span class="mf">0.3</span><span class="p">,</span> <span class="n">elasticNetParam</span><span class="o">=</span><span class="mf">0.8</span><span class="p">,</span> <span class="n">family</span><span class="o">=</span><span class="s">&quot;multinomial&quot;</span><span class="p">)</span>
+<span class="c1"># We can also use the multinomial family for binary classification</span>
+<span class="n">mlr</span> <span class="o">=</span> <span class="n">LogisticRegression</span><span class="p">(</span><span class="n">maxIter</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">regParam</span><span class="o">=</span><span class="mf">0.3</span><span class="p">,</span> <span class="n">elasticNetParam</span><span class="o">=</span><span class="mf">0.8</span><span class="p">,</span> <span class="n">family</span><span class="o">=</span><span class="s2">&quot;multinomial&quot;</span><span class="p">)</span>
 
-<span class="c"># Fit the model</span>
+<span class="c1"># Fit the model</span>
 <span class="n">mlrModel</span> <span class="o">=</span> <span class="n">mlr</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">training</span><span class="p">)</span>
 
-<span class="c"># Print the coefficients and intercepts for logistic regression with multinomial family</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Multinomial coefficients: &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">mlrModel</span><span class="o">.</span><span class="n">coefficientMatrix</span><span class="p">))</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Multinomial intercepts: &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">mlrModel</span><span class="o">.</span><span class="n">interceptVector</span><span class="p">))</span>
+<span class="c1"># Print the coefficients and intercepts for logistic regression with multinomial family</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Multinomial coefficients: &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">mlrModel</span><span class="o">.</span><span class="n">coefficientMatrix</span><span class="p">))</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Multinomial intercepts: &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">mlrModel</span><span class="o">.</span><span class="n">interceptVector</span><span class="p">))</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/ml/logistic_regression_with_elastic_net.py" in the Spark repo.</small></div>
   </div>
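
As a minimal usage sketch (not part of the diff above), the fitted binomial model can be applied back to a DataFrame with transform; the column names below are the pyspark defaults:

    # Assumes the lrModel and training objects from the Python example above.
    predictions = lrModel.transform(training)
    predictions.select("label", "probability", "prediction").show(5)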
@@ -531,7 +531,7 @@ $\alpha$ and <code>regParam</code> corresponds to $\lambda$.</p>
 
     <p>More details on parameters can be found in the <a href="api/R/spark.logit.html">R API documentation</a>.</p>
 
-    <div class="highlight"><pre><span class="c1"># Load training data</span>
+    <div class="highlight"><pre><span></span><span class="c1"># Load training data</span>
 df <span class="o">&lt;-</span> read.df<span class="p">(</span><span class="s">&quot;data/mllib/sample_libsvm_data.txt&quot;</span><span class="p">,</span> <span class="kn">source</span> <span class="o">=</span> <span class="s">&quot;libsvm&quot;</span><span class="p">)</span>
 training <span class="o">&lt;-</span> df
 test <span class="o">&lt;-</span> df
@@ -571,7 +571,7 @@ This will likely change when multiclass classification is supported.</p>
 
     <p>Continuing the earlier example:</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.ml.classification.</span><span class="o">{</span><span class="nc">BinaryLogisticRegressionSummary</span><span class="o">,</span> <span class="nc">LogisticRegression</span><span class="o">}</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.ml.classification.</span><span class="o">{</span><span class="nc">BinaryLogisticRegressionSummary</span><span class="o">,</span> <span class="nc">LogisticRegression</span><span class="o">}</span>
 
 <span class="c1">// Extract the summary from the returned LogisticRegressionModel instance trained in the earlier</span>
 <span class="c1">// example</span>
@@ -590,7 +590,7 @@ This will likely change when multiclass classification is supported.</p>
 <span class="c1">// Obtain the receiver-operating characteristic as a dataframe and areaUnderROC.</span>
 <span class="k">val</span> <span class="n">roc</span> <span class="k">=</span> <span class="n">binarySummary</span><span class="o">.</span><span class="n">roc</span>
 <span class="n">roc</span><span class="o">.</span><span class="n">show</span><span class="o">()</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;areaUnderROC: ${binarySummary.areaUnderROC}&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;areaUnderROC: </span><span class="si">${</span><span class="n">binarySummary</span><span class="o">.</span><span class="n">areaUnderROC</span><span class="si">}</span><span class="s">&quot;</span><span class="o">)</span>
 
 <span class="c1">// Set the model threshold to maximize F-Measure</span>
 <span class="k">val</span> <span class="n">fMeasure</span> <span class="k">=</span> <span class="n">binarySummary</span><span class="o">.</span><span class="n">fMeasureByThreshold</span>
@@ -613,7 +613,7 @@ Support for multiclass model summaries will be added in the future.</p>
 
     <p>Continuing the earlier example:</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">org.apache.spark.ml.classification.BinaryLogisticRegressionSummary</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.ml.classification.BinaryLogisticRegressionSummary</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.ml.classification.LogisticRegression</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.ml.classification.LogisticRegressionModel</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.ml.classification.LogisticRegressionTrainingSummary</span><span class="o">;</span>
@@ -663,27 +663,27 @@ Currently, only binary classification is supported. Support for multiclass model
 
     <p>Continuing the earlier example:</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.ml.classification</span> <span class="kn">import</span> <span class="n">LogisticRegression</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.ml.classification</span> <span class="kn">import</span> <span class="n">LogisticRegression</span>
 
-<span class="c"># Extract the summary from the returned LogisticRegressionModel instance trained</span>
-<span class="c"># in the earlier example</span>
+<span class="c1"># Extract the summary from the returned LogisticRegressionModel instance trained</span>
+<span class="c1"># in the earlier example</span>
 <span class="n">trainingSummary</span> <span class="o">=</span> <span class="n">lrModel</span><span class="o">.</span><span class="n">summary</span>
 
-<span class="c"># Obtain the objective per iteration</span>
+<span class="c1"># Obtain the objective per iteration</span>
 <span class="n">objectiveHistory</span> <span class="o">=</span> <span class="n">trainingSummary</span><span class="o">.</span><span class="n">objectiveHistory</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;objectiveHistory:&quot;</span><span class="p">)</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;objectiveHistory:&quot;</span><span class="p">)</span>
 <span class="k">for</span> <span class="n">objective</span> <span class="ow">in</span> <span class="n">objectiveHistory</span><span class="p">:</span>
     <span class="k">print</span><span class="p">(</span><span class="n">objective</span><span class="p">)</span>
 
-<span class="c"># Obtain the receiver-operating characteristic as a dataframe and areaUnderROC.</span>
+<span class="c1"># Obtain the receiver-operating characteristic as a dataframe and areaUnderROC.</span>
 <span class="n">trainingSummary</span><span class="o">.</span><span class="n">roc</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;areaUnderROC: &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">trainingSummary</span><span class="o">.</span><span class="n">areaUnderROC</span><span class="p">))</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;areaUnderROC: &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">trainingSummary</span><span class="o">.</span><span class="n">areaUnderROC</span><span class="p">))</span>
 
-<span class="c"># Set the model threshold to maximize F-Measure</span>
+<span class="c1"># Set the model threshold to maximize F-Measure</span>
 <span class="n">fMeasure</span> <span class="o">=</span> <span class="n">trainingSummary</span><span class="o">.</span><span class="n">fMeasureByThreshold</span>
-<span class="n">maxFMeasure</span> <span class="o">=</span> <span class="n">fMeasure</span><span class="o">.</span><span class="n">groupBy</span><span class="p">()</span><span class="o">.</span><span class="n">max</span><span class="p">(</span><span class="s">&#39;F-Measure&#39;</span><span class="p">)</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s">&#39;max(F-Measure)&#39;</span><span class="p">)</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
-<span class="n">bestThreshold</span> <span class="o">=</span> <span class="n">fMeasure</span><span class="o">.</span><span class="n">where</span><span class="p">(</span><span class="n">fMeasure</span><span class="p">[</span><span class="s">&#39;F-Measure&#39;</span><span class="p">]</span> <span class="o">==</span> <span class="n">maxFMeasure</span><span class="p">[</span><span class="s">&#39;max(F-Measure)&#39;</span><span class="p">])</span> \
-    <span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s">&#39;threshold&#39;</span><span class="p">)</span><span class="o">.</span><span class="n">head</span><span class="p">()[</span><span class="s">&#39;threshold&#39;</span><span class="p">]</span>
+<span class="n">maxFMeasure</span> <span class="o">=</span> <span class="n">fMeasure</span><span class="o">.</span><span class="n">groupBy</span><span class="p">()</span><span class="o">.</span><span class="n">max</span><span class="p">(</span><span class="s1">&#39;F-Measure&#39;</span><span class="p">)</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s1">&#39;max(F-Measure)&#39;</span><span class="p">)</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
+<span class="n">bestThreshold</span> <span class="o">=</span> <span class="n">fMeasure</span><span class="o">.</span><span class="n">where</span><span class="p">(</span><span class="n">fMeasure</span><span class="p">[</span><span class="s1">&#39;F-Measure&#39;</span><span class="p">]</span> <span class="o">==</span> <span class="n">maxFMeasure</span><span class="p">[</span><span class="s1">&#39;max(F-Measure)&#39;</span><span class="p">])</span> \
+    <span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s1">&#39;threshold&#39;</span><span class="p">)</span><span class="o">.</span><span class="n">head</span><span class="p">()[</span><span class="s1">&#39;threshold&#39;</span><span class="p">]</span>
 <span class="n">lr</span><span class="o">.</span><span class="n">setThreshold</span><span class="p">(</span><span class="n">bestThreshold</span><span class="p">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/ml/logistic_regression_summary_example.py" in the Spark repo.</small></div>
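
The summary's areaUnderROC can also be cross-checked with an evaluator; a minimal sketch, assuming a predictions DataFrame produced by lrModel.transform(...) as above:

    from pyspark.ml.evaluation import BinaryClassificationEvaluator

    # Uses the default rawPrediction column produced by LogisticRegressionModel.
    evaluator = BinaryClassificationEvaluator(metricName="areaUnderROC")
    print("areaUnderROC: " + str(evaluator.evaluate(predictions)))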
@@ -728,7 +728,7 @@ model with elastic net regularization.</p>
 <div class="codetabs">
 
 <div data-lang="scala">
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.ml.classification.LogisticRegression</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.ml.classification.LogisticRegression</span>
 
 <span class="c1">// Load training data</span>
 <span class="k">val</span> <span class="n">training</span> <span class="k">=</span> <span class="n">spark</span>
@@ -745,14 +745,14 @@ model with elastic net regularization.</p>
 <span class="k">val</span> <span class="n">lrModel</span> <span class="k">=</span> <span class="n">lr</span><span class="o">.</span><span class="n">fit</span><span class="o">(</span><span class="n">training</span><span class="o">)</span>
 
 <span class="c1">// Print the coefficients and intercept for multinomial logistic regression</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Coefficients: \n${lrModel.coefficientMatrix}&quot;</span><span class="o">)</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Intercepts: ${lrModel.interceptVector}&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;Coefficients: \n</span><span class="si">${</span><span class="n">lrModel</span><span class="o">.</span><span class="n">coefficientMatrix</span><span class="si">}</span><span class="s">&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;Intercepts: </span><span class="si">${</span><span class="n">lrModel</span><span class="o">.</span><span class="n">interceptVector</span><span class="si">}</span><span class="s">&quot;</span><span class="o">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/scala/org/apache/spark/examples/ml/MulticlassLogisticRegressionWithElasticNetExample.scala" in the Spark repo.</small></div>
   </div>
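
For context, multinomial logistic regression models the class-conditional probabilities with a softmax over the $K$ classes:

$$P(Y = k \mid \mathbf{x}) = \frac{e^{\boldsymbol\beta_k \cdot \mathbf{x} + \beta_{0k}}}{\sum_{k'=0}^{K-1} e^{\boldsymbol\beta_{k'} \cdot \mathbf{x} + \beta_{0k'}}},$$

which is why the fitted model exposes a $K \times \text{numFeatures}$ coefficientMatrix and a length-$K$ interceptVector, as printed in the examples.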
 
 <div data-lang="java">
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">org.apache.spark.ml.classification.LogisticRegression</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.ml.classification.LogisticRegression</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.ml.classification.LogisticRegressionModel</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.sql.Dataset</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.sql.Row</span><span class="o">;</span>
@@ -762,7 +762,7 @@ model with elastic net regularization.</p>
 <span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">training</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="na">read</span><span class="o">().</span><span class="na">format</span><span class="o">(</span><span class="s">&quot;libsvm&quot;</span><span class="o">)</span>
         <span class="o">.</span><span class="na">load</span><span class="o">(</span><span class="s">&quot;data/mllib/sample_multiclass_classification_data.txt&quot;</span><span class="o">);</span>
 
-<span class="n">LogisticRegression</span> <span class="n">lr</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">LogisticRegression</span><span class="o">()</span>
+<span class="n">LogisticRegression</span> <span class="n">lr</span> <span class="o">=</span> <span class="k">new</span> <span class="n">LogisticRegression</span><span class="o">()</span>
         <span class="o">.</span><span class="na">setMaxIter</span><span class="o">(</span><span class="mi">10</span><span class="o">)</span>
         <span class="o">.</span><span class="na">setRegParam</span><span class="o">(</span><span class="mf">0.3</span><span class="o">)</span>
         <span class="o">.</span><span class="na">setElasticNetParam</span><span class="o">(</span><span class="mf">0.8</span><span class="o">);</span>
@@ -778,22 +778,22 @@ model with elastic net regularization.</p>
   </div>
 
 <div data-lang="python">
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.ml.classification</span> <span class="kn">import</span> <span class="n">LogisticRegression</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.ml.classification</span> <span class="kn">import</span> <span class="n">LogisticRegression</span>
 
-<span class="c"># Load training data</span>
+<span class="c1"># Load training data</span>
 <span class="n">training</span> <span class="o">=</span> <span class="n">spark</span> \
     <span class="o">.</span><span class="n">read</span> \
-    <span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s">&quot;libsvm&quot;</span><span class="p">)</span> \
-    <span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s">&quot;data/mllib/sample_multiclass_classification_data.txt&quot;</span><span class="p">)</span>
+    <span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s2">&quot;libsvm&quot;</span><span class="p">)</span> \
+    <span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">&quot;data/mllib/sample_multiclass_classification_data.txt&quot;</span><span class="p">)</span>
 
 <span class="n">lr</span> <span class="o">=</span> <span class="n">LogisticRegression</span><span class="p">(</span><span class="n">maxIter</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">regParam</span><span class="o">=</span><span class="mf">0.3</span><span class="p">,</span> <span class="n">elasticNetParam</span><span class="o">=</span><span class="mf">0.8</span><span class="p">)</span>
 
-<span class="c"># Fit the model</span>
+<span class="c1"># Fit the model</span>
 <span class="n">lrModel</span> <span class="o">=</span> <span class="n">lr</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">training</span><span class="p">)</span>
 
-<span class="c"># Print the coefficients and intercept for multinomial logistic regression</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Coefficients: </span><span class="se">\n</span><span class="s">&quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">lrModel</span><span class="o">.</span><span class="n">coefficientMatrix</span><span class="p">))</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Intercept: &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">lrModel</span><span class="o">.</span><span class="n">interceptVector</span><span class="p">))</span>
+<span class="c1"># Print the coefficients and intercept for multinomial logistic regression</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Coefficients: </span><span class="se">\n</span><span class="s2">&quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">lrModel</span><span class="o">.</span><span class="n">coefficientMatrix</span><span class="p">))</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Intercept: &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">lrModel</span><span class="o">.</span><span class="n">interceptVector</span><span class="p">))</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/ml/multiclass_logistic_regression_with_elastic_net.py" in the Spark repo.</small></div>
   </div>
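
A minimal sketch of inspecting those shapes in pyspark (assuming the lrModel fitted above; numRows and numCols are attributes of the pyspark.ml.linalg matrix types):

    # For K classes and F features, coefficientMatrix is K x F and
    # interceptVector has length K.
    m = lrModel.coefficientMatrix
    print("classes: %d, features: %d" % (m.numRows, m.numCols))
    print("intercepts: %d" % len(lrModel.interceptVector))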
@@ -802,7 +802,7 @@ model with elastic net regularization.</p>
 
     <p>More details on parameters can be found in the <a href="api/R/spark.logit.html">R API documentation</a>.</p>
 
-    <div class="highlight"><pre><span class="c1"># Load training data</span>
+    <div class="highlight"><pre><span></span><span class="c1"># Load training data</span>
 df <span class="o">&lt;-</span> read.df<span class="p">(</span><span class="s">&quot;data/mllib/sample_multiclass_classification_data.txt&quot;</span><span class="p">,</span> <span class="kn">source</span> <span class="o">=</span> <span class="s">&quot;libsvm&quot;</span><span class="p">)</span>
 training <span class="o">&lt;-</span> df
 test <span class="o">&lt;-</span> df
@@ -837,7 +837,7 @@ We use two feature transformers to prepare the data; these help index categories
 
     <p>More details on parameters can be found in the <a href="api/scala/index.html#org.apache.spark.ml.classification.DecisionTreeClassifier">Scala API documentation</a>.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.ml.Pipeline</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.ml.Pipeline</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.ml.classification.DecisionTreeClassificationModel</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.ml.classification.DecisionTreeClassifier</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator</span>
@@ -905,7 +905,7 @@ We use two feature transformers to prepare the data; these help index categories
 
     <p>More details on parameters can be found in the <a href="api/java/org/apache/spark/ml/classification/DecisionTreeClassifier.html">Java API documentation</a>.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">org.apache.spark.ml.Pipeline</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.ml.Pipeline</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.ml.PipelineModel</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.ml.PipelineStage</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.ml.classification.DecisionTreeClassifier</span><span class="o">;</span>
@@ -924,13 +924,13 @@ We use two feature transformers to prepare the data; these help index categories
 
 <span class="c1">// Index labels, adding metadata to the label column.</span>
 <span class="c1">// Fit on whole dataset to include all labels in index.</span>
-<span class="n">StringIndexerModel</span> <span class="n">labelIndexer</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">StringIndexer</span><span class="o">()</span>
+<span class="n">StringIndexerModel</span> <span class="n">labelIndexer</span> <span class="o">=</span> <span class="k">new</span> <span class="n">StringIndexer</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setInputCol</span><span class="o">(</span><span class="s">&quot;label&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setOutputCol</span><span class="o">(</span><span class="s">&quot;indexedLabel&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">fit</span><span class="o">(</span><span class="n">data</span><span class="o">);</span>
 
 <span class="c1">// Automatically identify categorical features, and index them.</span>
-<span class="n">VectorIndexerModel</span> <span class="n">featureIndexer</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">VectorIndexer</span><span class="o">()</span>
+<span class="n">VectorIndexerModel</span> <span class="n">featureIndexer</span> <span class="o">=</span> <span class="k">new</span> <span class="n">VectorIndexer</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setInputCol</span><span class="o">(</span><span class="s">&quot;features&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setOutputCol</span><span class="o">(</span><span class="s">&quot;indexedFeatures&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setMaxCategories</span><span class="o">(</span><span class="mi">4</span><span class="o">)</span> <span class="c1">// features with &gt; 4 distinct values are treated as continuous.</span>
@@ -942,18 +942,18 @@ We use two feature transformers to prepare the data; these help index categories
 <span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">testData</span> <span class="o">=</span> <span class="n">splits</span><span class="o">[</span><span class="mi">1</span><span class="o">];</span>
 
 <span class="c1">// Train a DecisionTree model.</span>
-<span class="n">DecisionTreeClassifier</span> <span class="n">dt</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">DecisionTreeClassifier</span><span class="o">()</span>
+<span class="n">DecisionTreeClassifier</span> <span class="n">dt</span> <span class="o">=</span> <span class="k">new</span> <span class="n">DecisionTreeClassifier</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setLabelCol</span><span class="o">(</span><span class="s">&quot;indexedLabel&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setFeaturesCol</span><span class="o">(</span><span class="s">&quot;indexedFeatures&quot;</span><span class="o">);</span>
 
 <span class="c1">// Convert indexed labels back to original labels.</span>
-<span class="n">IndexToString</span> <span class="n">labelConverter</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">IndexToString</span><span class="o">()</span>
+<span class="n">IndexToString</span> <span class="n">labelConverter</span> <span class="o">=</span> <span class="k">new</span> <span class="n">IndexToString</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setInputCol</span><span class="o">(</span><span class="s">&quot;prediction&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setOutputCol</span><span class="o">(</span><span class="s">&quot;predictedLabel&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setLabels</span><span class="o">(</span><span class="n">labelIndexer</span><span class="o">.</span><span class="na">labels</span><span class="o">());</span>
 
 <span class="c1">// Chain indexers and tree in a Pipeline.</span>
-<span class="n">Pipeline</span> <span class="n">pipeline</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">Pipeline</span><span class="o">()</span>
+<span class="n">Pipeline</span> <span class="n">pipeline</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Pipeline</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setStages</span><span class="o">(</span><span class="k">new</span> <span class="n">PipelineStage</span><span class="o">[]{</span><span class="n">labelIndexer</span><span class="o">,</span> <span class="n">featureIndexer</span><span class="o">,</span> <span class="n">dt</span><span class="o">,</span> <span class="n">labelConverter</span><span class="o">});</span>
 
 <span class="c1">// Train model. This also runs the indexers.</span>
@@ -966,7 +966,7 @@ We use two feature transformers to prepare the data; these help index categories
 <span class="n">predictions</span><span class="o">.</span><span class="na">select</span><span class="o">(</span><span class="s">&quot;predictedLabel&quot;</span><span class="o">,</span> <span class="s">&quot;label&quot;</span><span class="o">,</span> <span class="s">&quot;features&quot;</span><span class="o">).</span><span class="na">show</span><span class="o">(</span><span class="mi">5</span><span class="o">);</span>
 
 <span class="c1">// Select (prediction, true label) and compute test error.</span>
-<span class="n">MulticlassClassificationEvaluator</span> <span class="n">evaluator</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">MulticlassClassificationEvaluator</span><span class="o">()</span>
+<span class="n">MulticlassClassificationEvaluator</span> <span class="n">evaluator</span> <span class="o">=</span> <span class="k">new</span> <span class="n">MulticlassClassificationEvaluator</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setLabelCol</span><span class="o">(</span><span class="s">&quot;indexedLabel&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setPredictionCol</span><span class="o">(</span><span class="s">&quot;prediction&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setMetricName</span><span class="o">(</span><span class="s">&quot;accuracy&quot;</span><span class="o">);</span>
@@ -985,48 +985,48 @@ We use two feature transformers to prepare the data; these help index categories
 
     <p>More details on parameters can be found in the <a href="api/python/pyspark.ml.html#pyspark.ml.classification.DecisionTreeClassifier">Python API documentation</a>.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.ml</span> <span class="kn">import</span> <span class="n">Pipeline</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.ml</span> <span class="kn">import</span> <span class="n">Pipeline</span>
 <span class="kn">from</span> <span class="nn">pyspark.ml.classification</span> <span class="kn">import</span> <span class="n">DecisionTreeClassifier</span>
 <span class="kn">from</span> <span class="nn">pyspark.ml.feature</span> <span class="kn">import</span> <span class="n">StringIndexer</span><span class="p">,</span> <span class="n">VectorIndexer</span>
 <span class="kn">from</span> <span class="nn">pyspark.ml.evaluation</span> <span class="kn">import</span> <span class="n">MulticlassClassificationEvaluator</span>
 
-<span class="c"># Load the data stored in LIBSVM format as a DataFrame.</span>
-<span class="n">data</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s">&quot;libsvm&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s">&quot;data/mllib/sample_libsvm_data.txt&quot;</span><span class="p">)</span>
+<span class="c1"># Load the data stored in LIBSVM format as a DataFrame.</span>
+<span class="n">data</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s2">&quot;libsvm&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">&quot;data/mllib/sample_libsvm_data.txt&quot;</span><span class="p">)</span>
 
-<span class="c"># Index labels, adding metadata to the label column.</span>
-<span class="c"># Fit on whole dataset to include all labels in index.</span>
-<span class="n">labelIndexer</span> <span class="o">=</span> <span class="n">StringIndexer</span><span class="p">(</span><span class="n">inputCol</span><span class="o">=</span><span class="s">&quot;label&quot;</span><span class="p">,</span> <span class="n">outputCol</span><span class="o">=</span><span class="s">&quot;indexedLabel&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
-<span class="c"># Automatically identify categorical features, and index them.</span>
-<span class="c"># We specify maxCategories so features with &gt; 4 distinct values are treated as continuous.</span>
+<span class="c1"># Index labels, adding metadata to the label column.</span>
+<span class="c1"># Fit on whole dataset to include all labels in index.</span>
+<span class="n">labelIndexer</span> <span class="o">=</span> <span class="n">StringIndexer</span><span class="p">(</span><span class="n">inputCol</span><span class="o">=</span><span class="s2">&quot;label&quot;</span><span class="p">,</span> <span class="n">outputCol</span><span class="o">=</span><span class="s2">&quot;indexedLabel&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
+<span class="c1"># Automatically identify categorical features, and index them.</span>
+<span class="c1"># We specify maxCategories so features with &gt; 4 distinct values are treated as continuous.</span>
 <span class="n">featureIndexer</span> <span class="o">=</span>\
-    <span class="n">VectorIndexer</span><span class="p">(</span><span class="n">inputCol</span><span class="o">=</span><span class="s">&quot;features&quot;</span><span class="p">,</span> <span class="n">outputCol</span><span class="o">=</span><span class="s">&quot;indexedFeatures&quot;</span><span class="p">,</span> <span class="n">maxCategories</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
+    <span class="n">VectorIndexer</span><span class="p">(</span><span class="n">inputCol</span><span class="o">=</span><span class="s2">&quot;features&quot;</span><span class="p">,</span> <span class="n">outputCol</span><span class="o">=</span><span class="s2">&quot;indexedFeatures&quot;</span><span class="p">,</span> <span class="n">maxCategories</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
 
-<span class="c"># Split the data into training and test sets (30% held out for testing)</span>
+<span class="c1"># Split the data into training and test sets (30% held out for testing)</span>
 <span class="p">(</span><span class="n">trainingData</span><span class="p">,</span> <span class="n">testData</span><span class="p">)</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">randomSplit</span><span class="p">([</span><span class="mf">0.7</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">])</span>
 
-<span class="c"># Train a DecisionTree model.</span>
-<span class="n">dt</span> <span class="o">=</span> <span class="n">DecisionTreeClassifier</span><span class="p">(</span><span class="n">labelCol</span><span class="o">=</span><span class="s">&quot;indexedLabel&quot;</span><span class="p">,</span> <span class="n">featuresCol</span><span class="o">=</span><span class="s">&quot;indexedFeatures&quot;</span><span class="p">)</span>
+<span class="c1"># Train a DecisionTree model.</span>
+<span class="n">dt</span> <span class="o">=</span> <span class="n">DecisionTreeClassifier</span><span class="p">(</span><span class="n">labelCol</span><span class="o">=</span><span class="s2">&quot;indexedLabel&quot;</span><span class="p">,</span> <span class="n">featuresCol</span><span class="o">=</span><span class="s2">&quot;indexedFeatures&quot;</span><span class="p">)</span>
 
-<span class="c"># Chain indexers and tree in a Pipeline</span>
+<span class="c1"># Chain indexers and tree in a Pipeline</span>
 <span class="n">pipeline</span> <span class="o">=</span> <span class="n">Pipeline</span><span class="p">(</span><span class="n">stages</span><span class="o">=</span><span class="p">[</span><span class="n">labelIndexer</span><span class="p">,</span> <span class="n">featureIndexer</span><span class="p">,</span> <span class="n">dt</span><span class="p">])</span>
 
-<span class="c"># Train model.  This also runs the indexers.</span>
+<span class="c1"># Train model.  This also runs the indexers.</span>
 <span class="n">model</span> <span class="o">=</span> <span class="n">pipeline</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">trainingData</span><span class="p">)</span>
 
-<span class="c"># Make predictions.</span>
+<span class="c1"># Make predictions.</span>
 <span class="n">predictions</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">testData</span><span class="p">)</span>
 
-<span class="c"># Select example rows to display.</span>
-<span class="n">predictions</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s">&quot;prediction&quot;</span><span class="p">,</span> <span class="s">&quot;indexedLabel&quot;</span><span class="p">,</span> <span class="s">&quot;features&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
+<span class="c1"># Select example rows to display.</span>
+<span class="n">predictions</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s2">&quot;prediction&quot;</span><span class="p">,</span> <span class="s2">&quot;indexedLabel&quot;</span><span class="p">,</span> <span class="s2">&quot;features&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
 
-<span class="c"># Select (prediction, true label) and compute test error</span>
+<span class="c1"># Select (prediction, true label) and compute test error</span>
 <span class="n">evaluator</span> <span class="o">=</span> <span class="n">MulticlassClassificationEvaluator</span><span class="p">(</span>
-    <span class="n">labelCol</span><span class="o">=</span><span class="s">&quot;indexedLabel&quot;</span><span class="p">,</span> <span class="n">predictionCol</span><span class="o">=</span><span class="s">&quot;prediction&quot;</span><span class="p">,</span> <span class="n">metricName</span><span class="o">=</span><span class="s">&quot;accuracy&quot;</span><span class="p">)</span>
+    <span class="n">labelCol</span><span class="o">=</span><span class="s2">&quot;indexedLabel&quot;</span><span class="p">,</span> <span class="n">predictionCol</span><span class="o">=</span><span class="s2">&quot;prediction&quot;</span><span class="p">,</span> <span class="n">metricName</span><span class="o">=</span><span class="s2">&quot;accuracy&quot;</span><span class="p">)</span>
 <span class="n">accuracy</span> <span class="o">=</span> <span class="n">evaluator</span><span class="o">.</span><span class="n">evaluate</span><span class="p">(</span><span class="n">predictions</span><span class="p">)</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Test Error = </span><span class="si">%g</span><span class="s"> &quot;</span> <span class="o">%</span> <span class="p">(</span><span class="mf">1.0</span> <span class="o">-</span> <span class="n">accuracy</span><span class="p">))</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Test Error = </span><span class="si">%g</span><span class="s2"> &quot;</span> <span class="o">%</span> <span class="p">(</span><span class="mf">1.0</span> <span class="o">-</span> <span class="n">accuracy</span><span class="p">))</span>
 
 <span class="n">treeModel</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">stages</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span>
-<span class="c"># summary only</span>
+<span class="c1"># summary only</span>
 <span class="k">print</span><span class="p">(</span><span class="n">treeModel</span><span class="p">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/ml/decision_tree_classification_example.py" in the Spark repo.</small></div>
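
As a rough follow-up sketch (assuming the pyspark tree models expose the toDebugString property, as the Scala and Java models do), the full learned tree can be printed rather than just the one-line summary shown by print(treeModel):

    # Full if/else split structure of the fitted decision tree.
    print(treeModel.toDebugString)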
@@ -1050,7 +1050,7 @@ We use two feature transformers to prepare the data; these help index categories
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.ml.classification.RandomForestClassifier">Scala API docs</a> for more details.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.ml.Pipeline</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.ml.Pipeline</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.ml.classification.</span><span class="o">{</span><span class="nc">RandomForestClassificationModel</span><span class="o">,</span> <span class="nc">RandomForestClassifier</span><span class="o">}</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.ml.feature.</span><span class="o">{</span><span class="nc">IndexToString</span><span class="o">,</span> <span class="nc">StringIndexer</span><span class="o">,</span> <span class="nc">VectorIndexer</span><span class="o">}</span>
@@ -1118,7 +1118,7 @@ We use two feature transformers to prepare the data; these help index categories
 
     <p>Refer to the <a href="api/java/org/apache/spark/ml/classification/RandomForestClassifier.html">Java API docs</a> for more details.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">org.apache.spark.ml.Pipeline</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.ml.Pipeline</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.ml.PipelineModel</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.ml.PipelineStage</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.ml.classification.RandomForestClassificationModel</span><span class="o">;</span>
@@ -1134,13 +1134,13 @@ We use two feature transformers to prepare the data; these help index categories
 
 <span class="c1">// Index labels, adding metadata to the label column.</span>
 <span class="c1">// Fit on whole dataset to include all labels in index.</span>
-<span class="n">StringIndexerModel</span> <span class="n">labelIndexer</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">StringIndexer</span><span class="o">()</span>
+<span class="n">StringIndexerModel</span> <span class="n">labelIndexer</span> <span class="o">=</span> <span class="k">new</span> <span class="n">StringIndexer</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setInputCol</span><span class="o">(</span><span class="s">&quot;label&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setOutputCol</span><span class="o">(</span><span class="s">&quot;indexedLabel&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">fit</span><span class="o">(</span><span class="n">data</span><span class="o">);</span>
 <span class="c1">// Automatically identify categorical features, and index them.</span>
 <span class="c1">// Set maxCategories so features with &gt; 4 distinct values are treated as continuous.</span>
-<span class="n">VectorIndexerModel</span> <span class="n">featureIndexer</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">VectorIndexer</span><span class="o">()</span>
+<span class="n">VectorIndexerModel</span> <span class="n">featureIndexer</span> <span class="o">=</span> <span class="k">new</span> <span class="n">VectorIndexer</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setInputCol</span><span class="o">(</span><span class="s">&quot;features&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setOutputCol</span><span class="o">(</span><span class="s">&quot;indexedFeatures&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setMaxCategories</span><span class="o">(</span><span class="mi">4</span><span class="o">)</span>
@@ -1152,18 +1152,18 @@ We use two feature transformers to prepare the data; these help index categories
 <span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">testData</span> <span class="o">=</span> <span class="n">splits</span><span class="o">[</span><span class="mi">1</span><span class="o">];</span>
 
 <span class="c1">// Train a RandomForest model.</span>
-<span class="n">RandomForestClassifier</span> <span class="n">rf</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">RandomForestClassifier</span><span class="o">()</span>
+<span class="n">RandomForestClassifier</span> <span class="n">rf</span> <span class="o">=</span> <span class="k">new</span> <span class="n">RandomForestClassifier</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setLabelCol</span><span class="o">(</span><span class="s">&quot;indexedLabel&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setFeaturesCol</span><span class="o">(</span><span class="s">&quot;indexedFeatures&quot;</span><span class="o">);</span>
 
 <span class="c1">// Convert indexed labels back to original labels.</span>
-<span class="n">IndexToString</span> <span class="n">labelConverter</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">IndexToString</span><span class="o">()</span>
+<span class="n">IndexToString</span> <span class="n">labelConverter</span> <span class="o">=</span> <span class="k">new</span> <span class="n">IndexToString</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setInputCol</span><span class="o">(</span><span class="s">&quot;prediction&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setOutputCol</span><span class="o">(</span><span class="s">&quot;predictedLabel&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setLabels</span><span class="o">(</span><span class="n">labelIndexer</span><span class="o">.</span><span class="na">labels</span><span class="o">());</span>
 
 <span class="c1">// Chain indexers and forest in a Pipeline</span>
-<span class="n">Pipeline</span> <span class="n">pipeline</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">Pipeline</span><span class="o">()</span>
+<span class="n">Pipeline</span> <span class="n">pipeline</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Pipeline</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setStages</span><span class="o">(</span><span class="k">new</span> <span class="n">PipelineStage</span><span class="o">[]</span> <span class="o">{</span><span class="n">labelIndexer</span><span class="o">,</span> <span class="n">featureIndexer</span><span class="o">,</span> <span class="n">rf</span><span class="o">,</span> <span class="n">labelConverter</span><span class="o">});</span>
 
 <span class="c1">// Train model. This also runs the indexers.</span>
@@ -1176,7 +1176,7 @@ We use two feature transformers to prepare the data; these help index categories
 <span class="n">predictions</span><span class="o">.</span><span class="na">select</span><span class="o">(</span><span class="s">&quot;predictedLabel&quot;</span><span class="o">,</span> <span class="s">&quot;label&quot;</span><span class="o">,</span> <span class="s">&quot;features&quot;</span><span class="o">).</span><span class="na">show</span><span class="o">(</span><span class="mi">5</span><span class="o">);</span>
 
 <span class="c1">// Select (prediction, true label) and compute test error</span>
-<span class="n">MulticlassClassificationEvaluator</span> <span class="n">evaluator</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">MulticlassClassificationEvaluator</span><span class="o">()</span>
+<span class="n">MulticlassClassificationEvaluator</span> <span class="n">evaluator</span> <span class="o">=</span> <span class="k">new</span> <span class="n">MulticlassClassificationEvaluator</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setLabelCol</span><span class="o">(</span><span class="s">&quot;indexedLabel&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setPredictionCol</span><span class="o">(</span><span class="s">&quot;prediction&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setMetricName</span><span class="o">(</span><span class="s">&quot;accuracy&quot;</span><span class="o">);</span>
@@ -1193,53 +1193,53 @@ We use two feature transformers to prepare the data; these help index categories
 
     <p>Refer to the <a href="api/python/pyspark.ml.html#pyspark.ml.classification.RandomForestClassifier">Python API docs</a> for more details.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.ml</span> <span class="kn">import</span> <span class="n">Pipeline</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.ml</span> <span class="kn">import</span> <span class="n">Pipeline</span>
 <span class="kn">from</span> <span class="nn">pyspark.ml.classification</span> <span class="kn">import</span> <span class="n">RandomForestClassifier</span>
 <span class="kn">from</span> <span class="nn">pyspark.ml.feature</span> <span class="kn">import</span> <span class="n">IndexToString</span><span class="p">,</span> <span class="n">StringIndexer</span><span class="p">,</span> <span class="n">VectorIndexer</span>
 <span class="kn">from</span> <span class="nn">pyspark.ml.evaluation</span> <span class="kn">import</span> <span class="n">MulticlassClassificationEvaluator</span>
 
-<span class="c"># Load and parse the data file, converting it to a DataFrame.</span>
-<span class="n">data</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s">&quot;libsvm&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s">&quot;data/mllib/sample_libsvm_data.txt&quot;</span><span class="p">)</span>
+<span class="c1"># Load and parse the data file, converting it to a DataFrame.</span>
+<span class="n">data</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s2">&quot;libsvm&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">&quot;data/mllib/sample_libsvm_data.txt&quot;</span><span class="p">)</span>
 
-<span class="c"># Index labels, adding metadata to the label column.</span>
-<span class="c"># Fit on whole dataset to include all labels in index.</span>
-<span class="n">labelIndexer</span> <span class="o">=</span> <span class="n">StringIndexer</span><span class="p">(</span><span class="n">inputCol</span><span class="o">=</span><span class="s">&quot;label&quot;</span><span class="p">,</span> <span class="n">outputCol</span><span class="o">=</span><span class="s">&quot;indexedLabel&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
+<span class="c1"># Index labels, adding metadata to the label column.</span>
+<span class="c1"># Fit on whole dataset to include all labels in index.</span>
+<span class="n">labelIndexer</span> <span class="o">=</span> <span class="n">StringIndexer</span><span class="p">(</span><span class="n">inputCol</span><span class="o">=</span><span class="s2">&quot;label&quot;</span><span class="p">,</span> <span class="n">outputCol</span><span class="o">=</span><span class="s2">&quot;indexedLabel&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
 
-<span class="c"># Automatically identify categorical features, and index them.</span>
-<span class="c"># Set maxCategories so features with &gt; 4 distinct values are treated as continuous.</span>
+<span class="c1"># Automatically identify categorical features, and index them.</span>
+<span class="c1"># Set maxCategories so features with &gt; 4 distinct values are treated as continuous.</span>
 <span class="n">featureIndexer</span> <span class="o">=</span>\
-    <span class="n">VectorIndexer</span><span class="p">(</span><span class="n">inputCol</span><span class="o">=</span><span class="s">&quot;features&quot;</span><span class="p">,</span> <span class="n">outputCol</span><span class="o">=</span><span class="s">&quot;indexedFeatures&quot;</span><span class="p">,</span> <span class="n">maxCategories</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
+    <span class="n">VectorIndexer</span><span class="p">(</span><span class="n">inputCol</span><span class="o">=</span><span class="s2">&quot;features&quot;</span><span class="p">,</span> <span class="n">outputCol</span><span class="o">=</span><span class="s2">&quot;indexedFeatures&quot;</span><span class="p">,</span> <span class="n">maxCategories</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
 
-<span class="c"># Split the data into training and test sets (30% held out for testing)</span>
+<span class="c1"># Split the data into training and test sets (30% held out for testing)</span>
 <span class="p">(</span><span class="n">trainingData</span><span class="p">,</span> <span class="n">testData</span><span class="p">)</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">randomSplit</span><span class="p">([</span><span class="mf">0.7</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">])</span>
 
-<span class="c"># Train a RandomForest model.</span>
-<span class="n">rf</span> <span class="o">=</span> <span class="n">RandomForestClassifier</span><span class="p">(</span><span class="n">labelCol</span><span class="o">=</span><span class="s">&quot;indexedLabel&quot;</span><span class="p">,</span> <span class="n">featuresCol</span><span class="o">=</span><span class="s">&quot;indexedFeatures&quot;</span><span class="p">,</span> <span class="n">numTrees</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
+<span class="c1"># Train a RandomForest model.</span>
+<span class="n">rf</span> <span class="o">=</span> <span class="n">RandomForestClassifier</span><span class="p">(</span><span class="n">labelCol</span><span class="o">=</span><span class="s2">&quot;indexedLabel&quot;</span><span class="p">,</span> <span class="n">featuresCol</span><span class="o">=</span><span class="s2">&quot;indexedFeatures&quot;</span><span class="p">,</span> <span class="n">numTrees</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
 
-<span class="c"># Convert indexed labels back to original labels.</span>
-<span class="n">labelConverter</span> <span class="o">=</span> <span class="n">IndexToString</span><span class="p">(</span><span class="n">inputCol</span><span class="o">=</span><span class="s">&quot;prediction&quot;</span><span class="p">,</span> <span class="n">outputCol</span><span class="o">=</span><span class="s">&quot;predictedLabel&quot;</span><span class="p">,</span>
+<span class="c1"># Convert indexed labels back to original labels.</span>
+<span class="n">labelConverter</span> <span class="o">=</span> <span class="n">IndexToString</span><span class="p">(</span><span class="n">inputCol</span><span class="o">=</span><span class="s2">&quot;prediction&quot;</span><span class="p">,</span> <span class="n">outputCol</span><span class="o">=</span><span class="s2">&quot;predictedLabel&quot;</span><span class="p">,</span>
                                <span class="n">labels</span><span class="o">=</span><span class="n">labelIndexer</span><span class="o">.</span><span class="n">labels</span><span class="p">)</span>
 
-<span class="c"># Chain indexers and forest in a Pipeline</span>
+<span class="c1"># Chain indexers and forest in a Pipeline</span>
 <span class="n">pipeline</span> <span class="o">=</span> <span class="n">Pipeline</span><span class="p">(</span><span class="n">stages</span><span class="o">=</span><span class="p">[</span><span class="n">labelIndexer</span><span class="p">,</span> <span class="n">featureIndexer</span><span class="p">,</span> <span class="n">rf</span><span class="p">,</span> <span class="n">labelConverter</span><span class="p">])</span>
 
-<span class="c"># Train model.  This also runs the indexers.</span>
+<span class="c1"># Train model.  This also runs the indexers.</span>
 <span class="n">model</span> <span class="o">=</span> <span class="n">pipeline</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">trainingData</span><span class="p">)</span>
 
-<span class="c"># Make predictions.</span>
+<span class="c1"># Make predictions.</span>
 <span class="n">predictions</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">testData</span><span class="p">)</span>
 
-<span class="c"># Select example rows to display.</span>
-<span class="n">predictions</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s">&quot;predictedLabel&quot;</span><span class="p">,</span> <span class="s">&quot;label&quot;</span><span class="p">,</span> <span class="s">&quot;features&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
+<span class="c1"># Select example rows to display.</span>
+<span class="n">predictions</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s2">&quot;predictedLabel&quot;</span><span class="p">,</span> <span class="s2">&quot;label&quot;</span><span class="p">,</span> <span class="s2">&quot;features&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>
 
-<span class="c"># Select (prediction, true label) and compute test error</span>
+<span class="c1"># Select (prediction, true label) and compute test error</span>
 <span class="n">evaluator</span> <span class="o">=</span> <span class="n">MulticlassClassificationEvaluator</span><span class="p">(</span>
-    <span class="n">labelCol</span><span class="o">=</span><span class="s">&quot;indexedLabel&quot;</span><span class="p">,</span> <span class="n">predictionCol</span><span class="o">=</span><span class="s">&quot;prediction&quot;</span><span class="p">,</span> <span class="n">metricName</span><span class="o">=</span><span class="s">&quot;accuracy&quot;</span><span class="p">)</span>
+    <span class="n">labelCol</span><span class="o">=</span><span class="s2">&quot;indexedLabel&quot;</span><span class="p">,</span> <span class="n">predictionCol</span><span class="o">=</span><span class="s2">&quot;prediction&quot;</span><span class="p">,</span> <span class="n">metricName</span><span class="o">=</span><span class="s2">&quot;accuracy&quot;</span><span class="p">)</span>
 <span class="n">accuracy</span> <span class="o">=</span> <span class="n">evaluator</span><span class="o">.</span><span class="n">evaluate</span><span class="p">(</span><span class="n">predictions</span><span class="p">)</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Test Error = </span><span class="si">%g</span><span class="s">&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="mf">1.0</span> <span class="o">-</span> <span class="n">accuracy</span><span class="p">))</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Test Error = </span><span class="si">%g</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="mf">1.0</span> <span class="o">-</span> <span class="n">accuracy</span><span class="p">))</span>
 
 <span class="n">rfModel</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">stages</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span>
-<span class="k">print</span><span class="p">(</span><span class="n">rfModel</span><span class="p">)</span>  <span class="c"># summary only</span>
+<span class="k">print</span><span class="p">(</span><span class="n">rfModel</span><span class="p">)</span>  <span class="c1"># summary only</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/ml/random_forest_classifier_example.py" in the Spark repo.</small></div>
   </div>
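
Beyond printing the model summary, the fitted forest exposes a few diagnostics. A minimal sketch, assuming model is the PipelineModel fitted in the Python example above; featureImportances and toDebugString are attributes of the pyspark.ml tree-ensemble models:

    rfModel = model.stages[2]           # the RandomForestClassificationModel
    print(rfModel.featureImportances)   # per-feature importances as a sparse vector
    print(rfModel.toDebugString)        # full text description of every tree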
@@ -1248,7 +1248,7 @@ We use two feature transformers to prepare the data; these help index categories
 
     <p>Refer to the <a href="api/R/spark.randomForest.html">R API docs</a> for more details.</p>
 
-    <div class="highlight"><pre><span class="c1"># Load training data</span>
+    <div class="highlight"><pre><span></span><span class="c1"># Load training data</span>
 df <span class="o">&lt;-</span> read.df<span class="p">(</span><span class="s">&quot;data/mllib/sample_libsvm_data.txt&quot;</span><span class="p">,</span> <span class="kn">source</span> <span class="o">=</span> <span class="s">&quot;libsvm&quot;</span><span class="p">)</span>
 training <span class="o">&lt;-</span> df
 test <span class="o">&lt;-</span> df
@@ -1283,7 +1283,7 @@ We use two feature transformers to prepare the data; these help index categories
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.ml.classification.GBTClassifier">Scala API docs</a> for more details.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.ml.Pipeline</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.ml.Pipeline</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.ml.classification.</span><span class="o">{</span><span class="nc">GBTClassificationModel</span><span class="o">,</span> <span class="nc">GBTClassifier</span><span class="o">}</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.ml.feature.</span><span class="o">{</span><span class="nc">IndexToString</span><span class="o">,</span> <span class="nc">StringIndexer</span><span class="o">,</span> <span class="nc">VectorIndexer</span><span class="o">}</span>
@@ -1351,7 +1351,7 @@ We use two feature transformers to prepare the data; these help index categories
 
     <p>Refer to the <a href="api/java/org/apache/spark/ml/classification/GBTClassifier.html">Java API docs</a> for more details.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">org.apache.spark.ml.Pipeline</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.ml.Pipeline</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.ml.PipelineModel</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.ml.PipelineStage</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.ml.classification.GBTClassificationModel</span><span class="o">;</span>
@@ -1370,13 +1370,13 @@ We use two feature transformers to prepare the data; these help index categories
 
 <span class="c1">// Index labels, adding metadata to the label column.</span>
 <span class="c1">// Fit on whole dataset to include all labels in index.</span>
-<span class="n">StringIndexerModel</span> <span class="n">labelIndexer</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">StringIndexer</span><span class="o">()</span>
+<span class="n">StringIndexerModel</span> <span class="n">labelIndexer</span> <span class="o">=</span> <span class="k">new</span> <span class="n">StringIndexer</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setInputCol</span><span class="o">(</span><span class="s">&quot;label&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setOutputCol</span><span class="o">(</span><span class="s">&quot;indexedLabel&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">fit</span><span class="o">(</span><span class="n">data</span><span class="o">);</span>
 <span class="c1">// Automatically identify categorical features, and index them.</span>
 <span class="c1">// Set maxCategories so features with &gt; 4 distinct values are treated as continuous.</span>
-<span class="n">VectorIndexerModel</span> <span class="n">featureIndexer</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">VectorIndexer</span><span class="o">()</span>
+<span class="n">VectorIndexerModel</span> <span class="n">featureIndexer</span> <span class="o">=</span> <span class="k">new</span> <span class="n">VectorIndexer</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setInputCol</span><span class="o">(</span><span class="s">&quot;features&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setOutputCol</span><span class="o">(</span><span class="s">&quot;indexedFeatures&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setMaxCategories</span><span class="o">(</span><span class="mi">4</span><span class="o">)</span>
@@ -1388,19 +1388,19 @@ We use two feature transformers to prepare the data; these help index categories
 <span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">testData</span> <span class="o">=</span> <span class="n">splits</span><span class="o">[</span><span class="mi">1</span><span class="o">];</span>
 
 <span class="c1">// Train a GBT model.</span>
-<span class="n">GBTClassifier</span> <span class="n">gbt</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">GBTClassifier</span><span class="o">()</span>
+<span class="n">GBTClassifier</span> <span class="n">gbt</span> <span class="o">=</span> <span class="k">new</span> <span class="n">GBTClassifier</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setLabelCol</span><span class="o">(</span><span class="s">&quot;indexedLabel&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setFeaturesCol</span><span class="o">(</span><span class="s">&quot;indexedFeatures&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setMaxIter</span><span class="o">(</span><span class="mi">10</span><span class="o">);</span>
 
 <span class="c1">// Convert indexed labels back to original labels.</span>
-<span class="n">IndexToString</span> <span class="n">labelConverter</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">IndexToString</span><span class="o">()</span>
+<span clas

<TRUNCATED>

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org


[15/25] spark-website git commit: Update 2.1.0 docs to include https://github.com/apache/spark/pull/16294

Posted by yh...@apache.org.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/mllib-data-types.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/mllib-data-types.html b/site/docs/2.1.0/mllib-data-types.html
index 546d921..f7b5358 100644
--- a/site/docs/2.1.0/mllib-data-types.html
+++ b/site/docs/2.1.0/mllib-data-types.html
@@ -307,14 +307,14 @@
                     
 
                     <ul id="markdown-toc">
-  <li><a href="#local-vector" id="markdown-toc-local-vector">Local vector</a></li>
-  <li><a href="#labeled-point" id="markdown-toc-labeled-point">Labeled point</a></li>
-  <li><a href="#local-matrix" id="markdown-toc-local-matrix">Local matrix</a></li>
-  <li><a href="#distributed-matrix" id="markdown-toc-distributed-matrix">Distributed matrix</a>    <ul>
-      <li><a href="#rowmatrix" id="markdown-toc-rowmatrix">RowMatrix</a></li>
-      <li><a href="#indexedrowmatrix" id="markdown-toc-indexedrowmatrix">IndexedRowMatrix</a></li>
-      <li><a href="#coordinatematrix" id="markdown-toc-coordinatematrix">CoordinateMatrix</a></li>
-      <li><a href="#blockmatrix" id="markdown-toc-blockmatrix">BlockMatrix</a></li>
+  <li><a href="#local-vector">Local vector</a></li>
+  <li><a href="#labeled-point">Labeled point</a></li>
+  <li><a href="#local-matrix">Local matrix</a></li>
+  <li><a href="#distributed-matrix">Distributed matrix</a>    <ul>
+      <li><a href="#rowmatrix">RowMatrix</a></li>
+      <li><a href="#indexedrowmatrix">IndexedRowMatrix</a></li>
+      <li><a href="#coordinatematrix">CoordinateMatrix</a></li>
+      <li><a href="#blockmatrix">BlockMatrix</a></li>
     </ul>
   </li>
 </ul>
@@ -347,14 +347,14 @@ using the factory methods implemented in
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.linalg.Vector"><code>Vector</code> Scala docs</a> and <a href="api/scala/index.html#org.apache.spark.mllib.linalg.Vectors$"><code>Vectors</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.</span><span class="o">{</span><span class="nc">Vector</span><span class="o">,</span> <span class="nc">Vectors</span><span class="o">}</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.</span><span class="o">{</span><span class="nc">Vector</span><span class="o">,</span> <span class="nc">Vectors</span><span class="o">}</span>
 
 <span class="c1">// Create a dense vector (1.0, 0.0, 3.0).</span>
 <span class="k">val</span> <span class="n">dv</span><span class="k">:</span> <span class="kt">Vector</span> <span class="o">=</span> <span class="nc">Vectors</span><span class="o">.</span><span class="n">dense</span><span class="o">(</span><span class="mf">1.0</span><span class="o">,</span> <span class="mf">0.0</span><span class="o">,</span> <span class="mf">3.0</span><span class="o">)</span>
 <span class="c1">// Create a sparse vector (1.0, 0.0, 3.0) by specifying its indices and values corresponding to nonzero entries.</span>
 <span class="k">val</span> <span class="n">sv1</span><span class="k">:</span> <span class="kt">Vector</span> <span class="o">=</span> <span class="nc">Vectors</span><span class="o">.</span><span class="n">sparse</span><span class="o">(</span><span class="mi">3</span><span class="o">,</span> <span class="nc">Array</span><span class="o">(</span><span class="mi">0</span><span class="o">,</span> <span class="mi">2</span><span class="o">),</span> <span class="nc">Array</span><span class="o">(</span><span class="mf">1.0</span><span class="o">,</span> <span class="mf">3.0</span><span class="o">))</span>
 <span class="c1">// Create a sparse vector (1.0, 0.0, 3.0) by specifying its nonzero entries.</span>
-<span class="k">val</span> <span class="n">sv2</span><span class="k">:</span> <span class="kt">Vector</span> <span class="o">=</span> <span class="nc">Vectors</span><span class="o">.</span><span class="n">sparse</span><span class="o">(</span><span class="mi">3</span><span class="o">,</span> <span class="nc">Seq</span><span class="o">((</span><span class="mi">0</span><span class="o">,</span> <span class="mf">1.0</span><span class="o">),</span> <span class="o">(</span><span class="mi">2</span><span class="o">,</span> <span class="mf">3.0</span><span class="o">)))</span></code></pre></div>
+<span class="k">val</span> <span class="n">sv2</span><span class="k">:</span> <span class="kt">Vector</span> <span class="o">=</span> <span class="nc">Vectors</span><span class="o">.</span><span class="n">sparse</span><span class="o">(</span><span class="mi">3</span><span class="o">,</span> <span class="nc">Seq</span><span class="o">((</span><span class="mi">0</span><span class="o">,</span> <span class="mf">1.0</span><span class="o">),</span> <span class="o">(</span><span class="mi">2</span><span class="o">,</span> <span class="mf">3.0</span><span class="o">)))</span></code></pre></figure>
 
     <p><strong><em>Note:</em></strong>
 Scala imports <code>scala.collection.immutable.Vector</code> by default, so you have to import
@@ -373,13 +373,13 @@ using the factory methods implemented in
 
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/linalg/Vector.html"><code>Vector</code> Java docs</a> and <a href="api/java/org/apache/spark/mllib/linalg/Vectors.html"><code>Vectors</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kn">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vector</span><span class="o">;</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vector</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vectors</span><span class="o">;</span>
 
 <span class="c1">// Create a dense vector (1.0, 0.0, 3.0).</span>
 <span class="n">Vector</span> <span class="n">dv</span> <span class="o">=</span> <span class="n">Vectors</span><span class="o">.</span><span class="na">dense</span><span class="o">(</span><span class="mf">1.0</span><span class="o">,</span> <span class="mf">0.0</span><span class="o">,</span> <span class="mf">3.0</span><span class="o">);</span>
 <span class="c1">// Create a sparse vector (1.0, 0.0, 3.0) by specifying its indices and values corresponding to nonzero entries.</span>
-<span class="n">Vector</span> <span class="n">sv</span> <span class="o">=</span> <span class="n">Vectors</span><span class="o">.</span><span class="na">sparse</span><span class="o">(</span><span class="mi">3</span><span class="o">,</span> <span class="k">new</span> <span class="kt">int</span><span class="o">[]</span> <span class="o">{</span><span class="mi">0</span><span class="o">,</span> <span class="mi">2</span><span class="o">},</span> <span class="k">new</span> <span class="kt">double</span><span class="o">[]</span> <span class="o">{</span><span class="mf">1.0</span><span class="o">,</span> <span class="mf">3.0</span><span class="o">});</span></code></pre></div>
+<span class="n">Vector</span> <span class="n">sv</span> <span class="o">=</span> <span class="n">Vectors</span><span class="o">.</span><span class="na">sparse</span><span class="o">(</span><span class="mi">3</span><span class="o">,</span> <span class="k">new</span> <span class="kt">int</span><span class="o">[]</span> <span class="o">{</span><span class="mi">0</span><span class="o">,</span> <span class="mi">2</span><span class="o">},</span> <span class="k">new</span> <span class="kt">double</span><span class="o">[]</span> <span class="o">{</span><span class="mf">1.0</span><span class="o">,</span> <span class="mf">3.0</span><span class="o">});</span></code></pre></figure>
 
   </div>
 
@@ -405,18 +405,18 @@ in <a href="api/python/pyspark.mllib.html#pyspark.mllib.linalg.Vectors"><code>Ve
 
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.linalg.Vectors"><code>Vectors</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">numpy</span> <span class="kn">as</span> <span class="nn">np</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="kn">import</span> <span class="nn">numpy</span> <span class="kn">as</span> <span class="nn">np</span>
 <span class="kn">import</span> <span class="nn">scipy.sparse</span> <span class="kn">as</span> <span class="nn">sps</span>
 <span class="kn">from</span> <span class="nn">pyspark.mllib.linalg</span> <span class="kn">import</span> <span class="n">Vectors</span>
 
-<span class="c"># Use a NumPy array as a dense vector.</span>
+<span class="c1"># Use a NumPy array as a dense vector.</span>
 <span class="n">dv1</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">,</span> <span class="mf">3.0</span><span class="p">])</span>
-<span class="c"># Use a Python list as a dense vector.</span>
+<span class="c1"># Use a Python list as a dense vector.</span>
 <span class="n">dv2</span> <span class="o">=</span> <span class="p">[</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">,</span> <span class="mf">3.0</span><span class="p">]</span>
-<span class="c"># Create a SparseVector.</span>
+<span class="c1"># Create a SparseVector.</span>
 <span class="n">sv1</span> <span class="o">=</span> <span class="n">Vectors</span><span class="o">.</span><span class="n">sparse</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span> <span class="p">[</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">3.0</span><span class="p">])</span>
-<span class="c"># Use a single-column SciPy csc_matrix as a sparse vector.</span>
-<span class="n">sv2</span> <span class="o">=</span> <span class="n">sps</span><span class="o">.</span><span class="n">csc_matrix</span><span class="p">((</span><span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">3.0</span><span class="p">]),</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">]),</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">])),</span> <span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span></code></pre></div>
+<span class="c1"># Use a single-column SciPy csc_matrix as a sparse vector.</span>
+<span class="n">sv2</span> <span class="o">=</span> <span class="n">sps</span><span class="o">.</span><span class="n">csc_matrix</span><span class="p">((</span><span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">3.0</span><span class="p">]),</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">]),</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">])),</span> <span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span></code></pre></figure>
 
   </div>
 </div>
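
The dense and sparse forms above all describe the same vector (1.0, 0.0, 3.0); the sparse constructor takes the vector size, the 0-based positions of the nonzeros, and their values. A quick check, reusing the names from the Python snippet:

    sv1 = Vectors.sparse(3, [0, 2], [1.0, 3.0])
    print(sv1.toArray())  # -> [ 1.  0.  3.], identical to the dense dv1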
@@ -438,14 +438,14 @@ For multiclass classification, labels should be class indices starting from zero
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.regression.LabeledPoint"><code>LabeledPoint</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vectors</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vectors</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.regression.LabeledPoint</span>
 
 <span class="c1">// Create a labeled point with a positive label and a dense feature vector.</span>
 <span class="k">val</span> <span class="n">pos</span> <span class="k">=</span> <span class="nc">LabeledPoint</span><span class="o">(</span><span class="mf">1.0</span><span class="o">,</span> <span class="nc">Vectors</span><span class="o">.</span><span class="n">dense</span><span class="o">(</span><span class="mf">1.0</span><span class="o">,</span> <span class="mf">0.0</span><span class="o">,</span> <span class="mf">3.0</span><span class="o">))</span>
 
 <span class="c1">// Create a labeled point with a negative label and a sparse feature vector.</span>
-<span class="k">val</span> <span class="n">neg</span> <span class="k">=</span> <span class="nc">LabeledPoint</span><span class="o">(</span><span class="mf">0.0</span><span class="o">,</span> <span class="nc">Vectors</span><span class="o">.</span><span class="n">sparse</span><span class="o">(</span><span class="mi">3</span><span class="o">,</span> <span class="nc">Array</span><span class="o">(</span><span class="mi">0</span><span class="o">,</span> <span class="mi">2</span><span class="o">),</span> <span class="nc">Array</span><span class="o">(</span><span class="mf">1.0</span><span class="o">,</span> <span class="mf">3.0</span><span class="o">)))</span></code></pre></div>
+<span class="k">val</span> <span class="n">neg</span> <span class="k">=</span> <span class="nc">LabeledPoint</span><span class="o">(</span><span class="mf">0.0</span><span class="o">,</span> <span class="nc">Vectors</span><span class="o">.</span><span class="n">sparse</span><span class="o">(</span><span class="mi">3</span><span class="o">,</span> <span class="nc">Array</span><span class="o">(</span><span class="mi">0</span><span class="o">,</span> <span class="mi">2</span><span class="o">),</span> <span class="nc">Array</span><span class="o">(</span><span class="mf">1.0</span><span class="o">,</span> <span class="mf">3.0</span><span class="o">)))</span></code></pre></figure>
 
   </div>
 
@@ -456,14 +456,14 @@ For multiclass classification, labels should be class indices starting from zero
 
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/regression/LabeledPoint.html"><code>LabeledPoint</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kn">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vectors</span><span class="o">;</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vectors</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.regression.LabeledPoint</span><span class="o">;</span>
 
 <span class="c1">// Create a labeled point with a positive label and a dense feature vector.</span>
-<span class="n">LabeledPoint</span> <span class="n">pos</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">LabeledPoint</span><span class="o">(</span><span class="mf">1.0</span><span class="o">,</span> <span class="n">Vectors</span><span class="o">.</span><span class="na">dense</span><span class="o">(</span><span class="mf">1.0</span><span class="o">,</span> <span class="mf">0.0</span><span class="o">,</span> <span class="mf">3.0</span><span class="o">));</span>
+<span class="n">LabeledPoint</span> <span class="n">pos</span> <span class="o">=</span> <span class="k">new</span> <span class="n">LabeledPoint</span><span class="o">(</span><span class="mf">1.0</span><span class="o">,</span> <span class="n">Vectors</span><span class="o">.</span><span class="na">dense</span><span class="o">(</span><span class="mf">1.0</span><span class="o">,</span> <span class="mf">0.0</span><span class="o">,</span> <span class="mf">3.0</span><span class="o">));</span>
 
 <span class="c1">// Create a labeled point with a negative label and a sparse feature vector.</span>
-<span class="n">LabeledPoint</span> <span class="n">neg</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">LabeledPoint</span><span class="o">(</span><span class="mf">0.0</span><span class="o">,</span> <span class="n">Vectors</span><span class="o">.</span><span class="na">sparse</span><span class="o">(</span><span class="mi">3</span><span class="o">,</span> <span class="k">new</span> <span class="kt">int</span><span class="o">[]</span> <span class="o">{</span><span class="mi">0</span><span class="o">,</span> <span class="mi">2</span><span class="o">},</span> <span class="k">new</span> <span class="kt">double</span><span class="o">[]</span> <span class="o">{</span><span class="mf">1.0</span><span class="o">,</span> <span class="mf">3.0</span><span class="o">}));</span></code></pre></div>
+<span class="n">LabeledPoint</span> <span class="n">neg</span> <span class="o">=</span> <span class="k">new</span> <span class="n">LabeledPoint</span><span class="o">(</span><span class="mf">0.0</span><span class="o">,</span> <span class="n">Vectors</span><span class="o">.</span><span class="na">sparse</span><span class="o">(</span><span class="mi">3</span><span class="o">,</span> <span class="k">new</span> <span class="kt">int</span><span class="o">[]</span> <span class="o">{</span><span class="mi">0</span><span class="o">,</span> <span class="mi">2</span><span class="o">},</span> <span class="k">new</span> <span class="kt">double</span><span class="o">[]</span> <span class="o">{</span><span class="mf">1.0</span><span class="o">,</span> <span class="mf">3.0</span><span class="o">}));</span></code></pre></figure>
 
   </div>
 
@@ -474,14 +474,14 @@ For multiclass classification, labels should be class indices starting from zero
 
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.regression.LabeledPoint"><code>LabeledPoint</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">pyspark.mllib.linalg</span> <span class="kn">import</span> <span class="n">SparseVector</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.linalg</span> <span class="kn">import</span> <span class="n">SparseVector</span>
 <span class="kn">from</span> <span class="nn">pyspark.mllib.regression</span> <span class="kn">import</span> <span class="n">LabeledPoint</span>
 
-<span class="c"># Create a labeled point with a positive label and a dense feature vector.</span>
+<span class="c1"># Create a labeled point with a positive label and a dense feature vector.</span>
 <span class="n">pos</span> <span class="o">=</span> <span class="n">LabeledPoint</span><span class="p">(</span><span class="mf">1.0</span><span class="p">,</span> <span class="p">[</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">,</span> <span class="mf">3.0</span><span class="p">])</span>
 
-<span class="c"># Create a labeled point with a negative label and a sparse feature vector.</span>
-<span class="n">neg</span> <span class="o">=</span> <span class="n">LabeledPoint</span><span class="p">(</span><span class="mf">0.0</span><span class="p">,</span> <span class="n">SparseVector</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span> <span class="p">[</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">3.0</span><span class="p">]))</span></code></pre></div>
+<span class="c1"># Create a labeled point with a negative label and a sparse feature vector.</span>
+<span class="n">neg</span> <span class="o">=</span> <span class="n">LabeledPoint</span><span class="p">(</span><span class="mf">0.0</span><span class="p">,</span> <span class="n">SparseVector</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span> <span class="p">[</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">3.0</span><span class="p">]))</span></code></pre></figure>
 
   </div>
 </div>
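
As the section notes, multiclass labels are class indices starting from zero. A minimal sketch of a three-class training set:

    from pyspark.mllib.regression import LabeledPoint

    # Labels for a three-class problem are 0.0, 1.0 and 2.0.
    points = [LabeledPoint(0.0, [1.0, 0.0]),
              LabeledPoint(1.0, [0.0, 1.0]),
              LabeledPoint(2.0, [1.0, 1.0])]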
@@ -508,11 +508,11 @@ examples stored in LIBSVM format.</p>
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.util.MLUtils$"><code>MLUtils</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">import</span> <span class="nn">org.apache.spark.mllib.regression.LabeledPoint</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.regression.LabeledPoint</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.util.MLUtils</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.rdd.RDD</span>
 
-<span class="k">val</span> <span class="n">examples</span><span class="k">:</span> <span class="kt">RDD</span><span class="o">[</span><span class="kt">LabeledPoint</span><span class="o">]</span> <span class="k">=</span> <span class="nc">MLUtils</span><span class="o">.</span><span class="n">loadLibSVMFile</span><span class="o">(</span><span class="n">sc</span><span class="o">,</span> <span class="s">&quot;data/mllib/sample_libsvm_data.txt&quot;</span><span class="o">)</span></code></pre></div>
+<span class="k">val</span> <span class="n">examples</span><span class="k">:</span> <span class="kt">RDD</span><span class="o">[</span><span class="kt">LabeledPoint</span><span class="o">]</span> <span class="k">=</span> <span class="nc">MLUtils</span><span class="o">.</span><span class="n">loadLibSVMFile</span><span class="o">(</span><span class="n">sc</span><span class="o">,</span> <span class="s">&quot;data/mllib/sample_libsvm_data.txt&quot;</span><span class="o">)</span></code></pre></figure>
 
   </div>
 
@@ -522,12 +522,12 @@ examples stored in LIBSVM format.</p>
 
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/util/MLUtils.html"><code>MLUtils</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kn">import</span> <span class="nn">org.apache.spark.mllib.regression.LabeledPoint</span><span class="o">;</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.mllib.regression.LabeledPoint</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.util.MLUtils</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
 
 <span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">LabeledPoint</span><span class="o">&gt;</span> <span class="n">examples</span> <span class="o">=</span> 
-  <span class="n">MLUtils</span><span class="o">.</span><span class="na">loadLibSVMFile</span><span class="o">(</span><span class="n">jsc</span><span class="o">.</span><span class="na">sc</span><span class="o">(),</span> <span class="s">&quot;data/mllib/sample_libsvm_data.txt&quot;</span><span class="o">).</span><span class="na">toJavaRDD</span><span class="o">();</span></code></pre></div>
+  <span class="n">MLUtils</span><span class="o">.</span><span class="na">loadLibSVMFile</span><span class="o">(</span><span class="n">jsc</span><span class="o">.</span><span class="na">sc</span><span class="o">(),</span> <span class="s">&quot;data/mllib/sample_libsvm_data.txt&quot;</span><span class="o">).</span><span class="na">toJavaRDD</span><span class="o">();</span></code></pre></figure>
 
   </div>
 
@@ -537,9 +537,9 @@ examples stored in LIBSVM format.</p>
 
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.util.MLUtils"><code>MLUtils</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">pyspark.mllib.util</span> <span class="kn">import</span> <span class="n">MLUtils</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.util</span> <span class="kn">import</span> <span class="n">MLUtils</span>
 
-<span class="n">examples</span> <span class="o">=</span> <span class="n">MLUtils</span><span class="o">.</span><span class="n">loadLibSVMFile</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;data/mllib/sample_libsvm_data.txt&quot;</span><span class="p">)</span></code></pre></div>
+<span class="n">examples</span> <span class="o">=</span> <span class="n">MLUtils</span><span class="o">.</span><span class="n">loadLibSVMFile</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;data/mllib/sample_libsvm_data.txt&quot;</span><span class="p">)</span></code></pre></figure>
 
   </div>
 </div>
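
For reference, each line of a LIBSVM file has the form "label index1:value1 index2:value2 ...", where the indices are one-based and ascending; after loading, they are converted to zero-based. A sketch of what one parsed record looks like:

    # A line such as   "1 1:0.5 3:2.0"   becomes roughly
    # LabeledPoint(1.0, SparseVector(d, [0, 2], [0.5, 2.0]))
    # with d the feature count and the indices now zero-based.
    examples = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
    print(examples.first())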
@@ -570,13 +570,13 @@ matrices. Remember, local matrices in MLlib are stored in column-major order.</p
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.linalg.Matrix"><code>Matrix</code> Scala docs</a> and <a href="api/scala/index.html#org.apache.spark.mllib.linalg.Matrices$"><code>Matrices</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.</span><span class="o">{</span><span class="nc">Matrix</span><span class="o">,</span> <span class="nc">Matrices</span><span class="o">}</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.</span><span class="o">{</span><span class="nc">Matrix</span><span class="o">,</span> <span class="nc">Matrices</span><span class="o">}</span>
 
 <span class="c1">// Create a dense matrix ((1.0, 2.0), (3.0, 4.0), (5.0, 6.0))</span>
 <span class="k">val</span> <span class="n">dm</span><span class="k">:</span> <span class="kt">Matrix</span> <span class="o">=</span> <span class="nc">Matrices</span><span class="o">.</span><span class="n">dense</span><span class="o">(</span><span class="mi">3</span><span class="o">,</span> <span class="mi">2</span><span class="o">,</span> <span class="nc">Array</span><span class="o">(</span><span class="mf">1.0</span><span class="o">,</span> <span class="mf">3.0</span><span class="o">,</span> <span class="mf">5.0</span><span class="o">,</span> <span class="mf">2.0</span><span class="o">,</span> <span class="mf">4.0</span><span class="o">,</span> <span class="mf">6.0</span><span class="o">))</span>
 
 <span class="c1">// Create a sparse matrix ((9.0, 0.0), (0.0, 8.0), (0.0, 6.0))</span>
-<span class="k">val</span> <span class="n">sm</span><span class="k">:</span> <span class="kt">Matrix</span> <span class="o">=</span> <span class="nc">Matrices</span><span class="o">.</span><span class="n">sparse</span><span class="o">(</span><span class="mi">3</span><span class="o">,</span> <span class="mi">2</span><span class="o">,</span> <span class="nc">Array</span><span class="o">(</span><span class="mi">0</span><span class="o">,</span> <span class="mi">1</span><span class="o">,</span> <span class="mi">3</span><span class="o">),</span> <span class="nc">Array</span><span class="o">(</span><span class="mi">0</span><span class="o">,</span> <span class="mi">2</span><span class="o">,</span> <span class="mi">1</span><span class="o">),</span> <span class="nc">Array</span><span class="o">(</span><span class="mi">9</span><span class="o">,</span> <span class="mi">6</span><span class="o">,</span> <span class="mi">8</span><span class="o">))</span></code></pre></div>
+<span class="k">val</span> <span class="n">sm</span><span class="k">:</span> <span class="kt">Matrix</span> <span class="o">=</span> <span class="nc">Matrices</span><span class="o">.</span><span class="n">sparse</span><span class="o">(</span><span class="mi">3</span><span class="o">,</span> <span class="mi">2</span><span class="o">,</span> <span class="nc">Array</span><span class="o">(</span><span class="mi">0</span><span class="o">,</span> <span class="mi">1</span><span class="o">,</span> <span class="mi">3</span><span class="o">),</span> <span class="nc">Array</span><span class="o">(</span><span class="mi">0</span><span class="o">,</span> <span class="mi">2</span><span class="o">,</span> <span class="mi">1</span><span class="o">),</span> <span class="nc">Array</span><span class="o">(</span><span class="mi">9</span><span class="o">,</span> <span class="mi">6</span><span class="o">,</span> <span class="mi">8</span><span class="o">))</span></code></pre></figure>
 
   </div>
 
@@ -592,14 +592,14 @@ matrices. Remember, local matrices in MLlib are stored in column-major order.</p
 
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/linalg/Matrix.html"><code>Matrix</code> Java docs</a> and <a href="api/java/org/apache/spark/mllib/linalg/Matrices.html"><code>Matrices</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kn">import</span> <span class="nn">org.apache.spark.mllib.linalg.Matrix</span><span class="o">;</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.mllib.linalg.Matrix</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.linalg.Matrices</span><span class="o">;</span>
 
 <span class="c1">// Create a dense matrix ((1.0, 2.0), (3.0, 4.0), (5.0, 6.0))</span>
 <span class="n">Matrix</span> <span class="n">dm</span> <span class="o">=</span> <span class="n">Matrices</span><span class="o">.</span><span class="na">dense</span><span class="o">(</span><span class="mi">3</span><span class="o">,</span> <span class="mi">2</span><span class="o">,</span> <span class="k">new</span> <span class="kt">double</span><span class="o">[]</span> <span class="o">{</span><span class="mf">1.0</span><span class="o">,</span> <span class="mf">3.0</span><span class="o">,</span> <span class="mf">5.0</span><span class="o">,</span> <span class="mf">2.0</span><span class="o">,</span> <span class="mf">4.0</span><span class="o">,</span> <span class="mf">6.0</span><span class="o">});</span>
 
 <span class="c1">// Create a sparse matrix ((9.0, 0.0), (0.0, 8.0), (0.0, 6.0))</span>
-<span class="n">Matrix</span> <span class="n">sm</span> <span class="o">=</span> <span class="n">Matrices</span><span class="o">.</span><span class="na">sparse</span><span class="o">(</span><span class="mi">3</span><span class="o">,</span> <span class="mi">2</span><span class="o">,</span> <span class="k">new</span> <span class="kt">int</span><span class="o">[]</span> <span class="o">{</span><span class="mi">0</span><span class="o">,</span> <span class="mi">1</span><span class="o">,</span> <span class="mi">3</span><span class="o">},</span> <span class="k">new</span> <span class="kt">int</span><span class="o">[]</span> <span class="o">{</span><span class="mi">0</span><span class="o">,</span> <span class="mi">2</span><span class="o">,</span> <span class="mi">1</span><span class="o">},</span> <span class="k">new</span> <span class="kt">double</span><span class="o">[]</span> <span class="o">{</span><span class="mi">9</span><span class="o">,</span> <span class="mi">6</span><span class="o"
 >,</span> <span class="mi">8</span><span class="o">});</span></code></pre></div>
+<span class="n">Matrix</span> <span class="n">sm</span> <span class="o">=</span> <span class="n">Matrices</span><span class="o">.</span><span class="na">sparse</span><span class="o">(</span><span class="mi">3</span><span class="o">,</span> <span class="mi">2</span><span class="o">,</span> <span class="k">new</span> <span class="kt">int</span><span class="o">[]</span> <span class="o">{</span><span class="mi">0</span><span class="o">,</span> <span class="mi">1</span><span class="o">,</span> <span class="mi">3</span><span class="o">},</span> <span class="k">new</span> <span class="kt">int</span><span class="o">[]</span> <span class="o">{</span><span class="mi">0</span><span class="o">,</span> <span class="mi">2</span><span class="o">,</span> <span class="mi">1</span><span class="o">},</span> <span class="k">new</span> <span class="kt">double</span><span class="o">[]</span> <span class="o">{</span><span class="mi">9</span><span class="o">,</span> <span class="mi">6</span><span class="o"
 >,</span> <span class="mi">8</span><span class="o">});</span></code></pre></figure>
 
   </div>
 
@@ -615,13 +615,13 @@ matrices. Remember, local matrices in MLlib are stored in column-major order.</p
 
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.linalg.Matrix"><code>Matrix</code> Python docs</a> and <a href="api/python/pyspark.mllib.html#pyspark.mllib.linalg.Matrices"><code>Matrices</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">pyspark.mllib.linalg</span> <span class="kn">import</span> <span class="n">Matrix</span><span class="p">,</span> <span class="n">Matrices</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.linalg</span> <span class="kn">import</span> <span class="n">Matrix</span><span class="p">,</span> <span class="n">Matrices</span>
 
-<span class="c"># Create a dense matrix ((1.0, 2.0), (3.0, 4.0), (5.0, 6.0))</span>
+<span class="c1"># Create a dense matrix ((1.0, 2.0), (3.0, 4.0), (5.0, 6.0))</span>
 <span class="n">dm2</span> <span class="o">=</span> <span class="n">Matrices</span><span class="o">.</span><span class="n">dense</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">])</span>
 
-<span class="c"># Create a sparse matrix ((9.0, 0.0), (0.0, 8.0), (0.0, 6.0))</span>
-<span class="n">sm</span> <span class="o">=</span> <span class="n">Matrices</span><span class="o">.</span><span class="n">sparse</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="p">[</span><span class="mi">9</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">8</span><span class="p">])</span></code></pre></div>
+<span class="c1"># Create a sparse matrix ((9.0, 0.0), (0.0, 8.0), (0.0, 6.0))</span>
+<span class="n">sm</span> <span class="o">=</span> <span class="n">Matrices</span><span class="o">.</span><span class="n">sparse</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="p">[</span><span class="mi">9</span><span class="p">,</span> <span class="mi">6</span><span class="p">,</span> <span class="mi">8</span><span class="p">])</span></code></pre></figure>
 
   </div>
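To make the column-major layout concrete, the following standalone sketch prints both matrices densely (plain PySpark on a single machine; toArray() materializing the matrix this way is an assumption of the sketch):

    from pyspark.mllib.linalg import Matrices

    # Column-major: [1, 2, 3, 4, 5, 6] fills column 0 with (1, 2, 3)
    # and column 1 with (4, 5, 6) in a 3 x 2 matrix.
    dm = Matrices.dense(3, 2, [1, 2, 3, 4, 5, 6])
    print(dm.toArray())  # [[1. 4.] [2. 5.] [3. 6.]]

    # CSC storage: column pointers [0, 1, 3], row indices [0, 2, 1]
    # and values [9, 6, 8] encode ((9.0, 0.0), (0.0, 8.0), (0.0, 6.0)).
    sm = Matrices.sparse(3, 2, [0, 1, 3], [0, 2, 1], [9, 6, 8])
    print(sm.toArray())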
 
@@ -670,7 +670,7 @@ For <a href="https://en.wikipedia.org/wiki/Singular_value_decomposition">singula
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.linalg.distributed.RowMatrix"><code>RowMatrix</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vector</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vector</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.distributed.RowMatrix</span>
 
 <span class="k">val</span> <span class="n">rows</span><span class="k">:</span> <span class="kt">RDD</span><span class="o">[</span><span class="kt">Vector</span><span class="o">]</span> <span class="k">=</span> <span class="o">...</span> <span class="c1">// an RDD of local vectors</span>
@@ -682,7 +682,7 @@ For <a href="https://en.wikipedia.org/wiki/Singular_value_decomposition">singula
 <span class="k">val</span> <span class="n">n</span> <span class="k">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">numCols</span><span class="o">()</span>
 
 <span class="c1">// QR decomposition </span>
-<span class="k">val</span> <span class="n">qrResult</span> <span class="k">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">tallSkinnyQR</span><span class="o">(</span><span class="kc">true</span><span class="o">)</span></code></pre></div>
+<span class="k">val</span> <span class="n">qrResult</span> <span class="k">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">tallSkinnyQR</span><span class="o">(</span><span class="kc">true</span><span class="o">)</span></code></pre></figure>
 
   </div>
 
@@ -693,20 +693,20 @@ created from a <code>JavaRDD&lt;Vector&gt;</code> instance.  Then we can compute
 
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/linalg/distributed/RowMatrix.html"><code>RowMatrix</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vector</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.linalg.distributed.RowMatrix</span><span class="o">;</span>
 
 <span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">Vector</span><span class="o">&gt;</span> <span class="n">rows</span> <span class="o">=</span> <span class="o">...</span> <span class="c1">// a JavaRDD of local vectors</span>
 <span class="c1">// Create a RowMatrix from an JavaRDD&lt;Vector&gt;.</span>
-<span class="n">RowMatrix</span> <span class="n">mat</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">RowMatrix</span><span class="o">(</span><span class="n">rows</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span>
+<span class="n">RowMatrix</span> <span class="n">mat</span> <span class="o">=</span> <span class="k">new</span> <span class="n">RowMatrix</span><span class="o">(</span><span class="n">rows</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span>
 
 <span class="c1">// Get its size.</span>
 <span class="kt">long</span> <span class="n">m</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="na">numRows</span><span class="o">();</span>
 <span class="kt">long</span> <span class="n">n</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="na">numCols</span><span class="o">();</span>
 
 <span class="c1">// QR decomposition </span>
-<span class="n">QRDecomposition</span><span class="o">&lt;</span><span class="n">RowMatrix</span><span class="o">,</span> <span class="n">Matrix</span><span class="o">&gt;</span> <span class="n">result</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="na">tallSkinnyQR</span><span class="o">(</span><span class="kc">true</span><span class="o">);</span></code></pre></div>
+<span class="n">QRDecomposition</span><span class="o">&lt;</span><span class="n">RowMatrix</span><span class="o">,</span> <span class="n">Matrix</span><span class="o">&gt;</span> <span class="n">result</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="na">tallSkinnyQR</span><span class="o">(</span><span class="kc">true</span><span class="o">);</span></code></pre></figure>
 
   </div>
 
@@ -717,20 +717,20 @@ created from an <code>RDD</code> of vectors.</p>
 
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.linalg.distributed.RowMatrix"><code>RowMatrix</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">pyspark.mllib.linalg.distributed</span> <span class="kn">import</span> <span class="n">RowMatrix</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.linalg.distributed</span> <span class="kn">import</span> <span class="n">RowMatrix</span>
 
-<span class="c"># Create an RDD of vectors.</span>
+<span class="c1"># Create an RDD of vectors.</span>
 <span class="n">rows</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="p">([[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="p">[</span><span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">],</span> <span class="p">[</span><span class="mi">7</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">9</span><span class="p">],</span> <span class="p">[</span><span class="mi">10</span><span class="p">,</span> <span class="mi">11</span><span class="p">,</span> <span class="mi">12</span><span class="p">]])</span>
 
-<span class="c"># Create a RowMatrix from an RDD of vectors.</span>
+<span class="c1"># Create a RowMatrix from an RDD of vectors.</span>
 <span class="n">mat</span> <span class="o">=</span> <span class="n">RowMatrix</span><span class="p">(</span><span class="n">rows</span><span class="p">)</span>
 
-<span class="c"># Get its size.</span>
-<span class="n">m</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">numRows</span><span class="p">()</span>  <span class="c"># 4</span>
-<span class="n">n</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">numCols</span><span class="p">()</span>  <span class="c"># 3</span>
+<span class="c1"># Get its size.</span>
+<span class="n">m</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">numRows</span><span class="p">()</span>  <span class="c1"># 4</span>
+<span class="n">n</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">numCols</span><span class="p">()</span>  <span class="c1"># 3</span>
 
-<span class="c"># Get the rows as an RDD of vectors again.</span>
-<span class="n">rowsRDD</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">rows</span></code></pre></div>
+<span class="c1"># Get the rows as an RDD of vectors again.</span>
+<span class="n">rowsRDD</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">rows</span></code></pre></figure>
 
   </div>
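Besides exposing its rows, a RowMatrix can also summarize itself column by column; a minimal follow-up to the example above, assuming computeColumnSummaryStatistics is available on the Python RowMatrix in this release:

    # Summarize the distributed rows column by column.
    summary = mat.computeColumnSummaryStatistics()
    print(summary.mean())        # mean of each of the 3 columns
    print(summary.variance())    # variance of each column
    print(summary.numNonzeros()) # non-zero count per column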
 
@@ -754,7 +754,7 @@ its row indices.</p>
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix"><code>IndexedRowMatrix</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.distributed.</span><span class="o">{</span><span class="nc">IndexedRow</span><span class="o">,</span> <span class="nc">IndexedRowMatrix</span><span class="o">,</span> <span class="nc">RowMatrix</span><span class="o">}</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.distributed.</span><span class="o">{</span><span class="nc">IndexedRow</span><span class="o">,</span> <span class="nc">IndexedRowMatrix</span><span class="o">,</span> <span class="nc">RowMatrix</span><span class="o">}</span>
 
 <span class="k">val</span> <span class="n">rows</span><span class="k">:</span> <span class="kt">RDD</span><span class="o">[</span><span class="kt">IndexedRow</span><span class="o">]</span> <span class="k">=</span> <span class="o">...</span> <span class="c1">// an RDD of indexed rows</span>
 <span class="c1">// Create an IndexedRowMatrix from an RDD[IndexedRow].</span>
@@ -765,7 +765,7 @@ its row indices.</p>
 <span class="k">val</span> <span class="n">n</span> <span class="k">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">numCols</span><span class="o">()</span>
 
 <span class="c1">// Drop its row indices.</span>
-<span class="k">val</span> <span class="n">rowMat</span><span class="k">:</span> <span class="kt">RowMatrix</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">toRowMatrix</span><span class="o">()</span></code></pre></div>
+<span class="k">val</span> <span class="n">rowMat</span><span class="k">:</span> <span class="kt">RowMatrix</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">toRowMatrix</span><span class="o">()</span></code></pre></figure>
 
   </div>
 
@@ -780,21 +780,21 @@ its row indices.</p>
 
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/linalg/distributed/IndexedRowMatrix.html"><code>IndexedRowMatrix</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.linalg.distributed.IndexedRow</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.linalg.distributed.RowMatrix</span><span class="o">;</span>
 
 <span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">IndexedRow</span><span class="o">&gt;</span> <span class="n">rows</span> <span class="o">=</span> <span class="o">...</span> <span class="c1">// a JavaRDD of indexed rows</span>
 <span class="c1">// Create an IndexedRowMatrix from a JavaRDD&lt;IndexedRow&gt;.</span>
-<span class="n">IndexedRowMatrix</span> <span class="n">mat</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">IndexedRowMatrix</span><span class="o">(</span><span class="n">rows</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span>
+<span class="n">IndexedRowMatrix</span> <span class="n">mat</span> <span class="o">=</span> <span class="k">new</span> <span class="n">IndexedRowMatrix</span><span class="o">(</span><span class="n">rows</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span>
 
 <span class="c1">// Get its size.</span>
 <span class="kt">long</span> <span class="n">m</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="na">numRows</span><span class="o">();</span>
 <span class="kt">long</span> <span class="n">n</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="na">numCols</span><span class="o">();</span>
 
 <span class="c1">// Drop its row indices.</span>
-<span class="n">RowMatrix</span> <span class="n">rowMat</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="na">toRowMatrix</span><span class="o">();</span></code></pre></div>
+<span class="n">RowMatrix</span> <span class="n">rowMat</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="na">toRowMatrix</span><span class="o">();</span></code></pre></figure>
 
   </div>
 
@@ -808,30 +808,30 @@ its row indices.</p>
 
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.linalg.distributed.IndexedRowMatrix"><code>IndexedRowMatrix</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">pyspark.mllib.linalg.distributed</span> <span class="kn">import</span> <span class="n">IndexedRow</span><span class="p">,</span> <span class="n">IndexedRowMatrix</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.linalg.distributed</span> <span class="kn">import</span> <span class="n">IndexedRow</span><span class="p">,</span> <span class="n">IndexedRowMatrix</span>
 
-<span class="c"># Create an RDD of indexed rows.</span>
-<span class="c">#   - This can be done explicitly with the IndexedRow class:</span>
+<span class="c1"># Create an RDD of indexed rows.</span>
+<span class="c1">#   - This can be done explicitly with the IndexedRow class:</span>
 <span class="n">indexedRows</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="p">([</span><span class="n">IndexedRow</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]),</span>
                               <span class="n">IndexedRow</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="p">[</span><span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">]),</span>
                               <span class="n">IndexedRow</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="p">[</span><span class="mi">7</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">9</span><span class="p">]),</span>
                               <span class="n">IndexedRow</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="p">[</span><span class="mi">10</span><span class="p">,</span> <span class="mi">11</span><span class="p">,</span> <span class="mi">12</span><span class="p">])])</span>
-<span class="c">#   - or by using (long, vector) tuples:</span>
+<span class="c1">#   - or by using (long, vector) tuples:</span>
 <span class="n">indexedRows</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="p">([(</span><span class="mi">0</span><span class="p">,</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]),</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="p">[</span><span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">]),</span>
                               <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="p">[</span><span class="mi">7</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">9</span><span class="p">]),</span> <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="p">[</span><span class="mi">10</span><span class="p">,</span> <span class="mi">11</span><span class="p">,</span> <span class="mi">12</span><span class="p">])])</span>
 
-<span class="c"># Create an IndexedRowMatrix from an RDD of IndexedRows.</span>
+<span class="c1"># Create an IndexedRowMatrix from an RDD of IndexedRows.</span>
 <span class="n">mat</span> <span class="o">=</span> <span class="n">IndexedRowMatrix</span><span class="p">(</span><span class="n">indexedRows</span><span class="p">)</span>
 
-<span class="c"># Get its size.</span>
-<span class="n">m</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">numRows</span><span class="p">()</span>  <span class="c"># 4</span>
-<span class="n">n</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">numCols</span><span class="p">()</span>  <span class="c"># 3</span>
+<span class="c1"># Get its size.</span>
+<span class="n">m</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">numRows</span><span class="p">()</span>  <span class="c1"># 4</span>
+<span class="n">n</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">numCols</span><span class="p">()</span>  <span class="c1"># 3</span>
 
-<span class="c"># Get the rows as an RDD of IndexedRows.</span>
+<span class="c1"># Get the rows as an RDD of IndexedRows.</span>
 <span class="n">rowsRDD</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">rows</span>
 
-<span class="c"># Convert to a RowMatrix by dropping the row indices.</span>
-<span class="n">rowMat</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">toRowMatrix</span><span class="p">()</span></code></pre></div>
+<span class="c1"># Convert to a RowMatrix by dropping the row indices.</span>
+<span class="n">rowMat</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">toRowMatrix</span><span class="p">()</span></code></pre></figure>
 
   </div>
 
@@ -857,7 +857,7 @@ with sparse rows by calling <code>toIndexedRowMatrix</code>.  Other computations
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.linalg.distributed.CoordinateMatrix"><code>CoordinateMatrix</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.distributed.</span><span class="o">{</span><span class="nc">CoordinateMatrix</span><span class="o">,</span> <span class="nc">MatrixEntry</span><span class="o">}</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.distributed.</span><span class="o">{</span><span class="nc">CoordinateMatrix</span><span class="o">,</span> <span class="nc">MatrixEntry</span><span class="o">}</span>
 
 <span class="k">val</span> <span class="n">entries</span><span class="k">:</span> <span class="kt">RDD</span><span class="o">[</span><span class="kt">MatrixEntry</span><span class="o">]</span> <span class="k">=</span> <span class="o">...</span> <span class="c1">// an RDD of matrix entries</span>
 <span class="c1">// Create a CoordinateMatrix from an RDD[MatrixEntry].</span>
@@ -868,7 +868,7 @@ with sparse rows by calling <code>toIndexedRowMatrix</code>.  Other computations
 <span class="k">val</span> <span class="n">n</span> <span class="k">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">numCols</span><span class="o">()</span>
 
 <span class="c1">// Convert it to an IndexRowMatrix whose rows are sparse vectors.</span>
-<span class="k">val</span> <span class="n">indexedRowMatrix</span> <span class="k">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">toIndexedRowMatrix</span><span class="o">()</span></code></pre></div>
+<span class="k">val</span> <span class="n">indexedRowMatrix</span> <span class="k">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">toIndexedRowMatrix</span><span class="o">()</span></code></pre></figure>
 
   </div>
 
@@ -884,21 +884,21 @@ with sparse rows by calling <code>toIndexedRowMatrix</code>. Other computations
 
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/linalg/distributed/CoordinateMatrix.html"><code>CoordinateMatrix</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.linalg.distributed.CoordinateMatrix</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.linalg.distributed.MatrixEntry</span><span class="o">;</span>
 
 <span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">MatrixEntry</span><span class="o">&gt;</span> <span class="n">entries</span> <span class="o">=</span> <span class="o">...</span> <span class="c1">// a JavaRDD of matrix entries</span>
 <span class="c1">// Create a CoordinateMatrix from a JavaRDD&lt;MatrixEntry&gt;.</span>
-<span class="n">CoordinateMatrix</span> <span class="n">mat</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">CoordinateMatrix</span><span class="o">(</span><span class="n">entries</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span>
+<span class="n">CoordinateMatrix</span> <span class="n">mat</span> <span class="o">=</span> <span class="k">new</span> <span class="n">CoordinateMatrix</span><span class="o">(</span><span class="n">entries</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span>
 
 <span class="c1">// Get its size.</span>
 <span class="kt">long</span> <span class="n">m</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="na">numRows</span><span class="o">();</span>
 <span class="kt">long</span> <span class="n">n</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="na">numCols</span><span class="o">();</span>
 
 <span class="c1">// Convert it to an IndexRowMatrix whose rows are sparse vectors.</span>
-<span class="n">IndexedRowMatrix</span> <span class="n">indexedRowMatrix</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="na">toIndexedRowMatrix</span><span class="o">();</span></code></pre></div>
+<span class="n">IndexedRowMatrix</span> <span class="n">indexedRowMatrix</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="na">toIndexedRowMatrix</span><span class="o">();</span></code></pre></figure>
 
   </div>
 
@@ -912,32 +912,32 @@ calling <code>toRowMatrix</code>, or to an <code>IndexedRowMatrix</code> with sp
 
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.linalg.distributed.CoordinateMatrix"><code>CoordinateMatrix</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">pyspark.mllib.linalg.distributed</span> <span class="kn">import</span> <span class="n">CoordinateMatrix</span><span class="p">,</span> <span class="n">MatrixEntry</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.linalg.distributed</span> <span class="kn">import</span> <span class="n">CoordinateMatrix</span><span class="p">,</span> <span class="n">MatrixEntry</span>
 
-<span class="c"># Create an RDD of coordinate entries.</span>
-<span class="c">#   - This can be done explicitly with the MatrixEntry class:</span>
+<span class="c1"># Create an RDD of coordinate entries.</span>
+<span class="c1">#   - This can be done explicitly with the MatrixEntry class:</span>
 <span class="n">entries</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="p">([</span><span class="n">MatrixEntry</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mf">1.2</span><span class="p">),</span> <span class="n">MatrixEntry</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mf">2.1</span><span class="p">),</span> <span class="n">MatrixEntry</span><span class="p">(</span><span class="mi">6</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mf">3.7</span><span class="p">)])</span>
-<span class="c">#   - or using (long, long, float) tuples:</span>
+<span class="c1">#   - or using (long, long, float) tuples:</span>
 <span class="n">entries</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="p">([(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mf">1.2</span><span class="p">),</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mf">2.1</span><span class="p">),</span> <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mf">3.7</span><span class="p">)])</span>
 
-<span class="c"># Create an CoordinateMatrix from an RDD of MatrixEntries.</span>
+<span class="c1"># Create an CoordinateMatrix from an RDD of MatrixEntries.</span>
 <span class="n">mat</span> <span class="o">=</span> <span class="n">CoordinateMatrix</span><span class="p">(</span><span class="n">entries</span><span class="p">)</span>
 
-<span class="c"># Get its size.</span>
-<span class="n">m</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">numRows</span><span class="p">()</span>  <span class="c"># 3</span>
-<span class="n">n</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">numCols</span><span class="p">()</span>  <span class="c"># 2</span>
+<span class="c1"># Get its size.</span>
+<span class="n">m</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">numRows</span><span class="p">()</span>  <span class="c1"># 3</span>
+<span class="n">n</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">numCols</span><span class="p">()</span>  <span class="c1"># 2</span>
 
-<span class="c"># Get the entries as an RDD of MatrixEntries.</span>
+<span class="c1"># Get the entries as an RDD of MatrixEntries.</span>
 <span class="n">entriesRDD</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">entries</span>
 
-<span class="c"># Convert to a RowMatrix.</span>
+<span class="c1"># Convert to a RowMatrix.</span>
 <span class="n">rowMat</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">toRowMatrix</span><span class="p">()</span>
 
-<span class="c"># Convert to an IndexedRowMatrix.</span>
+<span class="c1"># Convert to an IndexedRowMatrix.</span>
 <span class="n">indexedRowMat</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">toIndexedRowMatrix</span><span class="p">()</span>
 
-<span class="c"># Convert to a BlockMatrix.</span>
-<span class="n">blockMat</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">toBlockMatrix</span><span class="p">()</span></code></pre></div>
+<span class="c1"># Convert to a BlockMatrix.</span>
+<span class="n">blockMat</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">toBlockMatrix</span><span class="p">()</span></code></pre></figure>
 
   </div>
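Since a CoordinateMatrix is just a collection of (i, j, value) entries, transposing it only swaps each entry's indices; a short sketch reusing mat from above (transpose() on the Python CoordinateMatrix is assumed here):

    # Swap the row and column index of every MatrixEntry.
    matT = mat.transpose()
    print(matT.numRows())  # 2
    print(matT.numCols())  # 3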
 
@@ -962,7 +962,7 @@ Users may change the block size by supplying the values through <code>toBlockMat
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.linalg.distributed.BlockMatrix"><code>BlockMatrix</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.distributed.</span><span class="o">{</span><span class="nc">BlockMatrix</span><span class="o">,</span> <span class="nc">CoordinateMatrix</span><span class="o">,</span> <span class="nc">MatrixEntry</span><span class="o">}</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.distributed.</span><span class="o">{</span><span class="nc">BlockMatrix</span><span class="o">,</span> <span class="nc">CoordinateMatrix</span><span class="o">,</span> <span class="nc">MatrixEntry</span><span class="o">}</span>
 
 <span class="k">val</span> <span class="n">entries</span><span class="k">:</span> <span class="kt">RDD</span><span class="o">[</span><span class="kt">MatrixEntry</span><span class="o">]</span> <span class="k">=</span> <span class="o">...</span> <span class="c1">// an RDD of (i, j, v) matrix entries</span>
 <span class="c1">// Create a CoordinateMatrix from an RDD[MatrixEntry].</span>
@@ -975,7 +975,7 @@ Users may change the block size by supplying the values through <code>toBlockMat
 <span class="n">matA</span><span class="o">.</span><span class="n">validate</span><span class="o">()</span>
 
 <span class="c1">// Calculate A^T A.</span>
-<span class="k">val</span> <span class="n">ata</span> <span class="k">=</span> <span class="n">matA</span><span class="o">.</span><span class="n">transpose</span><span class="o">.</span><span class="n">multiply</span><span class="o">(</span><span class="n">matA</span><span class="o">)</span></code></pre></div>
+<span class="k">val</span> <span class="n">ata</span> <span class="k">=</span> <span class="n">matA</span><span class="o">.</span><span class="n">transpose</span><span class="o">.</span><span class="n">multiply</span><span class="o">(</span><span class="n">matA</span><span class="o">)</span></code></pre></figure>
 
   </div>
 
@@ -988,14 +988,14 @@ Users may change the block size by supplying the values through <code>toBlockMat
 
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/linalg/distributed/BlockMatrix.html"><code>BlockMatrix</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.linalg.distributed.BlockMatrix</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.linalg.distributed.CoordinateMatrix</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.linalg.distributed.IndexedRowMatrix</span><span class="o">;</span>
 
 <span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">MatrixEntry</span><span class="o">&gt;</span> <span class="n">entries</span> <span class="o">=</span> <span class="o">...</span> <span class="c1">// a JavaRDD of (i, j, v) Matrix Entries</span>
 <span class="c1">// Create a CoordinateMatrix from a JavaRDD&lt;MatrixEntry&gt;.</span>
-<span class="n">CoordinateMatrix</span> <span class="n">coordMat</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">CoordinateMatrix</span><span class="o">(</span><span class="n">entries</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span>
+<span class="n">CoordinateMatrix</span> <span class="n">coordMat</span> <span class="o">=</span> <span class="k">new</span> <span class="n">CoordinateMatrix</span><span class="o">(</span><span class="n">entries</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span>
 <span class="c1">// Transform the CoordinateMatrix to a BlockMatrix</span>
 <span class="n">BlockMatrix</span> <span class="n">matA</span> <span class="o">=</span> <span class="n">coordMat</span><span class="o">.</span><span class="na">toBlockMatrix</span><span class="o">().</span><span class="na">cache</span><span class="o">();</span>
 
@@ -1004,7 +1004,7 @@ Users may change the block size by supplying the values through <code>toBlockMat
 <span class="n">matA</span><span class="o">.</span><span class="na">validate</span><span class="o">();</span>
 
 <span class="c1">// Calculate A^T A.</span>
-<span class="n">BlockMatrix</span> <span class="n">ata</span> <span class="o">=</span> <span class="n">matA</span><span class="o">.</span><span class="na">transpose</span><span class="o">().</span><span class="na">multiply</span><span class="o">(</span><span class="n">matA</span><span class="o">);</span></code></pre></div>
+<span class="n">BlockMatrix</span> <span class="n">ata</span> <span class="o">=</span> <span class="n">matA</span><span class="o">.</span><span class="na">transpose</span><span class="o">().</span><span class="na">multiply</span><span class="o">(</span><span class="n">matA</span><span class="o">);</span></code></pre></figure>
 
   </div>
 
@@ -1016,31 +1016,31 @@ can be created from an <code>RDD</code> of sub-matrix blocks, where a sub-matrix
 
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.linalg.distributed.BlockMatrix"><code>BlockMatrix</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">pyspark.mllib.linalg</span> <span class="kn">import</span> <span class="n">Matrices</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.linalg</span> <span class="kn">import</span> <span class="n">Matrices</span>
 <span class="kn">from</span> <span class="nn">pyspark.mllib.linalg.distributed</span> <span class="kn">import</span> <span class="n">BlockMatrix</span>
 
-<span class="c"># Create an RDD of sub-matrix blocks.</span>
+<span class="c1"># Create an RDD of sub-matrix blocks.</span>
 <span class="n">blocks</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="p">([((</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span> <span class="n">Matrices</span><span class="o">.</span><span class="n">dense</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">])),</span>
                          <span class="p">((</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span> <span class="n">Matrices</span><span class="o">.</span><span class="n">dense</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="p">[</span><span class="mi">7</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">9</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">11</span><span class="p">,</span> <span class="mi">12</span><span class="p">]))])</span>
 
-<span class="c"># Create a BlockMatrix from an RDD of sub-matrix blocks.</span>
+<span class="c1"># Create a BlockMatrix from an RDD of sub-matrix blocks.</span>
 <span class="n">mat</span> <span class="o">=</span> <span class="n">BlockMatrix</span><span class="p">(</span><span class="n">blocks</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
 
-<span class="c"># Get its size.</span>
-<span class="n">m</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">numRows</span><span class="p">()</span>  <span class="c"># 6</span>
-<span class="n">n</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">numCols</span><span class="p">()</span>  <span class="c"># 2</span>
+<span class="c1"># Get its size.</span>
+<span class="n">m</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">numRows</span><span class="p">()</span>  <span class="c1"># 6</span>
+<span class="n">n</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">numCols</span><span class="p">()</span>  <span class="c1"># 2</span>
 
-<span class="c"># Get the blocks as an RDD of sub-matrix blocks.</span>
+<span class="c1"># Get the blocks as an RDD of sub-matrix blocks.</span>
 <span class="n">blocksRDD</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">blocks</span>
 
-<span class="c"># Convert to a LocalMatrix.</span>
+<span class="c1"># Convert to a LocalMatrix.</span>
 <span class="n">localMat</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">toLocalMatrix</span><span class="p">()</span>
 
-<span class="c"># Convert to an IndexedRowMatrix.</span>
+<span class="c1"># Convert to an IndexedRowMatrix.</span>
 <span class="n">indexedRowMat</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">toIndexedRowMatrix</span><span class="p">()</span>
 
-<span class="c"># Convert to a CoordinateMatrix.</span>
-<span class="n">coordinateMat</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">toCoordinateMatrix</span><span class="p">()</span></code></pre></div>
+<span class="c1"># Convert to a CoordinateMatrix.</span>
+<span class="n">coordinateMat</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="n">toCoordinateMatrix</span><span class="p">()</span></code></pre></figure>
 
   </div>
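The Scala and Java tabs above also compute the Gram matrix A^T A; the same can be sketched in Python by reusing mat, assuming transpose() and multiply() are available on the Python BlockMatrix:

    # Calculate A^T A; mat is 6 x 2, so the product is 2 x 2.
    ata = mat.transpose().multiply(mat)
    print(ata.toLocalMatrix())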
 </div>




[21/25] spark-website git commit: Update 2.1.0 docs to include https://github.com/apache/spark/pull/16294

Posted by yh...@apache.org.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/ml-clustering.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/ml-clustering.html b/site/docs/2.1.0/ml-clustering.html
index e225281..df38605 100644
--- a/site/docs/2.1.0/ml-clustering.html
+++ b/site/docs/2.1.0/ml-clustering.html
@@ -313,21 +313,21 @@ about these algorithms.</p>
 <p><strong>Table of Contents</strong></p>
 
 <ul id="markdown-toc">
-  <li><a href="#k-means" id="markdown-toc-k-means">K-means</a>    <ul>
-      <li><a href="#input-columns" id="markdown-toc-input-columns">Input Columns</a></li>
-      <li><a href="#output-columns" id="markdown-toc-output-columns">Output Columns</a></li>
-      <li><a href="#example" id="markdown-toc-example">Example</a></li>
+  <li><a href="#k-means">K-means</a>    <ul>
+      <li><a href="#input-columns">Input Columns</a></li>
+      <li><a href="#output-columns">Output Columns</a></li>
+      <li><a href="#example">Example</a></li>
     </ul>
   </li>
-  <li><a href="#latent-dirichlet-allocation-lda" id="markdown-toc-latent-dirichlet-allocation-lda">Latent Dirichlet allocation (LDA)</a></li>
-  <li><a href="#bisecting-k-means" id="markdown-toc-bisecting-k-means">Bisecting k-means</a>    <ul>
-      <li><a href="#example-1" id="markdown-toc-example-1">Example</a></li>
+  <li><a href="#latent-dirichlet-allocation-lda">Latent Dirichlet allocation (LDA)</a></li>
+  <li><a href="#bisecting-k-means">Bisecting k-means</a>    <ul>
+      <li><a href="#example-1">Example</a></li>
     </ul>
   </li>
-  <li><a href="#gaussian-mixture-model-gmm" id="markdown-toc-gaussian-mixture-model-gmm">Gaussian Mixture Model (GMM)</a>    <ul>
-      <li><a href="#input-columns-1" id="markdown-toc-input-columns-1">Input Columns</a></li>
-      <li><a href="#output-columns-1" id="markdown-toc-output-columns-1">Output Columns</a></li>
-      <li><a href="#example-2" id="markdown-toc-example-2">Example</a></li>
+  <li><a href="#gaussian-mixture-model-gmm">Gaussian Mixture Model (GMM)</a>    <ul>
+      <li><a href="#input-columns-1">Input Columns</a></li>
+      <li><a href="#output-columns-1">Output Columns</a></li>
+      <li><a href="#example-2">Example</a></li>
     </ul>
   </li>
 </ul>
@@ -391,7 +391,7 @@ called <a href="http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf">kmea
 <div data-lang="scala">
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.ml.clustering.KMeans">Scala API docs</a> for more details.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.ml.clustering.KMeans</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.ml.clustering.KMeans</span>
 
 <span class="c1">// Loads data.</span>
 <span class="k">val</span> <span class="n">dataset</span> <span class="k">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">format</span><span class="o">(</span><span class="s">&quot;libsvm&quot;</span><span class="o">).</span><span class="n">load</span><span class="o">(</span><span class="s">&quot;data/mllib/sample_kmeans_data.txt&quot;</span><span class="o">)</span>
@@ -402,7 +402,7 @@ called <a href="http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf">kmea
 
 <span class="c1">// Evaluate clustering by computing Within Set Sum of Squared Errors.</span>
 <span class="k">val</span> <span class="nc">WSSSE</span> <span class="k">=</span> <span class="n">model</span><span class="o">.</span><span class="n">computeCost</span><span class="o">(</span><span class="n">dataset</span><span class="o">)</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Within Set Sum of Squared Errors = $WSSSE&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;Within Set Sum of Squared Errors = </span><span class="si">$WSSSE</span><span class="s">&quot;</span><span class="o">)</span>
 
 <span class="c1">// Shows the result.</span>
 <span class="n">println</span><span class="o">(</span><span class="s">&quot;Cluster Centers: &quot;</span><span class="o">)</span>
@@ -414,7 +414,7 @@ called <a href="http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf">kmea
 <div data-lang="java">
     <p>Refer to the <a href="api/java/org/apache/spark/ml/clustering/KMeans.html">Java API docs</a> for more details.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">org.apache.spark.ml.clustering.KMeansModel</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.ml.clustering.KMeansModel</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.ml.clustering.KMeans</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.ml.linalg.Vector</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.sql.Dataset</span><span class="o">;</span>
@@ -424,7 +424,7 @@ called <a href="http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf">kmea
 <span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">dataset</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="na">read</span><span class="o">().</span><span class="na">format</span><span class="o">(</span><span class="s">&quot;libsvm&quot;</span><span class="o">).</span><span class="na">load</span><span class="o">(</span><span class="s">&quot;data/mllib/sample_kmeans_data.txt&quot;</span><span class="o">);</span>
 
 <span class="c1">// Trains a k-means model.</span>
-<span class="n">KMeans</span> <span class="n">kmeans</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">KMeans</span><span class="o">().</span><span class="na">setK</span><span class="o">(</span><span class="mi">2</span><span class="o">).</span><span class="na">setSeed</span><span class="o">(</span><span class="mi">1L</span><span class="o">);</span>
+<span class="n">KMeans</span> <span class="n">kmeans</span> <span class="o">=</span> <span class="k">new</span> <span class="n">KMeans</span><span class="o">().</span><span class="na">setK</span><span class="o">(</span><span class="mi">2</span><span class="o">).</span><span class="na">setSeed</span><span class="o">(</span><span class="mi">1L</span><span class="o">);</span>
 <span class="n">KMeansModel</span> <span class="n">model</span> <span class="o">=</span> <span class="n">kmeans</span><span class="o">.</span><span class="na">fit</span><span class="o">(</span><span class="n">dataset</span><span class="o">);</span>
 
 <span class="c1">// Evaluate clustering by computing Within Set Sum of Squared Errors.</span>
@@ -434,7 +434,7 @@ called <a href="http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf">kmea
 <span class="c1">// Shows the result.</span>
 <span class="n">Vector</span><span class="o">[]</span> <span class="n">centers</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="na">clusterCenters</span><span class="o">();</span>
 <span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">&quot;Cluster Centers: &quot;</span><span class="o">);</span>
-<span class="k">for</span> <span class="o">(</span><span class="n">Vector</span> <span class="nl">center:</span> <span class="n">centers</span><span class="o">)</span> <span class="o">{</span>
+<span class="k">for</span> <span class="o">(</span><span class="n">Vector</span> <span class="n">center</span><span class="o">:</span> <span class="n">centers</span><span class="o">)</span> <span class="o">{</span>
   <span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="n">center</span><span class="o">);</span>
 <span class="o">}</span>
 </pre></div>
@@ -444,22 +444,22 @@ called <a href="http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf">kmea
 <div data-lang="python">
     <p>Refer to the <a href="api/python/pyspark.ml.html#pyspark.ml.clustering.KMeans">Python API docs</a> for more details.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.ml.clustering</span> <span class="kn">import</span> <span class="n">KMeans</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.ml.clustering</span> <span class="kn">import</span> <span class="n">KMeans</span>
 
-<span class="c"># Loads data.</span>
-<span class="n">dataset</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s">&quot;libsvm&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s">&quot;data/mllib/sample_kmeans_data.txt&quot;</span><span class="p">)</span>
+<span class="c1"># Loads data.</span>
+<span class="n">dataset</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s2">&quot;libsvm&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">&quot;data/mllib/sample_kmeans_data.txt&quot;</span><span class="p">)</span>
 
-<span class="c"># Trains a k-means model.</span>
+<span class="c1"># Trains a k-means model.</span>
 <span class="n">kmeans</span> <span class="o">=</span> <span class="n">KMeans</span><span class="p">()</span><span class="o">.</span><span class="n">setK</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span><span class="o">.</span><span class="n">setSeed</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
 <span class="n">model</span> <span class="o">=</span> <span class="n">kmeans</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">dataset</span><span class="p">)</span>
 
-<span class="c"># Evaluate clustering by computing Within Set Sum of Squared Errors.</span>
+<span class="c1"># Evaluate clustering by computing Within Set Sum of Squared Errors.</span>
 <span class="n">wssse</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">computeCost</span><span class="p">(</span><span class="n">dataset</span><span class="p">)</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Within Set Sum of Squared Errors = &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">wssse</span><span class="p">))</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Within Set Sum of Squared Errors = &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">wssse</span><span class="p">))</span>
 
-<span class="c"># Shows the result.</span>
+<span class="c1"># Shows the result.</span>
 <span class="n">centers</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">clusterCenters</span><span class="p">()</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Cluster Centers: &quot;</span><span class="p">)</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Cluster Centers: &quot;</span><span class="p">)</span>
 <span class="k">for</span> <span class="n">center</span> <span class="ow">in</span> <span class="n">centers</span><span class="p">:</span>
     <span class="k">print</span><span class="p">(</span><span class="n">center</span><span class="p">)</span>
 </pre></div>
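Beyond printing the centers, per-row cluster assignments come from transforming the input; a hedged follow-up to the Python example above (the default output column name "prediction" is assumed):

    # Assign each input row to its nearest cluster center.
    predictions = model.transform(dataset)
    predictions.select("prediction").show(5)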
@@ -470,7 +470,7 @@ called <a href="http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf">kmea
 
     <p>Refer to the <a href="api/R/spark.kmeans.html">R API docs</a> for more details.</p>
 
-    <div class="highlight"><pre><span class="c1"># Fit a k-means model with spark.kmeans</span>
+    <div class="highlight"><pre><span></span><span class="c1"># Fit a k-means model with spark.kmeans</span>
 irisDF <span class="o">&lt;-</span> <span class="kp">suppressWarnings</span><span class="p">(</span>createDataFrame<span class="p">(</span>iris<span class="p">))</span>
 kmeansDF <span class="o">&lt;-</span> irisDF
 kmeansTestDF <span class="o">&lt;-</span> irisDF
@@ -504,7 +504,7 @@ and generates a <code>LDAModel</code> as the base model. Expert users may cast a
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.ml.clustering.LDA">Scala API docs</a> for more details.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.ml.clustering.LDA</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.ml.clustering.LDA</span>
 
 <span class="c1">// Loads data.</span>
 <span class="k">val</span> <span class="n">dataset</span> <span class="k">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">format</span><span class="o">(</span><span class="s">&quot;libsvm&quot;</span><span class="o">)</span>
@@ -516,8 +516,8 @@ and generates a <code>LDAModel</code> as the base model. Expert users may cast a
 
 <span class="k">val</span> <span class="n">ll</span> <span class="k">=</span> <span class="n">model</span><span class="o">.</span><span class="n">logLikelihood</span><span class="o">(</span><span class="n">dataset</span><span class="o">)</span>
 <span class="k">val</span> <span class="n">lp</span> <span class="k">=</span> <span class="n">model</span><span class="o">.</span><span class="n">logPerplexity</span><span class="o">(</span><span class="n">dataset</span><span class="o">)</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;The lower bound on the log likelihood of the entire corpus: $ll&quot;</span><span class="o">)</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;The upper bound bound on perplexity: $lp&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;The lower bound on the log likelihood of the entire corpus: </span><span class="si">$ll</span><span class="s">&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;The upper bound bound on perplexity: </span><span class="si">$lp</span><span class="s">&quot;</span><span class="o">)</span>
 
 <span class="c1">// Describe topics.</span>
 <span class="k">val</span> <span class="n">topics</span> <span class="k">=</span> <span class="n">model</span><span class="o">.</span><span class="n">describeTopics</span><span class="o">(</span><span class="mi">3</span><span class="o">)</span>
@@ -535,7 +535,7 @@ and generates a <code>LDAModel</code> as the base model. Expert users may cast a
 
     <p>Refer to the <a href="api/java/org/apache/spark/ml/clustering/LDA.html">Java API docs</a> for more details.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">org.apache.spark.ml.clustering.LDA</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.ml.clustering.LDA</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.ml.clustering.LDAModel</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.sql.Dataset</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.sql.Row</span><span class="o">;</span>
@@ -546,7 +546,7 @@ and generates a <code>LDAModel</code> as the base model. Expert users may cast a
   <span class="o">.</span><span class="na">load</span><span class="o">(</span><span class="s">&quot;data/mllib/sample_lda_libsvm_data.txt&quot;</span><span class="o">);</span>
 
 <span class="c1">// Trains a LDA model.</span>
-<span class="n">LDA</span> <span class="n">lda</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">LDA</span><span class="o">().</span><span class="na">setK</span><span class="o">(</span><span class="mi">10</span><span class="o">).</span><span class="na">setMaxIter</span><span class="o">(</span><span class="mi">10</span><span class="o">);</span>
+<span class="n">LDA</span> <span class="n">lda</span> <span class="o">=</span> <span class="k">new</span> <span class="n">LDA</span><span class="o">().</span><span class="na">setK</span><span class="o">(</span><span class="mi">10</span><span class="o">).</span><span class="na">setMaxIter</span><span class="o">(</span><span class="mi">10</span><span class="o">);</span>
 <span class="n">LDAModel</span> <span class="n">model</span> <span class="o">=</span> <span class="n">lda</span><span class="o">.</span><span class="na">fit</span><span class="o">(</span><span class="n">dataset</span><span class="o">);</span>
 
 <span class="kt">double</span> <span class="n">ll</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="na">logLikelihood</span><span class="o">(</span><span class="n">dataset</span><span class="o">);</span>
@@ -570,26 +570,26 @@ and generates a <code>LDAModel</code> as the base model. Expert users may cast a
 
     <p>Refer to the <a href="api/python/pyspark.ml.html#pyspark.ml.clustering.LDA">Python API docs</a> for more details.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.ml.clustering</span> <span class="kn">import</span> <span class="n">LDA</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.ml.clustering</span> <span class="kn">import</span> <span class="n">LDA</span>
 
-<span class="c"># Loads data.</span>
-<span class="n">dataset</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s">&quot;libsvm&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s">&quot;data/mllib/sample_lda_libsvm_data.txt&quot;</span><span class="p">)</span>
+<span class="c1"># Loads data.</span>
+<span class="n">dataset</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s2">&quot;libsvm&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">&quot;data/mllib/sample_lda_libsvm_data.txt&quot;</span><span class="p">)</span>
 
-<span class="c"># Trains a LDA model.</span>
+<span class="c1"># Trains a LDA model.</span>
 <span class="n">lda</span> <span class="o">=</span> <span class="n">LDA</span><span class="p">(</span><span class="n">k</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">maxIter</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
 <span class="n">model</span> <span class="o">=</span> <span class="n">lda</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">dataset</span><span class="p">)</span>
 
 <span class="n">ll</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">logLikelihood</span><span class="p">(</span><span class="n">dataset</span><span class="p">)</span>
 <span class="n">lp</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">logPerplexity</span><span class="p">(</span><span class="n">dataset</span><span class="p">)</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;The lower bound on the log likelihood of the entire corpus: &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">ll</span><span class="p">))</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;The upper bound bound on perplexity: &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">lp</span><span class="p">))</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;The lower bound on the log likelihood of the entire corpus: &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">ll</span><span class="p">))</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;The upper bound bound on perplexity: &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">lp</span><span class="p">))</span>
 
-<span class="c"># Describe topics.</span>
+<span class="c1"># Describe topics.</span>
 <span class="n">topics</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">describeTopics</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;The topics described by their top-weighted terms:&quot;</span><span class="p">)</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;The topics described by their top-weighted terms:&quot;</span><span class="p">)</span>
 <span class="n">topics</span><span class="o">.</span><span class="n">show</span><span class="p">(</span><span class="n">truncate</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
 
-<span class="c"># Shows the result</span>
+<span class="c1"># Shows the result</span>
 <span class="n">transformed</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">dataset</span><span class="p">)</span>
 <span class="n">transformed</span><span class="o">.</span><span class="n">show</span><span class="p">(</span><span class="n">truncate</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
 </pre></div>
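
The logLikelihood and logPerplexity values printed in this listing are variational bounds rather than exact quantities, so a lower perplexity bound is better. A short hedged sketch (reusing the fitted model from the snippet) that unpacks the rows returned by describeTopics:

    # Each row holds a topic id plus the indices and weights of its top terms.
    for row in model.describeTopics(3).collect():
        print("topic %d: terms=%s weights=%s"
              % (row.topic, row.termIndices, row.termWeights))
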
@@ -600,7 +600,7 @@ and generates a <code>LDAModel</code> as the base model. Expert users may cast a
 
     <p>Refer to the <a href="api/R/spark.lda.html">R API docs</a> for more details.</p>
 
-    <div class="highlight"><pre><span class="c1"># Load training data</span>
+    <div class="highlight"><pre><span></span><span class="c1"># Load training data</span>
 df <span class="o">&lt;-</span> read.df<span class="p">(</span><span class="s">&quot;data/mllib/sample_lda_libsvm_data.txt&quot;</span><span class="p">,</span> <span class="kn">source</span> <span class="o">=</span> <span class="s">&quot;libsvm&quot;</span><span class="p">)</span>
 training <span class="o">&lt;-</span> df
 test <span class="o">&lt;-</span> df
@@ -641,7 +641,7 @@ moves down the hierarchy.</p>
 <div data-lang="scala">
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.ml.clustering.BisectingKMeans">Scala API docs</a> for more details.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.ml.clustering.BisectingKMeans</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.ml.clustering.BisectingKMeans</span>
 
 <span class="c1">// Loads data.</span>
 <span class="k">val</span> <span class="n">dataset</span> <span class="k">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">format</span><span class="o">(</span><span class="s">&quot;libsvm&quot;</span><span class="o">).</span><span class="n">load</span><span class="o">(</span><span class="s">&quot;data/mllib/sample_kmeans_data.txt&quot;</span><span class="o">)</span>
@@ -652,7 +652,7 @@ moves down the hierarchy.</p>
 
 <span class="c1">// Evaluate clustering.</span>
 <span class="k">val</span> <span class="n">cost</span> <span class="k">=</span> <span class="n">model</span><span class="o">.</span><span class="n">computeCost</span><span class="o">(</span><span class="n">dataset</span><span class="o">)</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Within Set Sum of Squared Errors = $cost&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;Within Set Sum of Squared Errors = </span><span class="si">$cost</span><span class="s">&quot;</span><span class="o">)</span>
 
 <span class="c1">// Shows the result.</span>
 <span class="n">println</span><span class="o">(</span><span class="s">&quot;Cluster Centers: &quot;</span><span class="o">)</span>
@@ -665,7 +665,7 @@ moves down the hierarchy.</p>
 <div data-lang="java">
     <p>Refer to the <a href="api/java/org/apache/spark/ml/clustering/BisectingKMeans.html">Java API docs</a> for more details.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">org.apache.spark.ml.clustering.BisectingKMeans</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.ml.clustering.BisectingKMeans</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.ml.clustering.BisectingKMeansModel</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.ml.linalg.Vector</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.sql.Dataset</span><span class="o">;</span>
@@ -675,7 +675,7 @@ moves down the hierarchy.</p>
 <span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">dataset</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="na">read</span><span class="o">().</span><span class="na">format</span><span class="o">(</span><span class="s">&quot;libsvm&quot;</span><span class="o">).</span><span class="na">load</span><span class="o">(</span><span class="s">&quot;data/mllib/sample_kmeans_data.txt&quot;</span><span class="o">);</span>
 
 <span class="c1">// Trains a bisecting k-means model.</span>
-<span class="n">BisectingKMeans</span> <span class="n">bkm</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">BisectingKMeans</span><span class="o">().</span><span class="na">setK</span><span class="o">(</span><span class="mi">2</span><span class="o">).</span><span class="na">setSeed</span><span class="o">(</span><span class="mi">1</span><span class="o">);</span>
+<span class="n">BisectingKMeans</span> <span class="n">bkm</span> <span class="o">=</span> <span class="k">new</span> <span class="n">BisectingKMeans</span><span class="o">().</span><span class="na">setK</span><span class="o">(</span><span class="mi">2</span><span class="o">).</span><span class="na">setSeed</span><span class="o">(</span><span class="mi">1</span><span class="o">);</span>
 <span class="n">BisectingKMeansModel</span> <span class="n">model</span> <span class="o">=</span> <span class="n">bkm</span><span class="o">.</span><span class="na">fit</span><span class="o">(</span><span class="n">dataset</span><span class="o">);</span>
 
 <span class="c1">// Evaluate clustering.</span>
@@ -695,21 +695,21 @@ moves down the hierarchy.</p>
 <div data-lang="python">
     <p>Refer to the <a href="api/python/pyspark.ml.html#pyspark.ml.clustering.BisectingKMeans">Python API docs</a> for more details.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.ml.clustering</span> <span class="kn">import</span> <span class="n">BisectingKMeans</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.ml.clustering</span> <span class="kn">import</span> <span class="n">BisectingKMeans</span>
 
-<span class="c"># Loads data.</span>
-<span class="n">dataset</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s">&quot;libsvm&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s">&quot;data/mllib/sample_kmeans_data.txt&quot;</span><span class="p">)</span>
+<span class="c1"># Loads data.</span>
+<span class="n">dataset</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s2">&quot;libsvm&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">&quot;data/mllib/sample_kmeans_data.txt&quot;</span><span class="p">)</span>
 
-<span class="c"># Trains a bisecting k-means model.</span>
+<span class="c1"># Trains a bisecting k-means model.</span>
 <span class="n">bkm</span> <span class="o">=</span> <span class="n">BisectingKMeans</span><span class="p">()</span><span class="o">.</span><span class="n">setK</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span><span class="o">.</span><span class="n">setSeed</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
 <span class="n">model</span> <span class="o">=</span> <span class="n">bkm</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">dataset</span><span class="p">)</span>
 
-<span class="c"># Evaluate clustering.</span>
+<span class="c1"># Evaluate clustering.</span>
 <span class="n">cost</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">computeCost</span><span class="p">(</span><span class="n">dataset</span><span class="p">)</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Within Set Sum of Squared Errors = &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">cost</span><span class="p">))</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Within Set Sum of Squared Errors = &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">cost</span><span class="p">))</span>
 
-<span class="c"># Shows the result.</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Cluster Centers: &quot;</span><span class="p">)</span>
+<span class="c1"># Shows the result.</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Cluster Centers: &quot;</span><span class="p">)</span>
 <span class="n">centers</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">clusterCenters</span><span class="p">()</span>
 <span class="k">for</span> <span class="n">center</span> <span class="ow">in</span> <span class="n">centers</span><span class="p">:</span>
     <span class="k">print</span><span class="p">(</span><span class="n">center</span><span class="p">)</span>
@@ -784,7 +784,7 @@ model.</p>
 <div data-lang="scala">
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.ml.clustering.GaussianMixture">Scala API docs</a> for more details.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.ml.clustering.GaussianMixture</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.ml.clustering.GaussianMixture</span>
 
 <span class="c1">// Loads data</span>
 <span class="k">val</span> <span class="n">dataset</span> <span class="k">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">format</span><span class="o">(</span><span class="s">&quot;libsvm&quot;</span><span class="o">).</span><span class="n">load</span><span class="o">(</span><span class="s">&quot;data/mllib/sample_kmeans_data.txt&quot;</span><span class="o">)</span>
@@ -796,8 +796,8 @@ model.</p>
 
 <span class="c1">// output parameters of mixture model model</span>
 <span class="k">for</span> <span class="o">(</span><span class="n">i</span> <span class="k">&lt;-</span> <span class="mi">0</span> <span class="n">until</span> <span class="n">model</span><span class="o">.</span><span class="n">getK</span><span class="o">)</span> <span class="o">{</span>
-  <span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Gaussian $i:\nweight=${model.weights(i)}\n&quot;</span> <span class="o">+</span>
-      <span class="n">s</span><span class="s">&quot;mu=${model.gaussians(i).mean}\nsigma=\n${model.gaussians(i).cov}\n&quot;</span><span class="o">)</span>
+  <span class="n">println</span><span class="o">(</span><span class="s">s&quot;Gaussian </span><span class="si">$i</span><span class="s">:\nweight=</span><span class="si">${</span><span class="n">model</span><span class="o">.</span><span class="n">weights</span><span class="o">(</span><span class="n">i</span><span class="o">)</span><span class="si">}</span><span class="s">\n&quot;</span> <span class="o">+</span>
+      <span class="s">s&quot;mu=</span><span class="si">${</span><span class="n">model</span><span class="o">.</span><span class="n">gaussians</span><span class="o">(</span><span class="n">i</span><span class="o">).</span><span class="n">mean</span><span class="si">}</span><span class="s">\nsigma=\n</span><span class="si">${</span><span class="n">model</span><span class="o">.</span><span class="n">gaussians</span><span class="o">(</span><span class="n">i</span><span class="o">).</span><span class="n">cov</span><span class="si">}</span><span class="s">\n&quot;</span><span class="o">)</span>
 <span class="o">}</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/scala/org/apache/spark/examples/ml/GaussianMixtureExample.scala" in the Spark repo.</small></div>
@@ -806,7 +806,7 @@ model.</p>
 <div data-lang="java">
     <p>Refer to the <a href="api/java/org/apache/spark/ml/clustering/GaussianMixture.html">Java API docs</a> for more details.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">org.apache.spark.ml.clustering.GaussianMixture</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.ml.clustering.GaussianMixture</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.ml.clustering.GaussianMixtureModel</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.sql.Dataset</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.sql.Row</span><span class="o">;</span>
@@ -815,7 +815,7 @@ model.</p>
 <span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">dataset</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="na">read</span><span class="o">().</span><span class="na">format</span><span class="o">(</span><span class="s">&quot;libsvm&quot;</span><span class="o">).</span><span class="na">load</span><span class="o">(</span><span class="s">&quot;data/mllib/sample_kmeans_data.txt&quot;</span><span class="o">);</span>
 
 <span class="c1">// Trains a GaussianMixture model</span>
-<span class="n">GaussianMixture</span> <span class="n">gmm</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">GaussianMixture</span><span class="o">()</span>
+<span class="n">GaussianMixture</span> <span class="n">gmm</span> <span class="o">=</span> <span class="k">new</span> <span class="n">GaussianMixture</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setK</span><span class="o">(</span><span class="mi">2</span><span class="o">);</span>
 <span class="n">GaussianMixtureModel</span> <span class="n">model</span> <span class="o">=</span> <span class="n">gmm</span><span class="o">.</span><span class="na">fit</span><span class="o">(</span><span class="n">dataset</span><span class="o">);</span>
 
@@ -831,15 +831,15 @@ model.</p>
 <div data-lang="python">
     <p>Refer to the <a href="api/python/pyspark.ml.html#pyspark.ml.clustering.GaussianMixture">Python API docs</a> for more details.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.ml.clustering</span> <span class="kn">import</span> <span class="n">GaussianMixture</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.ml.clustering</span> <span class="kn">import</span> <span class="n">GaussianMixture</span>
 
-<span class="c"># loads data</span>
-<span class="n">dataset</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s">&quot;libsvm&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s">&quot;data/mllib/sample_kmeans_data.txt&quot;</span><span class="p">)</span>
+<span class="c1"># loads data</span>
+<span class="n">dataset</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s2">&quot;libsvm&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">&quot;data/mllib/sample_kmeans_data.txt&quot;</span><span class="p">)</span>
 
 <span class="n">gmm</span> <span class="o">=</span> <span class="n">GaussianMixture</span><span class="p">()</span><span class="o">.</span><span class="n">setK</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span><span class="o">.</span><span class="n">setSeed</span><span class="p">(</span><span class="mi">538009335</span><span class="p">)</span>
 <span class="n">model</span> <span class="o">=</span> <span class="n">gmm</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">dataset</span><span class="p">)</span>
 
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Gaussians shown as a DataFrame: &quot;</span><span class="p">)</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Gaussians shown as a DataFrame: &quot;</span><span class="p">)</span>
 <span class="n">model</span><span class="o">.</span><span class="n">gaussiansDF</span><span class="o">.</span><span class="n">show</span><span class="p">(</span><span class="n">truncate</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/ml/gaussian_mixture_example.py" in the Spark repo.</small></div>
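
Besides gaussiansDF, the fitted mixture exposes its mixing weights directly; a hedged sketch reusing the model from the listing above:

    # 'weights' is a plain Python list with one mixing weight per Gaussian.
    for i, w in enumerate(model.weights):
        print("Gaussian %d: weight=%g" % (i, w))

    # gaussiansDF holds one row per component with 'mean' and 'cov' columns.
    model.gaussiansDF.select("mean").show(truncate=False)
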
@@ -849,7 +849,7 @@ model.</p>
 
     <p>Refer to the <a href="api/R/spark.gaussianMixture.html">R API docs</a> for more details.</p>
 
-    <div class="highlight"><pre><span class="c1"># Load training data</span>
+    <div class="highlight"><pre><span></span><span class="c1"># Load training data</span>
 df <span class="o">&lt;-</span> read.df<span class="p">(</span><span class="s">&quot;data/mllib/sample_kmeans_data.txt&quot;</span><span class="p">,</span> <span class="kn">source</span> <span class="o">=</span> <span class="s">&quot;libsvm&quot;</span><span class="p">)</span>
 training <span class="o">&lt;-</span> df
 test <span class="o">&lt;-</span> df

http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/ml-collaborative-filtering.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/ml-collaborative-filtering.html b/site/docs/2.1.0/ml-collaborative-filtering.html
index 1f63418..91e5bed 100644
--- a/site/docs/2.1.0/ml-collaborative-filtering.html
+++ b/site/docs/2.1.0/ml-collaborative-filtering.html
@@ -307,12 +307,12 @@
                     
 
                     <ul id="markdown-toc">
-  <li><a href="#collaborative-filtering" id="markdown-toc-collaborative-filtering">Collaborative filtering</a>    <ul>
-      <li><a href="#explicit-vs-implicit-feedback" id="markdown-toc-explicit-vs-implicit-feedback">Explicit vs. implicit feedback</a></li>
-      <li><a href="#scaling-of-the-regularization-parameter" id="markdown-toc-scaling-of-the-regularization-parameter">Scaling of the regularization parameter</a></li>
+  <li><a href="#collaborative-filtering">Collaborative filtering</a>    <ul>
+      <li><a href="#explicit-vs-implicit-feedback">Explicit vs. implicit feedback</a></li>
+      <li><a href="#scaling-of-the-regularization-parameter">Scaling of the regularization parameter</a></li>
     </ul>
   </li>
-  <li><a href="#examples" id="markdown-toc-examples">Examples</a></li>
+  <li><a href="#examples">Examples</a></li>
 </ul>
 
 <h2 id="collaborative-filtering">Collaborative filtering</h2>
@@ -341,7 +341,7 @@ following parameters:</p>
 
 <p><strong>Note:</strong> The DataFrame-based API for ALS currently only supports integers for user and item ids.
 Other numeric types are supported for the user and item id columns, 
-but the ids must be within the integer value range.</p>
+but the ids must be within the integer value range. </p>
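
When ids arrive as longs or strings, the integer-range restriction above means they must be cast down before fitting; a hedged workaround sketch (the DataFrame name ratings and its column names are illustrative, not from the page):

    from pyspark.sql.functions import col

    # Cast id columns to 32-bit ints; this is only safe when the ids are
    # already known to fit in the int range, since the cast would otherwise wrap.
    ratings = (ratings
               .withColumn("userId", col("userId").cast("int"))
               .withColumn("movieId", col("movieId").cast("int")))
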
 
 <h3 id="explicit-vs-implicit-feedback">Explicit vs. implicit feedback</h3>
 
@@ -385,7 +385,7 @@ rating prediction.</p>
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.ml.recommendation.ALS"><code>ALS</code> Scala docs</a>
 for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.ml.evaluation.RegressionEvaluator</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.ml.evaluation.RegressionEvaluator</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.ml.recommendation.ALS</span>
 
 <span class="k">case</span> <span class="k">class</span> <span class="nc">Rating</span><span class="o">(</span><span class="n">userId</span><span class="k">:</span> <span class="kt">Int</span><span class="o">,</span> <span class="n">movieId</span><span class="k">:</span> <span class="kt">Int</span><span class="o">,</span> <span class="n">rating</span><span class="k">:</span> <span class="kt">Float</span><span class="o">,</span> <span class="n">timestamp</span><span class="k">:</span> <span class="kt">Long</span><span class="o">)</span>
@@ -417,7 +417,7 @@ for more details on the API.</p>
   <span class="o">.</span><span class="n">setLabelCol</span><span class="o">(</span><span class="s">&quot;rating&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="n">setPredictionCol</span><span class="o">(</span><span class="s">&quot;prediction&quot;</span><span class="o">)</span>
 <span class="k">val</span> <span class="n">rmse</span> <span class="k">=</span> <span class="n">evaluator</span><span class="o">.</span><span class="n">evaluate</span><span class="o">(</span><span class="n">predictions</span><span class="o">)</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Root-mean-square error = $rmse&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;Root-mean-square error = </span><span class="si">$rmse</span><span class="s">&quot;</span><span class="o">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/scala/org/apache/spark/examples/ml/ALSExample.scala" in the Spark repo.</small></div>
 
@@ -425,13 +425,13 @@ for more details on the API.</p>
 inferred from other signals), you can set <code>implicitPrefs</code> to <code>true</code> to get
 better results:</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">als</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">ALS</span><span class="o">()</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">val</span> <span class="n">als</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">ALS</span><span class="o">()</span>
   <span class="o">.</span><span class="n">setMaxIter</span><span class="o">(</span><span class="mi">5</span><span class="o">)</span>
   <span class="o">.</span><span class="n">setRegParam</span><span class="o">(</span><span class="mf">0.01</span><span class="o">)</span>
   <span class="o">.</span><span class="n">setImplicitPrefs</span><span class="o">(</span><span class="kc">true</span><span class="o">)</span>
   <span class="o">.</span><span class="n">setUserCol</span><span class="o">(</span><span class="s">&quot;userId&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="n">setItemCol</span><span class="o">(</span><span class="s">&quot;movieId&quot;</span><span class="o">)</span>
-  <span class="o">.</span><span class="n">setRatingCol</span><span class="o">(</span><span class="s">&quot;rating&quot;</span><span class="o">)</span></code></pre></div>
+  <span class="o">.</span><span class="n">setRatingCol</span><span class="o">(</span><span class="s">&quot;rating&quot;</span><span class="o">)</span></code></pre></figure>
 
   </div>
 
@@ -448,7 +448,7 @@ rating prediction.</p>
     <p>Refer to the <a href="api/java/org/apache/spark/ml/recommendation/ALS.html"><code>ALS</code> Java docs</a>
 for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.io.Serializable</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.io.Serializable</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.function.Function</span><span class="o">;</span>
@@ -490,13 +490,13 @@ for more details on the API.</p>
   <span class="kd">public</span> <span class="kd">static</span> <span class="n">Rating</span> <span class="nf">parseRating</span><span class="o">(</span><span class="n">String</span> <span class="n">str</span><span class="o">)</span> <span class="o">{</span>
     <span class="n">String</span><span class="o">[]</span> <span class="n">fields</span> <span class="o">=</span> <span class="n">str</span><span class="o">.</span><span class="na">split</span><span class="o">(</span><span class="s">&quot;::&quot;</span><span class="o">);</span>
     <span class="k">if</span> <span class="o">(</span><span class="n">fields</span><span class="o">.</span><span class="na">length</span> <span class="o">!=</span> <span class="mi">4</span><span class="o">)</span> <span class="o">{</span>
-      <span class="k">throw</span> <span class="k">new</span> <span class="nf">IllegalArgumentException</span><span class="o">(</span><span class="s">&quot;Each line must contain 4 fields&quot;</span><span class="o">);</span>
+      <span class="k">throw</span> <span class="k">new</span> <span class="n">IllegalArgumentException</span><span class="o">(</span><span class="s">&quot;Each line must contain 4 fields&quot;</span><span class="o">);</span>
     <span class="o">}</span>
     <span class="kt">int</span> <span class="n">userId</span> <span class="o">=</span> <span class="n">Integer</span><span class="o">.</span><span class="na">parseInt</span><span class="o">(</span><span class="n">fields</span><span class="o">[</span><span class="mi">0</span><span class="o">]);</span>
     <span class="kt">int</span> <span class="n">movieId</span> <span class="o">=</span> <span class="n">Integer</span><span class="o">.</span><span class="na">parseInt</span><span class="o">(</span><span class="n">fields</span><span class="o">[</span><span class="mi">1</span><span class="o">]);</span>
     <span class="kt">float</span> <span class="n">rating</span> <span class="o">=</span> <span class="n">Float</span><span class="o">.</span><span class="na">parseFloat</span><span class="o">(</span><span class="n">fields</span><span class="o">[</span><span class="mi">2</span><span class="o">]);</span>
     <span class="kt">long</span> <span class="n">timestamp</span> <span class="o">=</span> <span class="n">Long</span><span class="o">.</span><span class="na">parseLong</span><span class="o">(</span><span class="n">fields</span><span class="o">[</span><span class="mi">3</span><span class="o">]);</span>
-    <span class="k">return</span> <span class="k">new</span> <span class="nf">Rating</span><span class="o">(</span><span class="n">userId</span><span class="o">,</span> <span class="n">movieId</span><span class="o">,</span> <span class="n">rating</span><span class="o">,</span> <span class="n">timestamp</span><span class="o">);</span>
+    <span class="k">return</span> <span class="k">new</span> <span class="n">Rating</span><span class="o">(</span><span class="n">userId</span><span class="o">,</span> <span class="n">movieId</span><span class="o">,</span> <span class="n">rating</span><span class="o">,</span> <span class="n">timestamp</span><span class="o">);</span>
   <span class="o">}</span>
 <span class="o">}</span>
 
@@ -513,7 +513,7 @@ for more details on the API.</p>
 <span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">test</span> <span class="o">=</span> <span class="n">splits</span><span class="o">[</span><span class="mi">1</span><span class="o">];</span>
 
 <span class="c1">// Build the recommendation model using ALS on the training data</span>
-<span class="n">ALS</span> <span class="n">als</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">ALS</span><span class="o">()</span>
+<span class="n">ALS</span> <span class="n">als</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ALS</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setMaxIter</span><span class="o">(</span><span class="mi">5</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setRegParam</span><span class="o">(</span><span class="mf">0.01</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setUserCol</span><span class="o">(</span><span class="s">&quot;userId&quot;</span><span class="o">)</span>
@@ -524,7 +524,7 @@ for more details on the API.</p>
 <span class="c1">// Evaluate the model by computing the RMSE on the test data</span>
 <span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">predictions</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="na">transform</span><span class="o">(</span><span class="n">test</span><span class="o">);</span>
 
-<span class="n">RegressionEvaluator</span> <span class="n">evaluator</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">RegressionEvaluator</span><span class="o">()</span>
+<span class="n">RegressionEvaluator</span> <span class="n">evaluator</span> <span class="o">=</span> <span class="k">new</span> <span class="n">RegressionEvaluator</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setMetricName</span><span class="o">(</span><span class="s">&quot;rmse&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setLabelCol</span><span class="o">(</span><span class="s">&quot;rating&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setPredictionCol</span><span class="o">(</span><span class="s">&quot;prediction&quot;</span><span class="o">);</span>
@@ -537,13 +537,13 @@ for more details on the API.</p>
 inferred from other signals), you can set <code>implicitPrefs</code> to <code>true</code> to get
 better results:</p>
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">ALS</span> <span class="n">als</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">ALS</span><span class="o">()</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="n">ALS</span> <span class="n">als</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ALS</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setMaxIter</span><span class="o">(</span><span class="mi">5</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setRegParam</span><span class="o">(</span><span class="mf">0.01</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setImplicitPrefs</span><span class="o">(</span><span class="kc">true</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setUserCol</span><span class="o">(</span><span class="s">&quot;userId&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setItemCol</span><span class="o">(</span><span class="s">&quot;movieId&quot;</span><span class="o">)</span>
-  <span class="o">.</span><span class="na">setRatingCol</span><span class="o">(</span><span class="s">&quot;rating&quot;</span><span class="o">);</span></code></pre></div>
+  <span class="o">.</span><span class="na">setRatingCol</span><span class="o">(</span><span class="s">&quot;rating&quot;</span><span class="o">);</span></code></pre></figure>
 
   </div>
 
@@ -560,27 +560,27 @@ rating prediction.</p>
     <p>Refer to the <a href="api/python/pyspark.ml.html#pyspark.ml.recommendation.ALS"><code>ALS</code> Python docs</a>
 for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.ml.evaluation</span> <span class="kn">import</span> <span class="n">RegressionEvaluator</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.ml.evaluation</span> <span class="kn">import</span> <span class="n">RegressionEvaluator</span>
 <span class="kn">from</span> <span class="nn">pyspark.ml.recommendation</span> <span class="kn">import</span> <span class="n">ALS</span>
 <span class="kn">from</span> <span class="nn">pyspark.sql</span> <span class="kn">import</span> <span class="n">Row</span>
 
-<span class="n">lines</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">text</span><span class="p">(</span><span class="s">&quot;data/mllib/als/sample_movielens_ratings.txt&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">rdd</span>
-<span class="n">parts</span> <span class="o">=</span> <span class="n">lines</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">row</span><span class="p">:</span> <span class="n">row</span><span class="o">.</span><span class="n">value</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&quot;::&quot;</span><span class="p">))</span>
+<span class="n">lines</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">text</span><span class="p">(</span><span class="s2">&quot;data/mllib/als/sample_movielens_ratings.txt&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">rdd</span>
+<span class="n">parts</span> <span class="o">=</span> <span class="n">lines</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">row</span><span class="p">:</span> <span class="n">row</span><span class="o">.</span><span class="n">value</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">&quot;::&quot;</span><span class="p">))</span>
 <span class="n">ratingsRDD</span> <span class="o">=</span> <span class="n">parts</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">p</span><span class="p">:</span> <span class="n">Row</span><span class="p">(</span><span class="n">userId</span><span class="o">=</span><span class="nb">int</span><span class="p">(</span><span class="n">p</span><span class="p">[</span><span class="mi">0</span><span class="p">]),</span> <span class="n">movieId</span><span class="o">=</span><span class="nb">int</span><span class="p">(</span><span class="n">p</span><span class="p">[</span><span class="mi">1</span><span class="p">]),</span>
                                      <span class="n">rating</span><span class="o">=</span><span class="nb">float</span><span class="p">(</span><span class="n">p</span><span class="p">[</span><span class="mi">2</span><span class="p">]),</span> <span class="n">timestamp</span><span class="o">=</span><span class="nb">long</span><span class="p">(</span><span class="n">p</span><span class="p">[</span><span class="mi">3</span><span class="p">])))</span>
 <span class="n">ratings</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">createDataFrame</span><span class="p">(</span><span class="n">ratingsRDD</span><span class="p">)</span>
 <span class="p">(</span><span class="n">training</span><span class="p">,</span> <span class="n">test</span><span class="p">)</span> <span class="o">=</span> <span class="n">ratings</span><span class="o">.</span><span class="n">randomSplit</span><span class="p">([</span><span class="mf">0.8</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">])</span>
 
-<span class="c"># Build the recommendation model using ALS on the training data</span>
-<span class="n">als</span> <span class="o">=</span> <span class="n">ALS</span><span class="p">(</span><span class="n">maxIter</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">regParam</span><span class="o">=</span><span class="mf">0.01</span><span class="p">,</span> <span class="n">userCol</span><span class="o">=</span><span class="s">&quot;userId&quot;</span><span class="p">,</span> <span class="n">itemCol</span><span class="o">=</span><span class="s">&quot;movieId&quot;</span><span class="p">,</span> <span class="n">ratingCol</span><span class="o">=</span><span class="s">&quot;rating&quot;</span><span class="p">)</span>
+<span class="c1"># Build the recommendation model using ALS on the training data</span>
+<span class="n">als</span> <span class="o">=</span> <span class="n">ALS</span><span class="p">(</span><span class="n">maxIter</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">regParam</span><span class="o">=</span><span class="mf">0.01</span><span class="p">,</span> <span class="n">userCol</span><span class="o">=</span><span class="s2">&quot;userId&quot;</span><span class="p">,</span> <span class="n">itemCol</span><span class="o">=</span><span class="s2">&quot;movieId&quot;</span><span class="p">,</span> <span class="n">ratingCol</span><span class="o">=</span><span class="s2">&quot;rating&quot;</span><span class="p">)</span>
 <span class="n">model</span> <span class="o">=</span> <span class="n">als</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">training</span><span class="p">)</span>
 
-<span class="c"># Evaluate the model by computing the RMSE on the test data</span>
+<span class="c1"># Evaluate the model by computing the RMSE on the test data</span>
 <span class="n">predictions</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">test</span><span class="p">)</span>
-<span class="n">evaluator</span> <span class="o">=</span> <span class="n">RegressionEvaluator</span><span class="p">(</span><span class="n">metricName</span><span class="o">=</span><span class="s">&quot;rmse&quot;</span><span class="p">,</span> <span class="n">labelCol</span><span class="o">=</span><span class="s">&quot;rating&quot;</span><span class="p">,</span>
-                                <span class="n">predictionCol</span><span class="o">=</span><span class="s">&quot;prediction&quot;</span><span class="p">)</span>
+<span class="n">evaluator</span> <span class="o">=</span> <span class="n">RegressionEvaluator</span><span class="p">(</span><span class="n">metricName</span><span class="o">=</span><span class="s2">&quot;rmse&quot;</span><span class="p">,</span> <span class="n">labelCol</span><span class="o">=</span><span class="s2">&quot;rating&quot;</span><span class="p">,</span>
+                                <span class="n">predictionCol</span><span class="o">=</span><span class="s2">&quot;prediction&quot;</span><span class="p">)</span>
 <span class="n">rmse</span> <span class="o">=</span> <span class="n">evaluator</span><span class="o">.</span><span class="n">evaluate</span><span class="p">(</span><span class="n">predictions</span><span class="p">)</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Root-mean-square error = &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">rmse</span><span class="p">))</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Root-mean-square error = &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">rmse</span><span class="p">))</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/ml/als_example.py" in the Spark repo.</small></div>
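
One caveat with this example: the random split can leave a user or movie only in the test set, in which case transform emits NaN predictions in this Spark version and the reported RMSE becomes NaN. A common hedged workaround (reusing predictions and evaluator from the listing) is to drop those rows before evaluating:

    from pyspark.sql.functions import col, isnan

    # Filter out rows where ALS could not score the (user, item) pair.
    clean = predictions.where(~isnan(col("prediction")))
    rmse = evaluator.evaluate(clean)
    print("Root-mean-square error = " + str(rmse))
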
 
@@ -588,8 +588,8 @@ for more details on the API.</p>
 inferred from other signals), you can set <code>implicitPrefs</code> to <code>True</code> to get
 better results:</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">als</span> <span class="o">=</span> <span class="n">ALS</span><span class="p">(</span><span class="n">maxIter</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">regParam</span><span class="o">=</span><span class="mf">0.01</span><span class="p">,</span> <span class="n">implicitPrefs</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
-          <span class="n">userCol</span><span class="o">=</span><span class="s">&quot;userId&quot;</span><span class="p">,</span> <span class="n">itemCol</span><span class="o">=</span><span class="s">&quot;movieId&quot;</span><span class="p">,</span> <span class="n">ratingCol</span><span class="o">=</span><span class="s">&quot;rating&quot;</span><span class="p">)</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="n">als</span> <span class="o">=</span> <span class="n">ALS</span><span class="p">(</span><span class="n">maxIter</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">regParam</span><span class="o">=</span><span class="mf">0.01</span><span class="p">,</span> <span class="n">implicitPrefs</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
+          <span class="n">userCol</span><span class="o">=</span><span class="s2">&quot;userId&quot;</span><span class="p">,</span> <span class="n">itemCol</span><span class="o">=</span><span class="s2">&quot;movieId&quot;</span><span class="p">,</span> <span class="n">ratingCol</span><span class="o">=</span><span class="s2">&quot;rating&quot;</span><span class="p">)</span></code></pre></figure>
 
   </div>
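
Setting implicitPrefs switches ALS to the confidence-weighted implicit-feedback formulation, where the alpha parameter scales how strongly observed interactions are trusted; a sketch (alpha=1.0 here is the library default, shown only to make the knob visible):

    als = ALS(maxIter=5, regParam=0.01, implicitPrefs=True, alpha=1.0,
              userCol="userId", itemCol="movieId", ratingCol="rating")
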
 
@@ -597,7 +597,7 @@ better results:</p>
 
     <p>Refer to the <a href="api/R/spark.als.html">R API docs</a> for more details.</p>
 
-    <div class="highlight"><pre><span class="c1"># Load training data</span>
+    <div class="highlight"><pre><span></span><span class="c1"># Load training data</span>
 data <span class="o">&lt;-</span> <span class="kt">list</span><span class="p">(</span><span class="kt">list</span><span class="p">(</span><span class="m">0</span><span class="p">,</span> <span class="m">0</span><span class="p">,</span> <span class="m">4.0</span><span class="p">),</span> <span class="kt">list</span><span class="p">(</span><span class="m">0</span><span class="p">,</span> <span class="m">1</span><span class="p">,</span> <span class="m">2.0</span><span class="p">),</span> <span class="kt">list</span><span class="p">(</span><span class="m">1</span><span class="p">,</span> <span class="m">1</span><span class="p">,</span> <span class="m">3.0</span><span class="p">),</span>
              <span class="kt">list</span><span class="p">(</span><span class="m">1</span><span class="p">,</span> <span class="m">2</span><span class="p">,</span> <span class="m">4.0</span><span class="p">),</span> <span class="kt">list</span><span class="p">(</span><span class="m">2</span><span class="p">,</span> <span class="m">1</span><span class="p">,</span> <span class="m">1.0</span><span class="p">),</span> <span class="kt">list</span><span class="p">(</span><span class="m">2</span><span class="p">,</span> <span class="m">2</span><span class="p">,</span> <span class="m">5.0</span><span class="p">))</span>
 df <span class="o">&lt;-</span> createDataFrame<span class="p">(</span>data<span class="p">,</span> <span class="kt">c</span><span class="p">(</span><span class="s">&quot;userId&quot;</span><span class="p">,</span> <span class="s">&quot;movieId&quot;</span><span class="p">,</span> <span class="s">&quot;rating&quot;</span><span class="p">))</span>


http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/ml-tuning.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/ml-tuning.html b/site/docs/2.1.0/ml-tuning.html
index 0c36a98..2246cc2 100644
--- a/site/docs/2.1.0/ml-tuning.html
+++ b/site/docs/2.1.0/ml-tuning.html
@@ -329,13 +329,13 @@ Built-in Cross-Validation and other tooling allow users to optimize hyperparamet
 <p><strong>Table of contents</strong></p>
 
 <ul id="markdown-toc">
-  <li><a href="#model-selection-aka-hyperparameter-tuning" id="markdown-toc-model-selection-aka-hyperparameter-tuning">Model selection (a.k.a. hyperparameter tuning)</a></li>
-  <li><a href="#cross-validation" id="markdown-toc-cross-validation">Cross-Validation</a>    <ul>
-      <li><a href="#example-model-selection-via-cross-validation" id="markdown-toc-example-model-selection-via-cross-validation">Example: model selection via cross-validation</a></li>
+  <li><a href="#model-selection-aka-hyperparameter-tuning">Model selection (a.k.a. hyperparameter tuning)</a></li>
+  <li><a href="#cross-validation">Cross-Validation</a>    <ul>
+      <li><a href="#example-model-selection-via-cross-validation">Example: model selection via cross-validation</a></li>
     </ul>
   </li>
-  <li><a href="#train-validation-split" id="markdown-toc-train-validation-split">Train-Validation Split</a>    <ul>
-      <li><a href="#example-model-selection-via-train-validation-split" id="markdown-toc-example-model-selection-via-train-validation-split">Example: model selection via train validation split</a></li>
+  <li><a href="#train-validation-split">Train-Validation Split</a>    <ul>
+      <li><a href="#example-model-selection-via-train-validation-split">Example: model selection via train validation split</a></li>
     </ul>
   </li>
 </ul>
@@ -396,7 +396,7 @@ However, it is also a well-established method for choosing parameters which is m
 
 Refer to the [`CrossValidator` Scala docs](api/scala/index.html#org.apache.spark.ml.tuning.CrossValidator) for details on the API.
 
-<div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.ml.Pipeline</span>
+<div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.ml.Pipeline</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.ml.classification.LogisticRegression</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.ml.evaluation.BinaryClassificationEvaluator</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.ml.feature.</span><span class="o">{</span><span class="nc">HashingTF</span><span class="o">,</span> <span class="nc">Tokenizer</span><span class="o">}</span>
@@ -467,7 +467,7 @@ Refer to the [`CrossValidator` Scala docs](api/scala/index.html#org.apache.spark
   <span class="o">.</span><span class="n">select</span><span class="o">(</span><span class="s">&quot;id&quot;</span><span class="o">,</span> <span class="s">&quot;text&quot;</span><span class="o">,</span> <span class="s">&quot;probability&quot;</span><span class="o">,</span> <span class="s">&quot;prediction&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="n">collect</span><span class="o">()</span>
   <span class="o">.</span><span class="n">foreach</span> <span class="o">{</span> <span class="k">case</span> <span class="nc">Row</span><span class="o">(</span><span class="n">id</span><span class="k">:</span> <span class="kt">Long</span><span class="o">,</span> <span class="n">text</span><span class="k">:</span> <span class="kt">String</span><span class="o">,</span> <span class="n">prob</span><span class="k">:</span> <span class="kt">Vector</span><span class="o">,</span> <span class="n">prediction</span><span class="k">:</span> <span class="kt">Double</span><span class="o">)</span> <span class="k">=&gt;</span>
-    <span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;($id, $text) --&gt; prob=$prob, prediction=$prediction&quot;</span><span class="o">)</span>
+    <span class="n">println</span><span class="o">(</span><span class="s">s&quot;(</span><span class="si">$id</span><span class="s">, </span><span class="si">$text</span><span class="s">) --&gt; prob=</span><span class="si">$prob</span><span class="s">, prediction=</span><span class="si">$prediction</span><span class="s">&quot;</span><span class="o">)</span>
   <span class="o">}</span>
 </pre></div><div><small>Find full example code at "examples/src/main/scala/org/apache/spark/examples/ml/ModelSelectionViaCrossValidationExample.scala" in the Spark repo.</small></div>
 </div>
@@ -476,7 +476,7 @@ Refer to the [`CrossValidator` Scala docs](api/scala/index.html#org.apache.spark
 
 Refer to the [`CrossValidator` Java docs](api/java/org/apache/spark/ml/tuning/CrossValidator.html) for details on the API.
 
-<div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
+<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">org.apache.spark.ml.Pipeline</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.ml.PipelineStage</span><span class="o">;</span>
@@ -493,38 +493,38 @@ Refer to the [`CrossValidator` Java docs](api/java/org/apache/spark/ml/tuning/Cr
 
 <span class="c1">// Prepare training documents, which are labeled.</span>
 <span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">training</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="na">createDataFrame</span><span class="o">(</span><span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span>
-  <span class="k">new</span> <span class="nf">JavaLabeledDocument</span><span class="o">(</span><span class="mi">0L</span><span class="o">,</span> <span class="s">&quot;a b c d e spark&quot;</span><span class="o">,</span> <span class="mf">1.0</span><span class="o">),</span>
-  <span class="k">new</span> <span class="nf">JavaLabeledDocument</span><span class="o">(</span><span class="mi">1L</span><span class="o">,</span> <span class="s">&quot;b d&quot;</span><span class="o">,</span> <span class="mf">0.0</span><span class="o">),</span>
-  <span class="k">new</span> <span class="nf">JavaLabeledDocument</span><span class="o">(</span><span class="mi">2L</span><span class="o">,</span><span class="s">&quot;spark f g h&quot;</span><span class="o">,</span> <span class="mf">1.0</span><span class="o">),</span>
-  <span class="k">new</span> <span class="nf">JavaLabeledDocument</span><span class="o">(</span><span class="mi">3L</span><span class="o">,</span> <span class="s">&quot;hadoop mapreduce&quot;</span><span class="o">,</span> <span class="mf">0.0</span><span class="o">),</span>
-  <span class="k">new</span> <span class="nf">JavaLabeledDocument</span><span class="o">(</span><span class="mi">4L</span><span class="o">,</span> <span class="s">&quot;b spark who&quot;</span><span class="o">,</span> <span class="mf">1.0</span><span class="o">),</span>
-  <span class="k">new</span> <span class="nf">JavaLabeledDocument</span><span class="o">(</span><span class="mi">5L</span><span class="o">,</span> <span class="s">&quot;g d a y&quot;</span><span class="o">,</span> <span class="mf">0.0</span><span class="o">),</span>
-  <span class="k">new</span> <span class="nf">JavaLabeledDocument</span><span class="o">(</span><span class="mi">6L</span><span class="o">,</span> <span class="s">&quot;spark fly&quot;</span><span class="o">,</span> <span class="mf">1.0</span><span class="o">),</span>
-  <span class="k">new</span> <span class="nf">JavaLabeledDocument</span><span class="o">(</span><span class="mi">7L</span><span class="o">,</span> <span class="s">&quot;was mapreduce&quot;</span><span class="o">,</span> <span class="mf">0.0</span><span class="o">),</span>
-  <span class="k">new</span> <span class="nf">JavaLabeledDocument</span><span class="o">(</span><span class="mi">8L</span><span class="o">,</span> <span class="s">&quot;e spark program&quot;</span><span class="o">,</span> <span class="mf">1.0</span><span class="o">),</span>
-  <span class="k">new</span> <span class="nf">JavaLabeledDocument</span><span class="o">(</span><span class="mi">9L</span><span class="o">,</span> <span class="s">&quot;a e c l&quot;</span><span class="o">,</span> <span class="mf">0.0</span><span class="o">),</span>
-  <span class="k">new</span> <span class="nf">JavaLabeledDocument</span><span class="o">(</span><span class="mi">10L</span><span class="o">,</span> <span class="s">&quot;spark compile&quot;</span><span class="o">,</span> <span class="mf">1.0</span><span class="o">),</span>
-  <span class="k">new</span> <span class="nf">JavaLabeledDocument</span><span class="o">(</span><span class="mi">11L</span><span class="o">,</span> <span class="s">&quot;hadoop software&quot;</span><span class="o">,</span> <span class="mf">0.0</span><span class="o">)</span>
+  <span class="k">new</span> <span class="n">JavaLabeledDocument</span><span class="o">(</span><span class="mi">0</span><span class="n">L</span><span class="o">,</span> <span class="s">&quot;a b c d e spark&quot;</span><span class="o">,</span> <span class="mf">1.0</span><span class="o">),</span>
+  <span class="k">new</span> <span class="n">JavaLabeledDocument</span><span class="o">(</span><span class="mi">1L</span><span class="o">,</span> <span class="s">&quot;b d&quot;</span><span class="o">,</span> <span class="mf">0.0</span><span class="o">),</span>
+  <span class="k">new</span> <span class="n">JavaLabeledDocument</span><span class="o">(</span><span class="mi">2L</span><span class="o">,</span><span class="s">&quot;spark f g h&quot;</span><span class="o">,</span> <span class="mf">1.0</span><span class="o">),</span>
+  <span class="k">new</span> <span class="n">JavaLabeledDocument</span><span class="o">(</span><span class="mi">3L</span><span class="o">,</span> <span class="s">&quot;hadoop mapreduce&quot;</span><span class="o">,</span> <span class="mf">0.0</span><span class="o">),</span>
+  <span class="k">new</span> <span class="n">JavaLabeledDocument</span><span class="o">(</span><span class="mi">4L</span><span class="o">,</span> <span class="s">&quot;b spark who&quot;</span><span class="o">,</span> <span class="mf">1.0</span><span class="o">),</span>
+  <span class="k">new</span> <span class="n">JavaLabeledDocument</span><span class="o">(</span><span class="mi">5L</span><span class="o">,</span> <span class="s">&quot;g d a y&quot;</span><span class="o">,</span> <span class="mf">0.0</span><span class="o">),</span>
+  <span class="k">new</span> <span class="n">JavaLabeledDocument</span><span class="o">(</span><span class="mi">6L</span><span class="o">,</span> <span class="s">&quot;spark fly&quot;</span><span class="o">,</span> <span class="mf">1.0</span><span class="o">),</span>
+  <span class="k">new</span> <span class="n">JavaLabeledDocument</span><span class="o">(</span><span class="mi">7L</span><span class="o">,</span> <span class="s">&quot;was mapreduce&quot;</span><span class="o">,</span> <span class="mf">0.0</span><span class="o">),</span>
+  <span class="k">new</span> <span class="n">JavaLabeledDocument</span><span class="o">(</span><span class="mi">8L</span><span class="o">,</span> <span class="s">&quot;e spark program&quot;</span><span class="o">,</span> <span class="mf">1.0</span><span class="o">),</span>
+  <span class="k">new</span> <span class="n">JavaLabeledDocument</span><span class="o">(</span><span class="mi">9L</span><span class="o">,</span> <span class="s">&quot;a e c l&quot;</span><span class="o">,</span> <span class="mf">0.0</span><span class="o">),</span>
+  <span class="k">new</span> <span class="n">JavaLabeledDocument</span><span class="o">(</span><span class="mi">10L</span><span class="o">,</span> <span class="s">&quot;spark compile&quot;</span><span class="o">,</span> <span class="mf">1.0</span><span class="o">),</span>
+  <span class="k">new</span> <span class="n">JavaLabeledDocument</span><span class="o">(</span><span class="mi">11L</span><span class="o">,</span> <span class="s">&quot;hadoop software&quot;</span><span class="o">,</span> <span class="mf">0.0</span><span class="o">)</span>
 <span class="o">),</span> <span class="n">JavaLabeledDocument</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>
 
 <span class="c1">// Configure an ML pipeline, which consists of three stages: tokenizer, hashingTF, and lr.</span>
-<span class="n">Tokenizer</span> <span class="n">tokenizer</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">Tokenizer</span><span class="o">()</span>
+<span class="n">Tokenizer</span> <span class="n">tokenizer</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Tokenizer</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setInputCol</span><span class="o">(</span><span class="s">&quot;text&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setOutputCol</span><span class="o">(</span><span class="s">&quot;words&quot;</span><span class="o">);</span>
-<span class="n">HashingTF</span> <span class="n">hashingTF</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">HashingTF</span><span class="o">()</span>
+<span class="n">HashingTF</span> <span class="n">hashingTF</span> <span class="o">=</span> <span class="k">new</span> <span class="n">HashingTF</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setNumFeatures</span><span class="o">(</span><span class="mi">1000</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setInputCol</span><span class="o">(</span><span class="n">tokenizer</span><span class="o">.</span><span class="na">getOutputCol</span><span class="o">())</span>
   <span class="o">.</span><span class="na">setOutputCol</span><span class="o">(</span><span class="s">&quot;features&quot;</span><span class="o">);</span>
-<span class="n">LogisticRegression</span> <span class="n">lr</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">LogisticRegression</span><span class="o">()</span>
+<span class="n">LogisticRegression</span> <span class="n">lr</span> <span class="o">=</span> <span class="k">new</span> <span class="n">LogisticRegression</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setMaxIter</span><span class="o">(</span><span class="mi">10</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setRegParam</span><span class="o">(</span><span class="mf">0.01</span><span class="o">);</span>
-<span class="n">Pipeline</span> <span class="n">pipeline</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">Pipeline</span><span class="o">()</span>
+<span class="n">Pipeline</span> <span class="n">pipeline</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Pipeline</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setStages</span><span class="o">(</span><span class="k">new</span> <span class="n">PipelineStage</span><span class="o">[]</span> <span class="o">{</span><span class="n">tokenizer</span><span class="o">,</span> <span class="n">hashingTF</span><span class="o">,</span> <span class="n">lr</span><span class="o">});</span>
 
 <span class="c1">// We use a ParamGridBuilder to construct a grid of parameters to search over.</span>
 <span class="c1">// With 3 values for hashingTF.numFeatures and 2 values for lr.regParam,</span>
 <span class="c1">// this grid will have 3 x 2 = 6 parameter settings for CrossValidator to choose from.</span>
-<span class="n">ParamMap</span><span class="o">[]</span> <span class="n">paramGrid</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">ParamGridBuilder</span><span class="o">()</span>
+<span class="n">ParamMap</span><span class="o">[]</span> <span class="n">paramGrid</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ParamGridBuilder</span><span class="o">()</span>
   <span class="o">.</span><span class="na">addGrid</span><span class="o">(</span><span class="n">hashingTF</span><span class="o">.</span><span class="na">numFeatures</span><span class="o">(),</span> <span class="k">new</span> <span class="kt">int</span><span class="o">[]</span> <span class="o">{</span><span class="mi">10</span><span class="o">,</span> <span class="mi">100</span><span class="o">,</span> <span class="mi">1000</span><span class="o">})</span>
   <span class="o">.</span><span class="na">addGrid</span><span class="o">(</span><span class="n">lr</span><span class="o">.</span><span class="na">regParam</span><span class="o">(),</span> <span class="k">new</span> <span class="kt">double</span><span class="o">[]</span> <span class="o">{</span><span class="mf">0.1</span><span class="o">,</span> <span class="mf">0.01</span><span class="o">})</span>
   <span class="o">.</span><span class="na">build</span><span class="o">();</span>
@@ -534,9 +534,9 @@ Refer to the [`CrossValidator` Java docs](api/java/org/apache/spark/ml/tuning/Cr
 <span class="c1">// A CrossValidator requires an Estimator, a set of Estimator ParamMaps, and an Evaluator.</span>
 <span class="c1">// Note that the evaluator here is a BinaryClassificationEvaluator and its default metric</span>
 <span class="c1">// is areaUnderROC.</span>
-<span class="n">CrossValidator</span> <span class="n">cv</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">CrossValidator</span><span class="o">()</span>
+<span class="n">CrossValidator</span> <span class="n">cv</span> <span class="o">=</span> <span class="k">new</span> <span class="n">CrossValidator</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setEstimator</span><span class="o">(</span><span class="n">pipeline</span><span class="o">)</span>
-  <span class="o">.</span><span class="na">setEvaluator</span><span class="o">(</span><span class="k">new</span> <span class="nf">BinaryClassificationEvaluator</span><span class="o">())</span>
+  <span class="o">.</span><span class="na">setEvaluator</span><span class="o">(</span><span class="k">new</span> <span class="n">BinaryClassificationEvaluator</span><span class="o">())</span>
   <span class="o">.</span><span class="na">setEstimatorParamMaps</span><span class="o">(</span><span class="n">paramGrid</span><span class="o">).</span><span class="na">setNumFolds</span><span class="o">(</span><span class="mi">2</span><span class="o">);</span>  <span class="c1">// Use 3+ in practice</span>
 
 <span class="c1">// Run cross-validation, and choose the best set of parameters.</span>
@@ -544,10 +544,10 @@ Refer to the [`CrossValidator` Java docs](api/java/org/apache/spark/ml/tuning/Cr
 
 <span class="c1">// Prepare test documents, which are unlabeled.</span>
 <span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">test</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="na">createDataFrame</span><span class="o">(</span><span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span>
-  <span class="k">new</span> <span class="nf">JavaDocument</span><span class="o">(</span><span class="mi">4L</span><span class="o">,</span> <span class="s">&quot;spark i j k&quot;</span><span class="o">),</span>
-  <span class="k">new</span> <span class="nf">JavaDocument</span><span class="o">(</span><span class="mi">5L</span><span class="o">,</span> <span class="s">&quot;l m n&quot;</span><span class="o">),</span>
-  <span class="k">new</span> <span class="nf">JavaDocument</span><span class="o">(</span><span class="mi">6L</span><span class="o">,</span> <span class="s">&quot;mapreduce spark&quot;</span><span class="o">),</span>
-  <span class="k">new</span> <span class="nf">JavaDocument</span><span class="o">(</span><span class="mi">7L</span><span class="o">,</span> <span class="s">&quot;apache hadoop&quot;</span><span class="o">)</span>
+  <span class="k">new</span> <span class="n">JavaDocument</span><span class="o">(</span><span class="mi">4L</span><span class="o">,</span> <span class="s">&quot;spark i j k&quot;</span><span class="o">),</span>
+  <span class="k">new</span> <span class="n">JavaDocument</span><span class="o">(</span><span class="mi">5L</span><span class="o">,</span> <span class="s">&quot;l m n&quot;</span><span class="o">),</span>
+  <span class="k">new</span> <span class="n">JavaDocument</span><span class="o">(</span><span class="mi">6L</span><span class="o">,</span> <span class="s">&quot;mapreduce spark&quot;</span><span class="o">),</span>
+  <span class="k">new</span> <span class="n">JavaDocument</span><span class="o">(</span><span class="mi">7L</span><span class="o">,</span> <span class="s">&quot;apache hadoop&quot;</span><span class="o">)</span>
 <span class="o">),</span> <span class="n">JavaDocument</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>
 
 <span class="c1">// Make predictions on test documents. cvModel uses the best model found (lrModel).</span>
@@ -563,40 +563,40 @@ Refer to the [`CrossValidator` Java docs](api/java/org/apache/spark/ml/tuning/Cr
 
 Refer to the [`CrossValidator` Python docs](api/python/pyspark.ml.html#pyspark.ml.tuning.CrossValidator) for more details on the API.
 
-<div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.ml</span> <span class="kn">import</span> <span class="n">Pipeline</span>
+<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.ml</span> <span class="kn">import</span> <span class="n">Pipeline</span>
 <span class="kn">from</span> <span class="nn">pyspark.ml.classification</span> <span class="kn">import</span> <span class="n">LogisticRegression</span>
 <span class="kn">from</span> <span class="nn">pyspark.ml.evaluation</span> <span class="kn">import</span> <span class="n">BinaryClassificationEvaluator</span>
 <span class="kn">from</span> <span class="nn">pyspark.ml.feature</span> <span class="kn">import</span> <span class="n">HashingTF</span><span class="p">,</span> <span class="n">Tokenizer</span>
 <span class="kn">from</span> <span class="nn">pyspark.ml.tuning</span> <span class="kn">import</span> <span class="n">CrossValidator</span><span class="p">,</span> <span class="n">ParamGridBuilder</span>
 
-<span class="c"># Prepare training documents, which are labeled.</span>
+<span class="c1"># Prepare training documents, which are labeled.</span>
 <span class="n">training</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">createDataFrame</span><span class="p">([</span>
-    <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="s">&quot;a b c d e spark&quot;</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">),</span>
-    <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s">&quot;b d&quot;</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">),</span>
-    <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s">&quot;spark f g h&quot;</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">),</span>
-    <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="s">&quot;hadoop mapreduce&quot;</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">),</span>
-    <span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="s">&quot;b spark who&quot;</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">),</span>
-    <span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="s">&quot;g d a y&quot;</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">),</span>
-    <span class="p">(</span><span class="mi">6</span><span class="p">,</span> <span class="s">&quot;spark fly&quot;</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">),</span>
-    <span class="p">(</span><span class="mi">7</span><span class="p">,</span> <span class="s">&quot;was mapreduce&quot;</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">),</span>
-    <span class="p">(</span><span class="mi">8</span><span class="p">,</span> <span class="s">&quot;e spark program&quot;</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">),</span>
-    <span class="p">(</span><span class="mi">9</span><span class="p">,</span> <span class="s">&quot;a e c l&quot;</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">),</span>
-    <span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="s">&quot;spark compile&quot;</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">),</span>
-    <span class="p">(</span><span class="mi">11</span><span class="p">,</span> <span class="s">&quot;hadoop software&quot;</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">)</span>
-<span class="p">],</span> <span class="p">[</span><span class="s">&quot;id&quot;</span><span class="p">,</span> <span class="s">&quot;text&quot;</span><span class="p">,</span> <span class="s">&quot;label&quot;</span><span class="p">])</span>
-
-<span class="c"># Configure an ML pipeline, which consists of tree stages: tokenizer, hashingTF, and lr.</span>
-<span class="n">tokenizer</span> <span class="o">=</span> <span class="n">Tokenizer</span><span class="p">(</span><span class="n">inputCol</span><span class="o">=</span><span class="s">&quot;text&quot;</span><span class="p">,</span> <span class="n">outputCol</span><span class="o">=</span><span class="s">&quot;words&quot;</span><span class="p">)</span>
-<span class="n">hashingTF</span> <span class="o">=</span> <span class="n">HashingTF</span><span class="p">(</span><span class="n">inputCol</span><span class="o">=</span><span class="n">tokenizer</span><span class="o">.</span><span class="n">getOutputCol</span><span class="p">(),</span> <span class="n">outputCol</span><span class="o">=</span><span class="s">&quot;features&quot;</span><span class="p">)</span>
+    <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="s2">&quot;a b c d e spark&quot;</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">),</span>
+    <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s2">&quot;b d&quot;</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">),</span>
+    <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s2">&quot;spark f g h&quot;</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">),</span>
+    <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="s2">&quot;hadoop mapreduce&quot;</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">),</span>
+    <span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="s2">&quot;b spark who&quot;</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">),</span>
+    <span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="s2">&quot;g d a y&quot;</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">),</span>
+    <span class="p">(</span><span class="mi">6</span><span class="p">,</span> <span class="s2">&quot;spark fly&quot;</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">),</span>
+    <span class="p">(</span><span class="mi">7</span><span class="p">,</span> <span class="s2">&quot;was mapreduce&quot;</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">),</span>
+    <span class="p">(</span><span class="mi">8</span><span class="p">,</span> <span class="s2">&quot;e spark program&quot;</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">),</span>
+    <span class="p">(</span><span class="mi">9</span><span class="p">,</span> <span class="s2">&quot;a e c l&quot;</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">),</span>
+    <span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="s2">&quot;spark compile&quot;</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">),</span>
+    <span class="p">(</span><span class="mi">11</span><span class="p">,</span> <span class="s2">&quot;hadoop software&quot;</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">)</span>
+<span class="p">],</span> <span class="p">[</span><span class="s2">&quot;id&quot;</span><span class="p">,</span> <span class="s2">&quot;text&quot;</span><span class="p">,</span> <span class="s2">&quot;label&quot;</span><span class="p">])</span>
+
+<span class="c1"># Configure an ML pipeline, which consists of tree stages: tokenizer, hashingTF, and lr.</span>
+<span class="n">tokenizer</span> <span class="o">=</span> <span class="n">Tokenizer</span><span class="p">(</span><span class="n">inputCol</span><span class="o">=</span><span class="s2">&quot;text&quot;</span><span class="p">,</span> <span class="n">outputCol</span><span class="o">=</span><span class="s2">&quot;words&quot;</span><span class="p">)</span>
+<span class="n">hashingTF</span> <span class="o">=</span> <span class="n">HashingTF</span><span class="p">(</span><span class="n">inputCol</span><span class="o">=</span><span class="n">tokenizer</span><span class="o">.</span><span class="n">getOutputCol</span><span class="p">(),</span> <span class="n">outputCol</span><span class="o">=</span><span class="s2">&quot;features&quot;</span><span class="p">)</span>
 <span class="n">lr</span> <span class="o">=</span> <span class="n">LogisticRegression</span><span class="p">(</span><span class="n">maxIter</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
 <span class="n">pipeline</span> <span class="o">=</span> <span class="n">Pipeline</span><span class="p">(</span><span class="n">stages</span><span class="o">=</span><span class="p">[</span><span class="n">tokenizer</span><span class="p">,</span> <span class="n">hashingTF</span><span class="p">,</span> <span class="n">lr</span><span class="p">])</span>
 
-<span class="c"># We now treat the Pipeline as an Estimator, wrapping it in a CrossValidator instance.</span>
-<span class="c"># This will allow us to jointly choose parameters for all Pipeline stages.</span>
-<span class="c"># A CrossValidator requires an Estimator, a set of Estimator ParamMaps, and an Evaluator.</span>
-<span class="c"># We use a ParamGridBuilder to construct a grid of parameters to search over.</span>
-<span class="c"># With 3 values for hashingTF.numFeatures and 2 values for lr.regParam,</span>
-<span class="c"># this grid will have 3 x 2 = 6 parameter settings for CrossValidator to choose from.</span>
+<span class="c1"># We now treat the Pipeline as an Estimator, wrapping it in a CrossValidator instance.</span>
+<span class="c1"># This will allow us to jointly choose parameters for all Pipeline stages.</span>
+<span class="c1"># A CrossValidator requires an Estimator, a set of Estimator ParamMaps, and an Evaluator.</span>
+<span class="c1"># We use a ParamGridBuilder to construct a grid of parameters to search over.</span>
+<span class="c1"># With 3 values for hashingTF.numFeatures and 2 values for lr.regParam,</span>
+<span class="c1"># this grid will have 3 x 2 = 6 parameter settings for CrossValidator to choose from.</span>
 <span class="n">paramGrid</span> <span class="o">=</span> <span class="n">ParamGridBuilder</span><span class="p">()</span> \
     <span class="o">.</span><span class="n">addGrid</span><span class="p">(</span><span class="n">hashingTF</span><span class="o">.</span><span class="n">numFeatures</span><span class="p">,</span> <span class="p">[</span><span class="mi">10</span><span class="p">,</span> <span class="mi">100</span><span class="p">,</span> <span class="mi">1000</span><span class="p">])</span> \
     <span class="o">.</span><span class="n">addGrid</span><span class="p">(</span><span class="n">lr</span><span class="o">.</span><span class="n">regParam</span><span class="p">,</span> <span class="p">[</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.01</span><span class="p">])</span> \
@@ -605,22 +605,22 @@ Refer to the [`CrossValidator` Python docs](api/python/pyspark.ml.html#pyspark.m
 <span class="n">crossval</span> <span class="o">=</span> <span class="n">CrossValidator</span><span class="p">(</span><span class="n">estimator</span><span class="o">=</span><span class="n">pipeline</span><span class="p">,</span>
                           <span class="n">estimatorParamMaps</span><span class="o">=</span><span class="n">paramGrid</span><span class="p">,</span>
                           <span class="n">evaluator</span><span class="o">=</span><span class="n">BinaryClassificationEvaluator</span><span class="p">(),</span>
-                          <span class="n">numFolds</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>  <span class="c"># use 3+ folds in practice</span>
+                          <span class="n">numFolds</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>  <span class="c1"># use 3+ folds in practice</span>
 
-<span class="c"># Run cross-validation, and choose the best set of parameters.</span>
+<span class="c1"># Run cross-validation, and choose the best set of parameters.</span>
 <span class="n">cvModel</span> <span class="o">=</span> <span class="n">crossval</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">training</span><span class="p">)</span>
 
-<span class="c"># Prepare test documents, which are unlabeled.</span>
+<span class="c1"># Prepare test documents, which are unlabeled.</span>
 <span class="n">test</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">createDataFrame</span><span class="p">([</span>
-    <span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="s">&quot;spark i j k&quot;</span><span class="p">),</span>
-    <span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="s">&quot;l m n&quot;</span><span class="p">),</span>
-    <span class="p">(</span><span class="mi">6</span><span class="p">,</span> <span class="s">&quot;mapreduce spark&quot;</span><span class="p">),</span>
-    <span class="p">(</span><span class="mi">7</span><span class="p">,</span> <span class="s">&quot;apache hadoop&quot;</span><span class="p">)</span>
-<span class="p">],</span> <span class="p">[</span><span class="s">&quot;id&quot;</span><span class="p">,</span> <span class="s">&quot;text&quot;</span><span class="p">])</span>
+    <span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="s2">&quot;spark i j k&quot;</span><span class="p">),</span>
+    <span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="s2">&quot;l m n&quot;</span><span class="p">),</span>
+    <span class="p">(</span><span class="mi">6</span><span class="p">,</span> <span class="s2">&quot;mapreduce spark&quot;</span><span class="p">),</span>
+    <span class="p">(</span><span class="mi">7</span><span class="p">,</span> <span class="s2">&quot;apache hadoop&quot;</span><span class="p">)</span>
+<span class="p">],</span> <span class="p">[</span><span class="s2">&quot;id&quot;</span><span class="p">,</span> <span class="s2">&quot;text&quot;</span><span class="p">])</span>
 
-<span class="c"># Make predictions on test documents. cvModel uses the best model found (lrModel).</span>
+<span class="c1"># Make predictions on test documents. cvModel uses the best model found (lrModel).</span>
 <span class="n">prediction</span> <span class="o">=</span> <span class="n">cvModel</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">test</span><span class="p">)</span>
-<span class="n">selected</span> <span class="o">=</span> <span class="n">prediction</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s">&quot;id&quot;</span><span class="p">,</span> <span class="s">&quot;text&quot;</span><span class="p">,</span> <span class="s">&quot;probability&quot;</span><span class="p">,</span> <span class="s">&quot;prediction&quot;</span><span class="p">)</span>
+<span class="n">selected</span> <span class="o">=</span> <span class="n">prediction</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s2">&quot;id&quot;</span><span class="p">,</span> <span class="s2">&quot;text&quot;</span><span class="p">,</span> <span class="s2">&quot;probability&quot;</span><span class="p">,</span> <span class="s2">&quot;prediction&quot;</span><span class="p">)</span>
 <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">selected</span><span class="o">.</span><span class="n">collect</span><span class="p">():</span>
     <span class="k">print</span><span class="p">(</span><span class="n">row</span><span class="p">)</span>
 </pre></div><div><small>Find full example code at "examples/src/main/python/ml/cross_validator.py" in the Spark repo.</small></div>
@@ -649,7 +649,7 @@ It splits the dataset into these two parts using the <code>trainRatio</code> par
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.ml.tuning.TrainValidationSplit"><code>TrainValidationSplit</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.ml.evaluation.RegressionEvaluator</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.ml.evaluation.RegressionEvaluator</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.ml.regression.LinearRegression</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.ml.tuning.</span><span class="o">{</span><span class="nc">ParamGridBuilder</span><span class="o">,</span> <span class="nc">TrainValidationSplit</span><span class="o">}</span>
 
@@ -694,7 +694,7 @@ It splits the dataset into these two parts using the <code>trainRatio</code> par
 
     <p>Refer to the <a href="api/java/org/apache/spark/ml/tuning/TrainValidationSplit.html"><code>TrainValidationSplit</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">org.apache.spark.ml.evaluation.RegressionEvaluator</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.ml.evaluation.RegressionEvaluator</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.ml.param.ParamMap</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.ml.regression.LinearRegression</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.ml.tuning.ParamGridBuilder</span><span class="o">;</span>
@@ -711,12 +711,12 @@ It splits the dataset into these two parts using the <code>trainRatio</code> par
 <span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">training</span> <span class="o">=</span> <span class="n">splits</span><span class="o">[</span><span class="mi">0</span><span class="o">];</span>
 <span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">test</span> <span class="o">=</span> <span class="n">splits</span><span class="o">[</span><span class="mi">1</span><span class="o">];</span>
 
-<span class="n">LinearRegression</span> <span class="n">lr</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">LinearRegression</span><span class="o">();</span>
+<span class="n">LinearRegression</span> <span class="n">lr</span> <span class="o">=</span> <span class="k">new</span> <span class="n">LinearRegression</span><span class="o">();</span>
 
 <span class="c1">// We use a ParamGridBuilder to construct a grid of parameters to search over.</span>
 <span class="c1">// TrainValidationSplit will try all combinations of values and determine best model using</span>
 <span class="c1">// the evaluator.</span>
-<span class="n">ParamMap</span><span class="o">[]</span> <span class="n">paramGrid</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">ParamGridBuilder</span><span class="o">()</span>
+<span class="n">ParamMap</span><span class="o">[]</span> <span class="n">paramGrid</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ParamGridBuilder</span><span class="o">()</span>
   <span class="o">.</span><span class="na">addGrid</span><span class="o">(</span><span class="n">lr</span><span class="o">.</span><span class="na">regParam</span><span class="o">(),</span> <span class="k">new</span> <span class="kt">double</span><span class="o">[]</span> <span class="o">{</span><span class="mf">0.1</span><span class="o">,</span> <span class="mf">0.01</span><span class="o">})</span>
   <span class="o">.</span><span class="na">addGrid</span><span class="o">(</span><span class="n">lr</span><span class="o">.</span><span class="na">fitIntercept</span><span class="o">())</span>
   <span class="o">.</span><span class="na">addGrid</span><span class="o">(</span><span class="n">lr</span><span class="o">.</span><span class="na">elasticNetParam</span><span class="o">(),</span> <span class="k">new</span> <span class="kt">double</span><span class="o">[]</span> <span class="o">{</span><span class="mf">0.0</span><span class="o">,</span> <span class="mf">0.5</span><span class="o">,</span> <span class="mf">1.0</span><span class="o">})</span>
@@ -724,9 +724,9 @@ It splits the dataset into these two parts using the <code>trainRatio</code> par
 
 <span class="c1">// In this case the estimator is simply the linear regression.</span>
 <span class="c1">// A TrainValidationSplit requires an Estimator, a set of Estimator ParamMaps, and an Evaluator.</span>
-<span class="n">TrainValidationSplit</span> <span class="n">trainValidationSplit</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">TrainValidationSplit</span><span class="o">()</span>
+<span class="n">TrainValidationSplit</span> <span class="n">trainValidationSplit</span> <span class="o">=</span> <span class="k">new</span> <span class="n">TrainValidationSplit</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setEstimator</span><span class="o">(</span><span class="n">lr</span><span class="o">)</span>
-  <span class="o">.</span><span class="na">setEvaluator</span><span class="o">(</span><span class="k">new</span> <span class="nf">RegressionEvaluator</span><span class="o">())</span>
+  <span class="o">.</span><span class="na">setEvaluator</span><span class="o">(</span><span class="k">new</span> <span class="n">RegressionEvaluator</span><span class="o">())</span>
   <span class="o">.</span><span class="na">setEstimatorParamMaps</span><span class="o">(</span><span class="n">paramGrid</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setTrainRatio</span><span class="o">(</span><span class="mf">0.8</span><span class="o">);</span>  <span class="c1">// 80% for training and the remaining 20% for validation</span>
 
@@ -746,41 +746,41 @@ It splits the dataset into these two parts using the <code>trainRatio</code> par
 
 Refer to the [`TrainValidationSplit` Python docs](api/python/pyspark.ml.html#pyspark.ml.tuning.TrainValidationSplit) for more details on the API.
 
-<div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.ml.evaluation</span> <span class="kn">import</span> <span class="n">RegressionEvaluator</span>
+<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.ml.evaluation</span> <span class="kn">import</span> <span class="n">RegressionEvaluator</span>
 <span class="kn">from</span> <span class="nn">pyspark.ml.regression</span> <span class="kn">import</span> <span class="n">LinearRegression</span>
 <span class="kn">from</span> <span class="nn">pyspark.ml.tuning</span> <span class="kn">import</span> <span class="n">ParamGridBuilder</span><span class="p">,</span> <span class="n">TrainValidationSplit</span>
 
-<span class="c"># Prepare training and test data.</span>
-<span class="n">data</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s">&quot;libsvm&quot;</span><span class="p">)</span>\
-    <span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s">&quot;data/mllib/sample_linear_regression_data.txt&quot;</span><span class="p">)</span>
+<span class="c1"># Prepare training and test data.</span>
+<span class="n">data</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s2">&quot;libsvm&quot;</span><span class="p">)</span>\
+    <span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">&quot;data/mllib/sample_linear_regression_data.txt&quot;</span><span class="p">)</span>
 <span class="n">train</span><span class="p">,</span> <span class="n">test</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">randomSplit</span><span class="p">([</span><span class="mf">0.9</span><span class="p">,</span> <span class="mf">0.1</span><span class="p">],</span> <span class="n">seed</span><span class="o">=</span><span class="mi">12345</span><span class="p">)</span>
 
 <span class="n">lr</span> <span class="o">=</span> <span class="n">LinearRegression</span><span class="p">(</span><span class="n">maxIter</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
 
-<span class="c"># We use a ParamGridBuilder to construct a grid of parameters to search over.</span>
-<span class="c"># TrainValidationSplit will try all combinations of values and determine best model using</span>
-<span class="c"># the evaluator.</span>
+<span class="c1"># We use a ParamGridBuilder to construct a grid of parameters to search over.</span>
+<span class="c1"># TrainValidationSplit will try all combinations of values and determine best model using</span>
+<span class="c1"># the evaluator.</span>
 <span class="n">paramGrid</span> <span class="o">=</span> <span class="n">ParamGridBuilder</span><span class="p">()</span>\
     <span class="o">.</span><span class="n">addGrid</span><span class="p">(</span><span class="n">lr</span><span class="o">.</span><span class="n">regParam</span><span class="p">,</span> <span class="p">[</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.01</span><span class="p">])</span> \
     <span class="o">.</span><span class="n">addGrid</span><span class="p">(</span><span class="n">lr</span><span class="o">.</span><span class="n">fitIntercept</span><span class="p">,</span> <span class="p">[</span><span class="bp">False</span><span class="p">,</span> <span class="bp">True</span><span class="p">])</span>\
     <span class="o">.</span><span class="n">addGrid</span><span class="p">(</span><span class="n">lr</span><span class="o">.</span><span class="n">elasticNetParam</span><span class="p">,</span> <span class="p">[</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">])</span>\
     <span class="o">.</span><span class="n">build</span><span class="p">()</span>
 
-<span class="c"># In this case the estimator is simply the linear regression.</span>
-<span class="c"># A TrainValidationSplit requires an Estimator, a set of Estimator ParamMaps, and an Evaluator.</span>
+<span class="c1"># In this case the estimator is simply the linear regression.</span>
+<span class="c1"># A TrainValidationSplit requires an Estimator, a set of Estimator ParamMaps, and an Evaluator.</span>
 <span class="n">tvs</span> <span class="o">=</span> <span class="n">TrainValidationSplit</span><span class="p">(</span><span class="n">estimator</span><span class="o">=</span><span class="n">lr</span><span class="p">,</span>
                            <span class="n">estimatorParamMaps</span><span class="o">=</span><span class="n">paramGrid</span><span class="p">,</span>
                            <span class="n">evaluator</span><span class="o">=</span><span class="n">RegressionEvaluator</span><span class="p">(),</span>
-                           <span class="c"># 80% of the data will be used for training, 20% for validation.</span>
+                           <span class="c1"># 80% of the data will be used for training, 20% for validation.</span>
                            <span class="n">trainRatio</span><span class="o">=</span><span class="mf">0.8</span><span class="p">)</span>
 
-<span class="c"># Run TrainValidationSplit, and choose the best set of parameters.</span>
+<span class="c1"># Run TrainValidationSplit, and choose the best set of parameters.</span>
 <span class="n">model</span> <span class="o">=</span> <span class="n">tvs</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">train</span><span class="p">)</span>
 
-<span class="c"># Make predictions on test data. model is the model with combination of parameters</span>
-<span class="c"># that performed best.</span>
+<span class="c1"># Make predictions on test data. model is the model with combination of parameters</span>
+<span class="c1"># that performed best.</span>
 <span class="n">model</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">test</span><span class="p">)</span>\
-    <span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s">&quot;features&quot;</span><span class="p">,</span> <span class="s">&quot;label&quot;</span><span class="p">,</span> <span class="s">&quot;prediction&quot;</span><span class="p">)</span>\
+    <span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s2">&quot;features&quot;</span><span class="p">,</span> <span class="s2">&quot;label&quot;</span><span class="p">,</span> <span class="s2">&quot;prediction&quot;</span><span class="p">)</span>\
     <span class="o">.</span><span class="n">show</span><span class="p">()</span>
 </pre></div><div><small>Find full example code at "examples/src/main/python/ml/train_validation_split.py" in the Spark repo.</small></div>
 </div>
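
A minimal sketch (not part of the generated pages above) of inspecting the fitted models afterwards, assuming the `cvModel` and `model` values defined in the Scala cross-validation and train-validation-split examples; `bestModel`, `avgMetrics` and `validationMetrics` are the corresponding fields on CrossValidatorModel and TrainValidationSplitModel in org.apache.spark.ml.tuning:

// Cross-validation result: the refit best pipeline and the per-grid-point metrics.
import org.apache.spark.ml.PipelineModel

val bestPipeline = cvModel.bestModel.asInstanceOf[PipelineModel]
println(bestPipeline.stages.mkString("\n"))  // Tokenizer, HashingTF, LogisticRegressionModel
println(cvModel.avgMetrics.mkString(", "))   // average areaUnderROC per parameter setting

// Train-validation-split result: the chosen model and its validation metrics.
println(model.bestModel.extractParamMap())   // the parameter combination that won
println(model.validationMetrics.mkString(", "))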




[09/25] spark-website git commit: Update 2.1.0 docs to include https://github.com/apache/spark/pull/16294

Posted by yh...@apache.org.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/programming-guide.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/programming-guide.html b/site/docs/2.1.0/programming-guide.html
index 12458af..0e06e86 100644
--- a/site/docs/2.1.0/programming-guide.html
+++ b/site/docs/2.1.0/programming-guide.html
@@ -129,50 +129,50 @@
                     
 
                     <ul id="markdown-toc">
-  <li><a href="#overview" id="markdown-toc-overview">Overview</a></li>
-  <li><a href="#linking-with-spark" id="markdown-toc-linking-with-spark">Linking with Spark</a></li>
-  <li><a href="#initializing-spark" id="markdown-toc-initializing-spark">Initializing Spark</a>    <ul>
-      <li><a href="#using-the-shell" id="markdown-toc-using-the-shell">Using the Shell</a></li>
+  <li><a href="#overview">Overview</a></li>
+  <li><a href="#linking-with-spark">Linking with Spark</a></li>
+  <li><a href="#initializing-spark">Initializing Spark</a>    <ul>
+      <li><a href="#using-the-shell">Using the Shell</a></li>
     </ul>
   </li>
-  <li><a href="#resilient-distributed-datasets-rdds" id="markdown-toc-resilient-distributed-datasets-rdds">Resilient Distributed Datasets (RDDs)</a>    <ul>
-      <li><a href="#parallelized-collections" id="markdown-toc-parallelized-collections">Parallelized Collections</a></li>
-      <li><a href="#external-datasets" id="markdown-toc-external-datasets">External Datasets</a></li>
-      <li><a href="#rdd-operations" id="markdown-toc-rdd-operations">RDD Operations</a>        <ul>
-          <li><a href="#basics" id="markdown-toc-basics">Basics</a></li>
-          <li><a href="#passing-functions-to-spark" id="markdown-toc-passing-functions-to-spark">Passing Functions to Spark</a></li>
-          <li><a href="#understanding-closures-a-nameclosureslinka" id="markdown-toc-understanding-closures-a-nameclosureslinka">Understanding closures <a name="ClosuresLink"></a></a>            <ul>
-              <li><a href="#example" id="markdown-toc-example">Example</a></li>
-              <li><a href="#local-vs-cluster-modes" id="markdown-toc-local-vs-cluster-modes">Local vs. cluster modes</a></li>
-              <li><a href="#printing-elements-of-an-rdd" id="markdown-toc-printing-elements-of-an-rdd">Printing elements of an RDD</a></li>
+  <li><a href="#resilient-distributed-datasets-rdds">Resilient Distributed Datasets (RDDs)</a>    <ul>
+      <li><a href="#parallelized-collections">Parallelized Collections</a></li>
+      <li><a href="#external-datasets">External Datasets</a></li>
+      <li><a href="#rdd-operations">RDD Operations</a>        <ul>
+          <li><a href="#basics">Basics</a></li>
+          <li><a href="#passing-functions-to-spark">Passing Functions to Spark</a></li>
+          <li><a href="#understanding-closures-a-nameclosureslinka">Understanding closures <a name="ClosuresLink"></a></a>            <ul>
+              <li><a href="#example">Example</a></li>
+              <li><a href="#local-vs-cluster-modes">Local vs. cluster modes</a></li>
+              <li><a href="#printing-elements-of-an-rdd">Printing elements of an RDD</a></li>
             </ul>
           </li>
-          <li><a href="#working-with-key-value-pairs" id="markdown-toc-working-with-key-value-pairs">Working with Key-Value Pairs</a></li>
-          <li><a href="#transformations" id="markdown-toc-transformations">Transformations</a></li>
-          <li><a href="#actions" id="markdown-toc-actions">Actions</a></li>
-          <li><a href="#shuffle-operations" id="markdown-toc-shuffle-operations">Shuffle operations</a>            <ul>
-              <li><a href="#background" id="markdown-toc-background">Background</a></li>
-              <li><a href="#performance-impact" id="markdown-toc-performance-impact">Performance Impact</a></li>
+          <li><a href="#working-with-key-value-pairs">Working with Key-Value Pairs</a></li>
+          <li><a href="#transformations">Transformations</a></li>
+          <li><a href="#actions">Actions</a></li>
+          <li><a href="#shuffle-operations">Shuffle operations</a>            <ul>
+              <li><a href="#background">Background</a></li>
+              <li><a href="#performance-impact">Performance Impact</a></li>
             </ul>
           </li>
         </ul>
       </li>
-      <li><a href="#rdd-persistence" id="markdown-toc-rdd-persistence">RDD Persistence</a>        <ul>
-          <li><a href="#which-storage-level-to-choose" id="markdown-toc-which-storage-level-to-choose">Which Storage Level to Choose?</a></li>
-          <li><a href="#removing-data" id="markdown-toc-removing-data">Removing Data</a></li>
+      <li><a href="#rdd-persistence">RDD Persistence</a>        <ul>
+          <li><a href="#which-storage-level-to-choose">Which Storage Level to Choose?</a></li>
+          <li><a href="#removing-data">Removing Data</a></li>
         </ul>
       </li>
     </ul>
   </li>
-  <li><a href="#shared-variables" id="markdown-toc-shared-variables">Shared Variables</a>    <ul>
-      <li><a href="#broadcast-variables" id="markdown-toc-broadcast-variables">Broadcast Variables</a></li>
-      <li><a href="#accumulators" id="markdown-toc-accumulators">Accumulators</a></li>
+  <li><a href="#shared-variables">Shared Variables</a>    <ul>
+      <li><a href="#broadcast-variables">Broadcast Variables</a></li>
+      <li><a href="#accumulators">Accumulators</a></li>
     </ul>
   </li>
-  <li><a href="#deploying-to-a-cluster" id="markdown-toc-deploying-to-a-cluster">Deploying to a Cluster</a></li>
-  <li><a href="#launching-spark-jobs-from-java--scala" id="markdown-toc-launching-spark-jobs-from-java--scala">Launching Spark jobs from Java / Scala</a></li>
-  <li><a href="#unit-testing" id="markdown-toc-unit-testing">Unit Testing</a></li>
-  <li><a href="#where-to-go-from-here" id="markdown-toc-where-to-go-from-here">Where to Go from Here</a></li>
+  <li><a href="#deploying-to-a-cluster">Deploying to a Cluster</a></li>
+  <li><a href="#launching-spark-jobs-from-java--scala">Launching Spark jobs from Java / Scala</a></li>
+  <li><a href="#unit-testing">Unit Testing</a></li>
+  <li><a href="#where-to-go-from-here">Where to Go from Here</a></li>
 </ul>
 
 <h1 id="overview">Overview</h1>
@@ -212,8 +212,8 @@ version = &lt;your-hdfs-version&gt;
 
     <p>Finally, you need to import some Spark classes into your program. Add the following lines:</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">import</span> <span class="nn">org.apache.spark.SparkContext</span>
-<span class="k">import</span> <span class="nn">org.apache.spark.SparkConf</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">import</span> <span class="nn">org.apache.spark.SparkContext</span>
+<span class="k">import</span> <span class="nn">org.apache.spark.SparkConf</span></code></pre></figure>
 
     <p>(Before Spark 1.3.0, you need to explicitly <code>import org.apache.spark.SparkContext._</code> to enable essential implicit conversions.)</p>
 
@@ -245,9 +245,9 @@ version = &lt;your-hdfs-version&gt;
 
     <p>Finally, you need to import some Spark classes into your program. Add the following lines:</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">import</span> <span class="nn">org.apache.spark.api.java.JavaSparkContext</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">import</span> <span class="nn">org.apache.spark.api.java.JavaSparkContext</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span>
-<span class="k">import</span> <span class="nn">org.apache.spark.SparkConf</span></code></pre></div>
+<span class="k">import</span> <span class="nn">org.apache.spark.SparkConf</span></code></pre></figure>
 
   </div>
 
@@ -269,13 +269,13 @@ for common HDFS versions.</p>
 
     <p>Finally, you need to import some Spark classes into your program. Add the following line:</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">pyspark</span> <span class="kn">import</span> <span class="n">SparkContext</span><span class="p">,</span> <span class="n">SparkConf</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="kn">from</span> <span class="nn">pyspark</span> <span class="kn">import</span> <span class="n">SparkContext</span><span class="p">,</span> <span class="n">SparkConf</span></code></pre></figure>
 
     <p>PySpark requires the same minor version of Python in both driver and workers. It uses the default Python version in PATH;
 you can specify which version of Python you want to use by setting <code>PYSPARK_PYTHON</code>, for example:</p>
 
-    <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ PYSPARK_PYTHON</span><span class="o">=</span>python3.4 bin/pyspark
-<span class="nv">$ PYSPARK_PYTHON</span><span class="o">=</span>/opt/pypy-2.5/bin/pypy bin/spark-submit examples/src/main/python/pi.py</code></pre></div>
+    <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>$ <span class="nv">PYSPARK_PYTHON</span><span class="o">=</span>python3.4 bin/pyspark
+$ <span class="nv">PYSPARK_PYTHON</span><span class="o">=</span>/opt/pypy-2.5/bin/pypy bin/spark-submit examples/src/main/python/pi.py</code></pre></figure>
 
   </div>
 
@@ -293,8 +293,8 @@ that contains information about your application.</p>
 
     <p>Only one SparkContext may be active per JVM.  You must <code>stop()</code> the active SparkContext before creating a new one.</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">conf</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkConf</span><span class="o">().</span><span class="n">setAppName</span><span class="o">(</span><span class="n">appName</span><span class="o">).</span><span class="n">setMaster</span><span class="o">(</span><span class="n">master</span><span class="o">)</span>
-<span class="k">new</span> <span class="nc">SparkContext</span><span class="o">(</span><span class="n">conf</span><span class="o">)</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">val</span> <span class="n">conf</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkConf</span><span class="o">().</span><span class="n">setAppName</span><span class="o">(</span><span class="n">appName</span><span class="o">).</span><span class="n">setMaster</span><span class="o">(</span><span class="n">master</span><span class="o">)</span>
+<span class="k">new</span> <span class="nc">SparkContext</span><span class="o">(</span><span class="n">conf</span><span class="o">)</span></code></pre></figure>
 
   </div>
 
@@ -304,8 +304,8 @@ that contains information about your application.</p>
 how to access a cluster. To create a <code>SparkContext</code> you first need to build a <a href="api/java/index.html?org/apache/spark/SparkConf.html">SparkConf</a> object
 that contains information about your application.</p>
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">SparkConf</span> <span class="n">conf</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">SparkConf</span><span class="o">().</span><span class="na">setAppName</span><span class="o">(</span><span class="n">appName</span><span class="o">).</span><span class="na">setMaster</span><span class="o">(</span><span class="n">master</span><span class="o">);</span>
-<span class="n">JavaSparkContext</span> <span class="n">sc</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">JavaSparkContext</span><span class="o">(</span><span class="n">conf</span><span class="o">);</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="n">SparkConf</span> <span class="n">conf</span> <span class="o">=</span> <span class="k">new</span> <span class="n">SparkConf</span><span class="o">().</span><span class="na">setAppName</span><span class="o">(</span><span class="n">appName</span><span class="o">).</span><span class="na">setMaster</span><span class="o">(</span><span class="n">master</span><span class="o">);</span>
+<span class="n">JavaSparkContext</span> <span class="n">sc</span> <span class="o">=</span> <span class="k">new</span> <span class="n">JavaSparkContext</span><span class="o">(</span><span class="n">conf</span><span class="o">);</span></code></pre></figure>
 
   </div>
 
@@ -315,8 +315,8 @@ that contains information about your application.</p>
 how to access a cluster. To create a <code>SparkContext</code> you first need to build a <a href="api/python/pyspark.html#pyspark.SparkConf">SparkConf</a> object
 that contains information about your application.</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">conf</span> <span class="o">=</span> <span class="n">SparkConf</span><span class="p">()</span><span class="o">.</span><span class="n">setAppName</span><span class="p">(</span><span class="n">appName</span><span class="p">)</span><span class="o">.</span><span class="n">setMaster</span><span class="p">(</span><span class="n">master</span><span class="p">)</span>
-<span class="n">sc</span> <span class="o">=</span> <span class="n">SparkContext</span><span class="p">(</span><span class="n">conf</span><span class="o">=</span><span class="n">conf</span><span class="p">)</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="n">conf</span> <span class="o">=</span> <span class="n">SparkConf</span><span class="p">()</span><span class="o">.</span><span class="n">setAppName</span><span class="p">(</span><span class="n">appName</span><span class="p">)</span><span class="o">.</span><span class="n">setMaster</span><span class="p">(</span><span class="n">master</span><span class="p">)</span>
+<span class="n">sc</span> <span class="o">=</span> <span class="n">SparkContext</span><span class="p">(</span><span class="n">conf</span><span class="o">=</span><span class="n">conf</span><span class="p">)</span></code></pre></figure>
 
   </div>
 
@@ -345,15 +345,15 @@ to the <code>--packages</code> argument. Any additional repositories where depen
 can be passed to the <code>--repositories</code> argument. For example, to run <code>bin/spark-shell</code> on exactly
 four cores, use:</p>
 
-    <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>./bin/spark-shell --master <span class="nb">local</span><span class="o">[</span>4<span class="o">]</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>$ ./bin/spark-shell --master local<span class="o">[</span><span class="m">4</span><span class="o">]</span></code></pre></figure>
 
     <p>Or, to also add <code>code.jar</code> to its classpath, use:</p>
 
-    <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>./bin/spark-shell --master <span class="nb">local</span><span class="o">[</span>4<span class="o">]</span> --jars code.jar</code></pre></div>
+    <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>$ ./bin/spark-shell --master local<span class="o">[</span><span class="m">4</span><span class="o">]</span> --jars code.jar</code></pre></figure>
 
     <p>To include a dependency using maven coordinates:</p>
 
-    <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>./bin/spark-shell --master <span class="nb">local</span><span class="o">[</span>4<span class="o">]</span> --packages <span class="s2">&quot;org.example:example:0.1&quot;</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>$ ./bin/spark-shell --master local<span class="o">[</span><span class="m">4</span><span class="o">]</span> --packages <span class="s2">&quot;org.example:example:0.1&quot;</span></code></pre></figure>
 
     <p>For a complete list of options, run <code>spark-shell --help</code>. Behind the scenes,
 <code>spark-shell</code> invokes the more general <a href="submitting-applications.html"><code>spark-submit</code> script</a>.</p>
@@ -372,11 +372,11 @@ can be passed to the <code>--repositories</code> argument. Any Python dependenci
 the requirements.txt of that package) must be manually installed using <code>pip</code> when necessary.
 For example, to run <code>bin/pyspark</code> on exactly four cores, use:</p>
 
-    <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>./bin/pyspark --master <span class="nb">local</span><span class="o">[</span>4<span class="o">]</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>$ ./bin/pyspark --master local<span class="o">[</span><span class="m">4</span><span class="o">]</span></code></pre></figure>
 
     <p>Or, to also add <code>code.py</code> to the search path (in order to later be able to <code>import code</code>), use:</p>
 
-    <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>./bin/pyspark --master <span class="nb">local</span><span class="o">[</span>4<span class="o">]</span> --py-files code.py</code></pre></div>
+    <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>$ ./bin/pyspark --master local<span class="o">[</span><span class="m">4</span><span class="o">]</span> --py-files code.py</code></pre></figure>
 
     <p>For a complete list of options, run <code>pyspark --help</code>. Behind the scenes,
 <code>pyspark</code> invokes the more general <a href="submitting-applications.html"><code>spark-submit</code> script</a>.</p>
@@ -385,13 +385,13 @@ For example, to run <code>bin/pyspark</code> on exactly four cores, use:</p>
 enhanced Python interpreter. PySpark works with IPython 1.0.0 and later. To
 use IPython, set the <code>PYSPARK_DRIVER_PYTHON</code> variable to <code>ipython</code> when running <code>bin/pyspark</code>:</p>
 
-    <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ PYSPARK_DRIVER_PYTHON</span><span class="o">=</span>ipython ./bin/pyspark</code></pre></div>
+    <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>$ <span class="nv">PYSPARK_DRIVER_PYTHON</span><span class="o">=</span>ipython ./bin/pyspark</code></pre></figure>
 
     <p>To use the Jupyter notebook (previously known as the IPython notebook),</p>
 
-    <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ PYSPARK_DRIVER_PYTHON</span><span class="o">=</span>jupyter ./bin/pyspark</code></pre></div>
+    <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>$ <span class="nv">PYSPARK_DRIVER_PYTHON</span><span class="o">=</span>jupyter ./bin/pyspark</code></pre></figure>
 
     <p>You can customize the <code>ipython</code> or <code>jupyter</code> commands by setting <code>PYSPARK_DRIVER_PYTHON_OPTS</code>.</p>
 
     <p>After the Jupyter Notebook server is launched, you can create a new &#8220;Python 2&#8221; notebook from
 the &#8220;Files&#8221; tab. Inside the notebook, you can input the command <code>%pylab inline</code> as part of
@@ -415,8 +415,8 @@ shared filesystem, HDFS, HBase, or any data source offering a Hadoop InputFormat
 
     <p>Parallelized collections are created by calling <code>SparkContext</code>&#8217;s <code>parallelize</code> method on an existing collection in your driver program (a Scala <code>Seq</code>). The elements of the collection are copied to form a distributed dataset that can be operated on in parallel. For example, here is how to create a parallelized collection holding the numbers 1 to 5:</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">data</span> <span class="k">=</span> <span class="nc">Array</span><span class="o">(</span><span class="mi">1</span><span class="o">,</span> <span class="mi">2</span><span class="o">,</span> <span class="mi">3</span><span class="o">,</span> <span class="mi">4</span><span class="o">,</span> <span class="mi">5</span><span class="o">)</span>
-<span class="k">val</span> <span class="n">distData</span> <span class="k">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="o">(</span><span class="n">data</span><span class="o">)</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">val</span> <span class="n">data</span> <span class="k">=</span> <span class="nc">Array</span><span class="o">(</span><span class="mi">1</span><span class="o">,</span> <span class="mi">2</span><span class="o">,</span> <span class="mi">3</span><span class="o">,</span> <span class="mi">4</span><span class="o">,</span> <span class="mi">5</span><span class="o">)</span>
+<span class="k">val</span> <span class="n">distData</span> <span class="k">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="o">(</span><span class="n">data</span><span class="o">)</span></code></pre></figure>
 
     <p>Once created, the distributed dataset (<code>distData</code>) can be operated on in parallel. For example, we might call <code>distData.reduce((a, b) =&gt; a + b)</code> to add up the elements of the array. We describe operations on distributed datasets later on.</p>
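+
+    <p>As a concrete sketch of that call (continuing the <code>distData</code> example above; <code>sum</code> is an illustrative name):</p>
+
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span>val sum = distData.reduce((a, b) =&gt; a + b)  // adds 1 + 2 + 3 + 4 + 5
+// sum: Int = 15</code></pre></figure>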
 
@@ -426,8 +426,8 @@ shared filesystem, HDFS, HBase, or any data source offering a Hadoop InputFormat
 
     <p>Parallelized collections are created by calling <code>JavaSparkContext</code>&#8217;s <code>parallelize</code> method on an existing <code>Collection</code> in your driver program. The elements of the collection are copied to form a distributed dataset that can be operated on in parallel. For example, here is how to create a parallelized collection holding the numbers 1 to 5:</p>
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">List</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;</span> <span class="n">data</span> <span class="o">=</span> <span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span><span class="mi">1</span><span class="o">,</span> <span class="mi">2</span><span class="o">,</span> <span class="mi">3</span><span class="o">,</span> <span class="mi">4</span><span class="o">,</span> <span class="mi">5</span><span class="o">);</span>
-<span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;</span> <span class="n">distData</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="na">parallelize</span><span class="o">(</span><span class="n">data</span><span class="o">);</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="n">List</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;</span> <span class="n">data</span> <span class="o">=</span> <span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span><span class="mi">1</span><span class="o">,</span> <span class="mi">2</span><span class="o">,</span> <span class="mi">3</span><span class="o">,</span> <span class="mi">4</span><span class="o">,</span> <span class="mi">5</span><span class="o">);</span>
+<span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;</span> <span class="n">distData</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="na">parallelize</span><span class="o">(</span><span class="n">data</span><span class="o">);</span></code></pre></figure>
 
     <p>Once created, the distributed dataset (<code>distData</code>) can be operated on in parallel. For example, we might call <code>distData.reduce((a, b) -&gt; a + b)</code> to add up the elements of the list.
 We describe operations on distributed datasets later on.</p>
@@ -443,8 +443,8 @@ We describe <a href="#passing-functions-to-spark">passing functions to Spark</a>
 
     <p>Parallelized collections are created by calling <code>SparkContext</code>&#8217;s <code>parallelize</code> method on an existing iterable or collection in your driver program. The elements of the collection are copied to form a distributed dataset that can be operated on in parallel. For example, here is how to create a parallelized collection holding the numbers 1 to 5:</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">data</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">]</span>
-<span class="n">distData</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="p">(</span><span class="n">data</span><span class="p">)</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="n">data</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">]</span>
+<span class="n">distData</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="p">(</span><span class="n">data</span><span class="p">)</span></code></pre></figure>
 
     <p>Once created, the distributed dataset (<code>distData</code>) can be operated on in parallel. For example, we can call <code>distData.reduce(lambda a, b: a + b)</code> to add up the elements of the list.
 We describe operations on distributed datasets later on.</p>
@@ -465,8 +465,8 @@ We describe operations on distributed datasets later on.</p>
 
     <p>Text file RDDs can be created using <code>SparkContext</code>&#8217;s <code>textFile</code> method. This method takes a URI for the file (either a local path on the machine, or a <code>hdfs://</code>, <code>s3n://</code>, etc. URI) and reads it as a collection of lines. Here is an example invocation:</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">scala</span><span class="o">&gt;</span> <span class="k">val</span> <span class="n">distFile</span> <span class="k">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="o">(</span><span class="s">&quot;data.txt&quot;</span><span class="o">)</span>
-<span class="n">distFile</span><span class="k">:</span> <span class="kt">org.apache.spark.rdd.RDD</span><span class="o">[</span><span class="kt">String</span><span class="o">]</span> <span class="k">=</span> <span class="n">data</span><span class="o">.</span><span class="n">txt</span> <span class="nc">MapPartitionsRDD</span><span class="o">[</span><span class="err">10</span><span class="o">]</span> <span class="n">at</span> <span class="n">textFile</span> <span class="n">at</span> <span class="o">&lt;</span><span class="n">console</span><span class="k">&gt;:</span><span class="mi">26</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="n">scala</span><span class="o">&gt;</span> <span class="k">val</span> <span class="n">distFile</span> <span class="k">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="o">(</span><span class="s">&quot;data.txt&quot;</span><span class="o">)</span>
+<span class="n">distFile</span><span class="k">:</span> <span class="kt">org.apache.spark.rdd.RDD</span><span class="o">[</span><span class="kt">String</span><span class="o">]</span> <span class="k">=</span> <span class="n">data</span><span class="o">.</span><span class="n">txt</span> <span class="nc">MapPartitionsRDD</span><span class="o">[</span><span class="err">10</span><span class="o">]</span> <span class="n">at</span> <span class="n">textFile</span> <span class="n">at</span> <span class="o">&lt;</span><span class="n">console</span><span class="k">&gt;:</span><span class="mi">26</span></code></pre></figure>
 
     <p>Once created, <code>distFile</code> can be acted on by dataset operations. For example, we can add up the sizes of all the lines using the <code>map</code> and <code>reduce</code> operations as follows: <code>distFile.map(s =&gt; s.length).reduce((a, b) =&gt; a + b)</code>.</p>
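+
+    <p>Spelled out as a runnable sketch (<code>totalChars</code> is an illustrative name):</p>
+
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span>// total number of characters across all lines of data.txt
+val totalChars = distFile.map(s =&gt; s.length).reduce((a, b) =&gt; a + b)</code></pre></figure>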
 
@@ -509,7 +509,7 @@ We describe operations on distributed datasets later on.</p>
 
     <p>Text file RDDs can be created using <code>SparkContext</code>&#8217;s <code>textFile</code> method. This method takes a URI for the file (either a local path on the machine, or a <code>hdfs://</code>, <code>s3n://</code>, etc. URI) and reads it as a collection of lines. Here is an example invocation:</p>
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">distFile</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="na">textFile</span><span class="o">(</span><span class="s">&quot;data.txt&quot;</span><span class="o">);</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">distFile</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="na">textFile</span><span class="o">(</span><span class="s">&quot;data.txt&quot;</span><span class="o">);</span></code></pre></figure>
 
     <p>Once created, <code>distFile</code> can be acted on by dataset operations. For example, we can add up the sizes of all the lines using the <code>map</code> and <code>reduce</code> operations as follows: <code>distFile.map(s -&gt; s.length()).reduce((a, b) -&gt; a + b)</code>.</p>
 
@@ -552,7 +552,7 @@ We describe operations on distributed datasets later on.</p>
 
     <p>Text file RDDs can be created using <code>SparkContext</code>&#8217;s <code>textFile</code> method. This method takes a URI for the file (either a local path on the machine, or a <code>hdfs://</code>, <code>s3n://</code>, etc. URI) and reads it as a collection of lines. Here is an example invocation:</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="o">&gt;&gt;&gt;</span> <span class="n">distFile</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">&quot;data.txt&quot;</span><span class="p">)</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="o">&gt;&gt;&gt;</span> <span class="n">distFile</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s2">&quot;data.txt&quot;</span><span class="p">)</span></code></pre></figure>
 
     <p>Once created, <code>distFile</code> can be acted on by dataset operations. For example, we can add up the sizes of all the lines using the <code>map</code> and <code>reduce</code> operations as follows: <code>distFile.map(lambda s: len(s)).reduce(lambda a, b: a + b)</code>.</p>
 
@@ -615,10 +615,10 @@ Python <code>array.array</code> for arrays of primitive types, users need to spe
     <p>Similarly to text files, SequenceFiles can be saved and loaded by specifying the path. The key and value
 classes can be specified, but for standard Writables this is not required.</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="o">&gt;&gt;&gt;</span> <span class="n">rdd</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">4</span><span class="p">))</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="s">&quot;a&quot;</span> <span class="o">*</span> <span class="n">x</span><span class="p">))</span>
-<span class="o">&gt;&gt;&gt;</span> <span class="n">rdd</span><span class="o">.</span><span class="n">saveAsSequenceFile</span><span class="p">(</span><span class="s">&quot;path/to/file&quot;</span><span class="p">)</span>
-<span class="o">&gt;&gt;&gt;</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">sc</span><span class="o">.</span><span class="n">sequenceFile</span><span class="p">(</span><span class="s">&quot;path/to/file&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">collect</span><span class="p">())</span>
-<span class="p">[(</span><span class="mi">1</span><span class="p">,</span> <span class="s">u&#39;a&#39;</span><span class="p">),</span> <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s">u&#39;aa&#39;</span><span class="p">),</span> <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="s">u&#39;aaa&#39;</span><span class="p">)]</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="o">&gt;&gt;&gt;</span> <span class="n">rdd</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">4</span><span class="p">))</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="s2">&quot;a&quot;</span> <span class="o">*</span> <span class="n">x</span><span class="p">))</span>
+<span class="o">&gt;&gt;&gt;</span> <span class="n">rdd</span><span class="o">.</span><span class="n">saveAsSequenceFile</span><span class="p">(</span><span class="s2">&quot;path/to/file&quot;</span><span class="p">)</span>
+<span class="o">&gt;&gt;&gt;</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">sc</span><span class="o">.</span><span class="n">sequenceFile</span><span class="p">(</span><span class="s2">&quot;path/to/file&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">collect</span><span class="p">())</span>
+<span class="p">[(</span><span class="mi">1</span><span class="p">,</span> <span class="sa">u</span><span class="s1">&#39;a&#39;</span><span class="p">),</span> <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="sa">u</span><span class="s1">&#39;aa&#39;</span><span class="p">),</span> <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="sa">u</span><span class="s1">&#39;aaa&#39;</span><span class="p">)]</span></code></pre></figure>
 
     <p><strong>Saving and Loading Other Hadoop Input/Output Formats</strong></p>
 
@@ -626,17 +626,17 @@ classes can be specified, but for standard Writables this is not required.</p>
 If required, a Hadoop configuration can be passed in as a Python dict. Here is an example using the
 Elasticsearch ESInputFormat:</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="err">$</span> <span class="n">SPARK_CLASSPATH</span><span class="o">=/</span><span class="n">path</span><span class="o">/</span><span class="n">to</span><span class="o">/</span><span class="n">elasticsearch</span><span class="o">-</span><span class="n">hadoop</span><span class="o">.</span><span class="n">jar</span> <span class="o">./</span><span class="nb">bin</span><span class="o">/</span><span class="n">pyspark</span>
-<span class="o">&gt;&gt;&gt;</span> <span class="n">conf</span> <span class="o">=</span> <span class="p">{</span><span class="s">&quot;es.resource&quot;</span> <span class="p">:</span> <span class="s">&quot;index/type&quot;</span><span class="p">}</span>  <span class="c"># assume Elasticsearch is running on localhost defaults</span>
-<span class="o">&gt;&gt;&gt;</span> <span class="n">rdd</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">newAPIHadoopRDD</span><span class="p">(</span><span class="s">&quot;org.elasticsearch.hadoop.mr.EsInputFormat&quot;</span><span class="p">,</span>
-                             <span class="s">&quot;org.apache.hadoop.io.NullWritable&quot;</span><span class="p">,</span>
-                             <span class="s">&quot;org.elasticsearch.hadoop.mr.LinkedMapWritable&quot;</span><span class="p">,</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="err">$</span> <span class="n">SPARK_CLASSPATH</span><span class="o">=/</span><span class="n">path</span><span class="o">/</span><span class="n">to</span><span class="o">/</span><span class="n">elasticsearch</span><span class="o">-</span><span class="n">hadoop</span><span class="o">.</span><span class="n">jar</span> <span class="o">./</span><span class="nb">bin</span><span class="o">/</span><span class="n">pyspark</span>
+<span class="o">&gt;&gt;&gt;</span> <span class="n">conf</span> <span class="o">=</span> <span class="p">{</span><span class="s2">&quot;es.resource&quot;</span> <span class="p">:</span> <span class="s2">&quot;index/type&quot;</span><span class="p">}</span>  <span class="c1"># assume Elasticsearch is running on localhost defaults</span>
+<span class="o">&gt;&gt;&gt;</span> <span class="n">rdd</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">newAPIHadoopRDD</span><span class="p">(</span><span class="s2">&quot;org.elasticsearch.hadoop.mr.EsInputFormat&quot;</span><span class="p">,</span>
+                             <span class="s2">&quot;org.apache.hadoop.io.NullWritable&quot;</span><span class="p">,</span>
+                             <span class="s2">&quot;org.elasticsearch.hadoop.mr.LinkedMapWritable&quot;</span><span class="p">,</span>
                              <span class="n">conf</span><span class="o">=</span><span class="n">conf</span><span class="p">)</span>
-<span class="o">&gt;&gt;&gt;</span> <span class="n">rdd</span><span class="o">.</span><span class="n">first</span><span class="p">()</span>  <span class="c"># the result is a MapWritable that is converted to a Python dict</span>
-<span class="p">(</span><span class="s">u&#39;Elasticsearch ID&#39;</span><span class="p">,</span>
- <span class="p">{</span><span class="s">u&#39;field1&#39;</span><span class="p">:</span> <span class="bp">True</span><span class="p">,</span>
-  <span class="s">u&#39;field2&#39;</span><span class="p">:</span> <span class="s">u&#39;Some Text&#39;</span><span class="p">,</span>
-  <span class="s">u&#39;field3&#39;</span><span class="p">:</span> <span class="mi">12345</span><span class="p">})</span></code></pre></div>
+<span class="o">&gt;&gt;&gt;</span> <span class="n">rdd</span><span class="o">.</span><span class="n">first</span><span class="p">()</span>  <span class="c1"># the result is a MapWritable that is converted to a Python dict</span>
+<span class="p">(</span><span class="sa">u</span><span class="s1">&#39;Elasticsearch ID&#39;</span><span class="p">,</span>
+ <span class="p">{</span><span class="sa">u</span><span class="s1">&#39;field1&#39;</span><span class="p">:</span> <span class="bp">True</span><span class="p">,</span>
+  <span class="sa">u</span><span class="s1">&#39;field2&#39;</span><span class="p">:</span> <span class="sa">u</span><span class="s1">&#39;Some Text&#39;</span><span class="p">,</span>
+  <span class="sa">u</span><span class="s1">&#39;field3&#39;</span><span class="p">:</span> <span class="mi">12345</span><span class="p">})</span></code></pre></figure>
 
     <p>Note that, if the InputFormat simply depends on a Hadoop configuration and/or input path, and
 the key and value classes can easily be converted according to the above table,
@@ -672,9 +672,9 @@ for examples of using Cassandra / HBase <code>InputFormat</code> and <code>Outpu
 
     <p>To illustrate RDD basics, consider the simple program below:</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">lines</span> <span class="k">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="o">(</span><span class="s">&quot;data.txt&quot;</span><span class="o">)</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">val</span> <span class="n">lines</span> <span class="k">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="o">(</span><span class="s">&quot;data.txt&quot;</span><span class="o">)</span>
 <span class="k">val</span> <span class="n">lineLengths</span> <span class="k">=</span> <span class="n">lines</span><span class="o">.</span><span class="n">map</span><span class="o">(</span><span class="n">s</span> <span class="k">=&gt;</span> <span class="n">s</span><span class="o">.</span><span class="n">length</span><span class="o">)</span>
-<span class="k">val</span> <span class="n">totalLength</span> <span class="k">=</span> <span class="n">lineLengths</span><span class="o">.</span><span class="n">reduce</span><span class="o">((</span><span class="n">a</span><span class="o">,</span> <span class="n">b</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span><span class="o">)</span></code></pre></div>
+<span class="k">val</span> <span class="n">totalLength</span> <span class="k">=</span> <span class="n">lineLengths</span><span class="o">.</span><span class="n">reduce</span><span class="o">((</span><span class="n">a</span><span class="o">,</span> <span class="n">b</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span><span class="o">)</span></code></pre></figure>
 
     <p>The first line defines a base RDD from an external file. This dataset is not loaded in memory or
 otherwise acted on: <code>lines</code> is merely a pointer to the file.
@@ -686,7 +686,7 @@ returning only its answer to the driver program.</p>
 
     <p>If we also wanted to use <code>lineLengths</code> again later, we could add:</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">lineLengths</span><span class="o">.</span><span class="n">persist</span><span class="o">()</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="n">lineLengths</span><span class="o">.</span><span class="n">persist</span><span class="o">()</span></code></pre></figure>
 
     <p>before the <code>reduce</code>, which would cause <code>lineLengths</code> to be saved in memory after the first time it is computed.</p>
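+
+    <p>Putting the pieces together, a minimal sketch of the cached version (using the same names as above) would be:</p>
+
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span>val lines = sc.textFile(&quot;data.txt&quot;)
+val lineLengths = lines.map(s =&gt; s.length)
+lineLengths.persist()  // mark for in-memory reuse before the first action
+val totalLength = lineLengths.reduce((a, b) =&gt; a + b)  // computed once, then cached</code></pre></figure>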
 
@@ -696,9 +696,9 @@ returning only its answer to the driver program.</p>
 
     <p>To illustrate RDD basics, consider the simple program below:</p>
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">lines</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="na">textFile</span><span class="o">(</span><span class="s">&quot;data.txt&quot;</span><span class="o">);</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">lines</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="na">textFile</span><span class="o">(</span><span class="s">&quot;data.txt&quot;</span><span class="o">);</span>
 <span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;</span> <span class="n">lineLengths</span> <span class="o">=</span> <span class="n">lines</span><span class="o">.</span><span class="na">map</span><span class="o">(</span><span class="n">s</span> <span class="o">-&gt;</span> <span class="n">s</span><span class="o">.</span><span class="na">length</span><span class="o">());</span>
-<span class="kt">int</span> <span class="n">totalLength</span> <span class="o">=</span> <span class="n">lineLengths</span><span class="o">.</span><span class="na">reduce</span><span class="o">((</span><span class="n">a</span><span class="o">,</span> <span class="n">b</span><span class="o">)</span> <span class="o">-&gt;</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span><span class="o">);</span></code></pre></div>
+<span class="kt">int</span> <span class="n">totalLength</span> <span class="o">=</span> <span class="n">lineLengths</span><span class="o">.</span><span class="na">reduce</span><span class="o">((</span><span class="n">a</span><span class="o">,</span> <span class="n">b</span><span class="o">)</span> <span class="o">-&gt;</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span><span class="o">);</span></code></pre></figure>
 
     <p>The first line defines a base RDD from an external file. This dataset is not loaded in memory or
 otherwise acted on: <code>lines</code> is merely a pointer to the file.
@@ -710,7 +710,7 @@ returning only its answer to the driver program.</p>
 
     <p>If we also wanted to use <code>lineLengths</code> again later, we could add:</p>
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">lineLengths</span><span class="o">.</span><span class="na">persist</span><span class="o">(</span><span class="n">StorageLevel</span><span class="o">.</span><span class="na">MEMORY_ONLY</span><span class="o">());</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="n">lineLengths</span><span class="o">.</span><span class="na">persist</span><span class="o">(</span><span class="n">StorageLevel</span><span class="o">.</span><span class="na">MEMORY_ONLY</span><span class="o">());</span></code></pre></figure>
 
     <p>before the <code>reduce</code>, which would cause <code>lineLengths</code> to be saved in memory after the first time it is computed.</p>
 
@@ -720,9 +720,9 @@ returning only its answer to the driver program.</p>
 
     <p>To illustrate RDD basics, consider the simple program below:</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">lines</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">&quot;data.txt&quot;</span><span class="p">)</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="n">lines</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s2">&quot;data.txt&quot;</span><span class="p">)</span>
 <span class="n">lineLengths</span> <span class="o">=</span> <span class="n">lines</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">s</span><span class="p">:</span> <span class="nb">len</span><span class="p">(</span><span class="n">s</span><span class="p">))</span>
-<span class="n">totalLength</span> <span class="o">=</span> <span class="n">lineLengths</span><span class="o">.</span><span class="n">reduce</span><span class="p">(</span><span class="k">lambda</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">:</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span><span class="p">)</span></code></pre></div>
+<span class="n">totalLength</span> <span class="o">=</span> <span class="n">lineLengths</span><span class="o">.</span><span class="n">reduce</span><span class="p">(</span><span class="k">lambda</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">:</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span><span class="p">)</span></code></pre></figure>
 
     <p>The first line defines a base RDD from an external file. This dataset is not loaded in memory or
 otherwise acted on: <code>lines</code> is merely a pointer to the file.
@@ -734,7 +734,7 @@ returning only its answer to the driver program.</p>
 
     <p>If we also wanted to use <code>lineLengths</code> again later, we could add:</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">lineLengths</span><span class="o">.</span><span class="n">persist</span><span class="p">()</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="n">lineLengths</span><span class="o">.</span><span class="n">persist</span><span class="p">()</span></code></pre></figure>
 
     <p>before the <code>reduce</code>, which would cause <code>lineLengths</code> to be saved in memory after the first time it is computed.</p>
 
@@ -758,20 +758,20 @@ which can be used for short pieces of code.</li>
 pass <code>MyFunctions.func1</code>, as follows:</li>
     </ul>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">object</span> <span class="nc">MyFunctions</span> <span class="o">{</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">object</span> <span class="nc">MyFunctions</span> <span class="o">{</span>
   <span class="k">def</span> <span class="n">func1</span><span class="o">(</span><span class="n">s</span><span class="k">:</span> <span class="kt">String</span><span class="o">)</span><span class="k">:</span> <span class="kt">String</span> <span class="o">=</span> <span class="o">{</span> <span class="o">...</span> <span class="o">}</span>
 <span class="o">}</span>
 
-<span class="n">myRdd</span><span class="o">.</span><span class="n">map</span><span class="o">(</span><span class="nc">MyFunctions</span><span class="o">.</span><span class="n">func1</span><span class="o">)</span></code></pre></div>
+<span class="n">myRdd</span><span class="o">.</span><span class="n">map</span><span class="o">(</span><span class="nc">MyFunctions</span><span class="o">.</span><span class="n">func1</span><span class="o">)</span></code></pre></figure>
 
     <p>Note that while it is also possible to pass a reference to a method in a class instance (as opposed to
 a singleton object), this requires sending the object that contains that method along with it.
 For example, consider:</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">class</span> <span class="nc">MyClass</span> <span class="o">{</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">class</span> <span class="nc">MyClass</span> <span class="o">{</span>
   <span class="k">def</span> <span class="n">func1</span><span class="o">(</span><span class="n">s</span><span class="k">:</span> <span class="kt">String</span><span class="o">)</span><span class="k">:</span> <span class="kt">String</span> <span class="o">=</span> <span class="o">{</span> <span class="o">...</span> <span class="o">}</span>
   <span class="k">def</span> <span class="n">doStuff</span><span class="o">(</span><span class="n">rdd</span><span class="k">:</span> <span class="kt">RDD</span><span class="o">[</span><span class="kt">String</span><span class="o">])</span><span class="k">:</span> <span class="kt">RDD</span><span class="o">[</span><span class="kt">String</span><span class="o">]</span> <span class="k">=</span> <span class="o">{</span> <span class="n">rdd</span><span class="o">.</span><span class="n">map</span><span class="o">(</span><span class="n">func1</span><span class="o">)</span> <span class="o">}</span>
-<span class="o">}</span></code></pre></div>
+<span class="o">}</span></code></pre></figure>
 
     <p>Here, if we create a new <code>MyClass</code> instance and call <code>doStuff</code> on it, the <code>map</code> inside there references the
 <code>func1</code> method <em>of that <code>MyClass</code> instance</em>, so the whole object needs to be sent to the cluster. It is
@@ -779,18 +779,18 @@ similar to writing <code>rdd.map(x =&gt; this.func1(x))</code>.</p>
 
     <p>In a similar way, accessing fields of the outer object will reference the whole object:</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">class</span> <span class="nc">MyClass</span> <span class="o">{</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">class</span> <span class="nc">MyClass</span> <span class="o">{</span>
   <span class="k">val</span> <span class="n">field</span> <span class="k">=</span> <span class="s">&quot;Hello&quot;</span>
   <span class="k">def</span> <span class="n">doStuff</span><span class="o">(</span><span class="n">rdd</span><span class="k">:</span> <span class="kt">RDD</span><span class="o">[</span><span class="kt">String</span><span class="o">])</span><span class="k">:</span> <span class="kt">RDD</span><span class="o">[</span><span class="kt">String</span><span class="o">]</span> <span class="k">=</span> <span class="o">{</span> <span class="n">rdd</span><span class="o">.</span><span class="n">map</span><span class="o">(</span><span class="n">x</span> <span class="k">=&gt;</span> <span class="n">field</span> <span class="o">+</span> <span class="n">x</span><span class="o">)</span> <span class="o">}</span>
-<span class="o">}</span></code></pre></div>
+<span class="o">}</span></code></pre></figure>
 
     <p>is equivalent to writing <code>rdd.map(x =&gt; this.field + x)</code>, which references all of <code>this</code>. To avoid this
 issue, the simplest way is to copy <code>field</code> into a local variable instead of accessing it externally:</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">def</span> <span class="n">doStuff</span><span class="o">(</span><span class="n">rdd</span><span class="k">:</span> <span class="kt">RDD</span><span class="o">[</span><span class="kt">String</span><span class="o">])</span><span class="k">:</span> <span class="kt">RDD</span><span class="o">[</span><span class="kt">String</span><span class="o">]</span> <span class="k">=</span> <span class="o">{</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">def</span> <span class="n">doStuff</span><span class="o">(</span><span class="n">rdd</span><span class="k">:</span> <span class="kt">RDD</span><span class="o">[</span><span class="kt">String</span><span class="o">])</span><span class="k">:</span> <span class="kt">RDD</span><span class="o">[</span><span class="kt">String</span><span class="o">]</span> <span class="k">=</span> <span class="o">{</span>
   <span class="k">val</span> <span class="n">field_</span> <span class="k">=</span> <span class="k">this</span><span class="o">.</span><span class="n">field</span>
   <span class="n">rdd</span><span class="o">.</span><span class="n">map</span><span class="o">(</span><span class="n">x</span> <span class="k">=&gt;</span> <span class="n">field_</span> <span class="o">+</span> <span class="n">x</span><span class="o">)</span>
-<span class="o">}</span></code></pre></div>
+<span class="o">}</span></code></pre></figure>
 
   </div>
 
@@ -811,17 +811,17 @@ to concisely define an implementation.</li>
     <p>While much of this guide uses lambda syntax for conciseness, it is easy to use all the same APIs
 in long-form. For example, we could have written our code above as follows:</p>
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">lines</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="na">textFile</span><span class="o">(</span><span class="s">&quot;data.txt&quot;</span><span class="o">);</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">lines</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="na">textFile</span><span class="o">(</span><span class="s">&quot;data.txt&quot;</span><span class="o">);</span>
 <span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;</span> <span class="n">lineLengths</span> <span class="o">=</span> <span class="n">lines</span><span class="o">.</span><span class="na">map</span><span class="o">(</span><span class="k">new</span> <span class="n">Function</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;()</span> <span class="o">{</span>
   <span class="kd">public</span> <span class="n">Integer</span> <span class="nf">call</span><span class="o">(</span><span class="n">String</span> <span class="n">s</span><span class="o">)</span> <span class="o">{</span> <span class="k">return</span> <span class="n">s</span><span class="o">.</span><span class="na">length</span><span class="o">();</span> <span class="o">}</span>
 <span class="o">});</span>
 <span class="kt">int</span> <span class="n">totalLength</span> <span class="o">=</span> <span class="n">lineLengths</span><span class="o">.</span><span class="na">reduce</span><span class="o">(</span><span class="k">new</span> <span class="n">Function2</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">,</span> <span class="n">Integer</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;()</span> <span class="o">{</span>
   <span class="kd">public</span> <span class="n">Integer</span> <span class="nf">call</span><span class="o">(</span><span class="n">Integer</span> <span class="n">a</span><span class="o">,</span> <span class="n">Integer</span> <span class="n">b</span><span class="o">)</span> <span class="o">{</span> <span class="k">return</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span><span class="o">;</span> <span class="o">}</span>
-<span class="o">});</span></code></pre></div>
+<span class="o">});</span></code></pre></figure>
 
     <p>Or, if writing the functions inline is unwieldy:</p>
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">class</span> <span class="nc">GetLength</span> <span class="kd">implements</span> <span class="n">Function</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;</span> <span class="o">{</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="kd">class</span> <span class="nc">GetLength</span> <span class="kd">implements</span> <span class="n">Function</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;</span> <span class="o">{</span>
   <span class="kd">public</span> <span class="n">Integer</span> <span class="nf">call</span><span class="o">(</span><span class="n">String</span> <span class="n">s</span><span class="o">)</span> <span class="o">{</span> <span class="k">return</span> <span class="n">s</span><span class="o">.</span><span class="na">length</span><span class="o">();</span> <span class="o">}</span>
 <span class="o">}</span>
 <span class="kd">class</span> <span class="nc">Sum</span> <span class="kd">implements</span> <span class="n">Function2</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">,</span> <span class="n">Integer</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;</span> <span class="o">{</span>
@@ -829,8 +829,8 @@ in long-form. For example, we could have written our code above as follows:</p>
 <span class="o">}</span>
 
 <span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">lines</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="na">textFile</span><span class="o">(</span><span class="s">&quot;data.txt&quot;</span><span class="o">);</span>
-<span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;</span> <span class="n">lineLengths</span> <span class="o">=</span> <span class="n">lines</span><span class="o">.</span><span class="na">map</span><span class="o">(</span><span class="k">new</span> <span class="nf">GetLength</span><span class="o">());</span>
-<span class="kt">int</span> <span class="n">totalLength</span> <span class="o">=</span> <span class="n">lineLengths</span><span class="o">.</span><span class="na">reduce</span><span class="o">(</span><span class="k">new</span> <span class="nf">Sum</span><span class="o">());</span></code></pre></div>
+<span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;</span> <span class="n">lineLengths</span> <span class="o">=</span> <span class="n">lines</span><span class="o">.</span><span class="na">map</span><span class="o">(</span><span class="k">new</span> <span class="n">GetLength</span><span class="o">());</span>
+<span class="kt">int</span> <span class="n">totalLength</span> <span class="o">=</span> <span class="n">lineLengths</span><span class="o">.</span><span class="na">reduce</span><span class="o">(</span><span class="k">new</span> <span class="n">Sum</span><span class="o">());</span></code></pre></figure>
 
     <p>Note that anonymous inner classes in Java can also access variables in the enclosing scope as long
 as they are marked <code>final</code>. Spark will ship copies of these variables to each worker node as it does
@@ -854,42 +854,42 @@ functions or statements that do not return a value.)</li>
     <p>For example, to pass a longer function than can be supported using a <code>lambda</code>, consider
 the code below:</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="sd">&quot;&quot;&quot;MyScript.py&quot;&quot;&quot;</span>
-<span class="k">if</span> <span class="n">__name__</span> <span class="o">==</span> <span class="s">&quot;__main__&quot;</span><span class="p">:</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="sd">&quot;&quot;&quot;MyScript.py&quot;&quot;&quot;</span>
+<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">&quot;__main__&quot;</span><span class="p">:</span>
     <span class="k">def</span> <span class="nf">myFunc</span><span class="p">(</span><span class="n">s</span><span class="p">):</span>
-        <span class="n">words</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&quot; &quot;</span><span class="p">)</span>
+        <span class="n">words</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">&quot; &quot;</span><span class="p">)</span>
         <span class="k">return</span> <span class="nb">len</span><span class="p">(</span><span class="n">words</span><span class="p">)</span>
 
     <span class="n">sc</span> <span class="o">=</span> <span class="n">SparkContext</span><span class="p">(</span><span class="o">...</span><span class="p">)</span>
-    <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">&quot;file.txt&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="n">myFunc</span><span class="p">)</span></code></pre></div>
+    <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s2">&quot;file.txt&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="n">myFunc</span><span class="p">)</span></code></pre></figure>
 
     <p>Note that while it is also possible to pass a reference to a method in a class instance (as opposed to
 a singleton object), this requires sending the object that contains that class along with the method.
 For example, consider:</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">class</span> <span class="nc">MyClass</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="k">class</span> <span class="nc">MyClass</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
     <span class="k">def</span> <span class="nf">func</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">s</span><span class="p">):</span>
         <span class="k">return</span> <span class="n">s</span>
     <span class="k">def</span> <span class="nf">doStuff</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">rdd</span><span class="p">):</span>
-        <span class="k">return</span> <span class="n">rdd</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">func</span><span class="p">)</span></code></pre></div>
+        <span class="k">return</span> <span class="n">rdd</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">func</span><span class="p">)</span></code></pre></figure>
 
     <p>Here, if we create a <code>new MyClass</code> and call <code>doStuff</code> on it, the <code>map</code> inside there references the
 <code>func</code> method <em>of that <code>MyClass</code> instance</em>, so the whole object needs to be sent to the cluster.</p>
 
     <p>In a similar way, accessing fields of the outer object will reference the whole object:</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">class</span> <span class="nc">MyClass</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
-    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
-        <span class="bp">self</span><span class="o">.</span><span class="n">field</span> <span class="o">=</span> <span class="s">&quot;Hello&quot;</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="k">class</span> <span class="nc">MyClass</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
+    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
+        <span class="bp">self</span><span class="o">.</span><span class="n">field</span> <span class="o">=</span> <span class="s2">&quot;Hello&quot;</span>
     <span class="k">def</span> <span class="nf">doStuff</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">rdd</span><span class="p">):</span>
-        <span class="k">return</span> <span class="n">rdd</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">s</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">field</span> <span class="o">+</span> <span class="n">s</span><span class="p">)</span></code></pre></div>
+        <span class="k">return</span> <span class="n">rdd</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">s</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">field</span> <span class="o">+</span> <span class="n">s</span><span class="p">)</span></code></pre></figure>
 
     <p>To avoid this issue, the simplest way is to copy <code>field</code> into a local variable instead
 of accessing it externally:</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">doStuff</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">rdd</span><span class="p">):</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="k">def</span> <span class="nf">doStuff</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">rdd</span><span class="p">):</span>
     <span class="n">field</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">field</span>
-    <span class="k">return</span> <span class="n">rdd</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">s</span><span class="p">:</span> <span class="n">field</span> <span class="o">+</span> <span class="n">s</span><span class="p">)</span></code></pre></div>
+    <span class="k">return</span> <span class="n">rdd</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">s</span><span class="p">:</span> <span class="n">field</span> <span class="o">+</span> <span class="n">s</span><span class="p">)</span></code></pre></figure>
 
   </div>
 
@@ -906,40 +906,40 @@ of accessing it externally:</p>
 
 <div data-lang="scala">
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">var</span> <span class="n">counter</span> <span class="k">=</span> <span class="mi">0</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">var</span> <span class="n">counter</span> <span class="k">=</span> <span class="mi">0</span>
 <span class="k">var</span> <span class="n">rdd</span> <span class="k">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="o">(</span><span class="n">data</span><span class="o">)</span>
 
 <span class="c1">// Wrong: Don&#39;t do this!!</span>
 <span class="n">rdd</span><span class="o">.</span><span class="n">foreach</span><span class="o">(</span><span class="n">x</span> <span class="k">=&gt;</span> <span class="n">counter</span> <span class="o">+=</span> <span class="n">x</span><span class="o">)</span>
 
-<span class="n">println</span><span class="o">(</span><span class="s">&quot;Counter value: &quot;</span> <span class="o">+</span> <span class="n">counter</span><span class="o">)</span></code></pre></div>
+<span class="n">println</span><span class="o">(</span><span class="s">&quot;Counter value: &quot;</span> <span class="o">+</span> <span class="n">counter</span><span class="o">)</span></code></pre></figure>
 
   </div>
 
 <div data-lang="java">
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kt">int</span> <span class="n">counter</span> <span class="o">=</span> <span class="mi">0</span><span class="o">;</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="kt">int</span> <span class="n">counter</span> <span class="o">=</span> <span class="mi">0</span><span class="o">;</span>
 <span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;</span> <span class="n">rdd</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="na">parallelize</span><span class="o">(</span><span class="n">data</span><span class="o">);</span>
 
 <span class="c1">// Wrong: Don&#39;t do this!!</span>
 <span class="n">rdd</span><span class="o">.</span><span class="na">foreach</span><span class="o">(</span><span class="n">x</span> <span class="o">-&gt;</span> <span class="n">counter</span> <span class="o">+=</span> <span class="n">x</span><span class="o">);</span>
 
-<span class="n">println</span><span class="o">(</span><span class="s">&quot;Counter value: &quot;</span> <span class="o">+</span> <span class="n">counter</span><span class="o">);</span></code></pre></div>
+<span class="n">println</span><span class="o">(</span><span class="s">&quot;Counter value: &quot;</span> <span class="o">+</span> <span class="n">counter</span><span class="o">);</span></code></pre></figure>
 
   </div>
 
 <div data-lang="python">
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">counter</span> <span class="o">=</span> <span class="mi">0</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="n">counter</span> <span class="o">=</span> <span class="mi">0</span>
 <span class="n">rdd</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
 
-<span class="c"># Wrong: Don&#39;t do this!!</span>
+<span class="c1"># Wrong: Don&#39;t do this!!</span>
 <span class="k">def</span> <span class="nf">increment_counter</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
     <span class="k">global</span> <span class="n">counter</span>
     <span class="n">counter</span> <span class="o">+=</span> <span class="n">x</span>
 <span class="n">rdd</span><span class="o">.</span><span class="n">foreach</span><span class="p">(</span><span class="n">increment_counter</span><span class="p">)</span>
 
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Counter value: &quot;</span><span class="p">,</span> <span class="n">counter</span><span class="p">)</span></code></pre></div>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Counter value: &quot;</span><span class="p">,</span> <span class="n">counter</span><span class="p">)</span></code></pre></figure>
 
   </div>
 
@@ -953,7 +953,7 @@ of accessing it externally:</p>
 
 <p>In local mode, in some circumstances the <code>foreach</code> function will actually execute within the same JVM as the driver and will reference the same original <strong>counter</strong>, and may actually update it.</p>
 
-<p>To ensure well-defined behavior in these sorts of scenarios one should use an <a href="#accumulators"><code>Accumulator</code></a>. Accumulators in Spark are used specifically to provide a mechanism for safely updating a variable when execution is split up across worker nodes in a cluster. The Accumulators section of this guide discusses these in more detail.</p>
+<p>To ensure well-defined behavior in these sorts of scenarios one should use an <a href="#accumulators"><code>Accumulator</code></a>. Accumulators in Spark are used specifically to provide a mechanism for safely updating a variable when execution is split up across worker nodes in a cluster. The Accumulators section of this guide discusses these in more detail.  </p>
 
 <p>In general, closures, constructs like loops or locally defined methods, should not be used to mutate some global state. Spark does not define or guarantee the behavior of mutations to objects referenced from outside of closures. Some code that does this may work in local mode, but that&#8217;s just by accident and such code will not behave as expected in distributed mode. Use an Accumulator instead if some global aggregation is needed.</p>
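
That last point is worth making concrete. Below is a minimal Scala sketch of the accumulator-based alternative to the broken counter above; it assumes sc is an existing SparkContext and uses the longAccumulator API available from Spark 2.0 on:

    // Safe global aggregation: an accumulator instead of a captured local variable.
    val accum = sc.longAccumulator("counter")
    val rdd = sc.parallelize(1 to 100)

    rdd.foreach(x => accum.add(x))   // the adds happen on the executors

    // Reading the merged value is only well-defined on the driver.
    println("Counter value: " + accum.value)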
 
@@ -980,9 +980,9 @@ which automatically wraps around an RDD of tuples.</p>
     <p>For example, the following code uses the <code>reduceByKey</code> operation on key-value pairs to count how
 many times each line of text occurs in a file:</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">lines</span> <span class="k">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="o">(</span><span class="s">&quot;data.txt&quot;</span><span class="o">)</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">val</span> <span class="n">lines</span> <span class="k">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="o">(</span><span class="s">&quot;data.txt&quot;</span><span class="o">)</span>
 <span class="k">val</span> <span class="n">pairs</span> <span class="k">=</span> <span class="n">lines</span><span class="o">.</span><span class="n">map</span><span class="o">(</span><span class="n">s</span> <span class="k">=&gt;</span> <span class="o">(</span><span class="n">s</span><span class="o">,</span> <span class="mi">1</span><span class="o">))</span>
-<span class="k">val</span> <span class="n">counts</span> <span class="k">=</span> <span class="n">pairs</span><span class="o">.</span><span class="n">reduceByKey</span><span class="o">((</span><span class="n">a</span><span class="o">,</span> <span class="n">b</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span><span class="o">)</span></code></pre></div>
+<span class="k">val</span> <span class="n">counts</span> <span class="k">=</span> <span class="n">pairs</span><span class="o">.</span><span class="n">reduceByKey</span><span class="o">((</span><span class="n">a</span><span class="o">,</span> <span class="n">b</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span><span class="o">)</span></code></pre></figure>
 
     <p>We could also use <code>counts.sortByKey()</code>, for example, to sort the pairs alphabetically, and finally
 <code>counts.collect()</code> to bring them back to the driver program as an array of objects.</p>
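
Chaining those two calls is a one-liner; a sketch that continues from the counts RDD above:

    // Sort the (line, count) pairs by key, then materialize them on the driver.
    val sortedCounts: Array[(String, Int)] = counts.sortByKey().collect()
    sortedCounts.take(5).foreach { case (line, n) => println(n + "\t" + line) }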
@@ -1015,9 +1015,9 @@ key-value ones.</p>
     <p>For example, the following code uses the <code>reduceByKey</code> operation on key-value pairs to count how
 many times each line of text occurs in a file:</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="nc">JavaRDD</span><span class="o">&lt;</span><span class="nc">String</span><span class="o">&gt;</span> <span class="n">lines</span> <span class="k">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="o">(</span><span class="s">&quot;data.txt&quot;</span><span class="o">);</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="nc">JavaRDD</span><span class="o">&lt;</span><span class="nc">String</span><span class="o">&gt;</span> <span class="n">lines</span> <span class="k">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="o">(</span><span class="s">&quot;data.txt&quot;</span><span class="o">);</span>
 <span class="nc">JavaPairRDD</span><span class="o">&lt;</span><span class="nc">String</span><span class="o">,</span> <span class="nc">Integer</span><span class="o">&gt;</span> <span class="n">pairs</span> <span class="k">=</span> <span class="n">lines</span><span class="o">.</span><span class="n">mapToPair</span><span class="o">(</span><span class="n">s</span> <span class="o">-&gt;</span> <span class="k">new</span> <span class="nc">Tuple2</span><span class="o">(</span><span class="n">s</span><span class="o">,</span> <span class="mi">1</span><span class="o">));</span>
-<span class="nc">JavaPairRDD</span><span class="o">&lt;</span><span class="nc">String</span><span class="o">,</span> <span class="nc">Integer</span><span class="o">&gt;</span> <span class="n">counts</span> <span class="k">=</span> <span class="n">pairs</span><span class="o">.</span><span class="n">reduceByKey</span><span class="o">((</span><span class="n">a</span><span class="o">,</span> <span class="n">b</span><span class="o">)</span> <span class="o">-&gt;</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span><span class="o">);</span></code></pre></div>
+<span class="nc">JavaPairRDD</span><span class="o">&lt;</span><span class="nc">String</span><span class="o">,</span> <span class="nc">Integer</span><span class="o">&gt;</span> <span class="n">counts</span> <span class="k">=</span> <span class="n">pairs</span><span class="o">.</span><span class="n">reduceByKey</span><span class="o">((</span><span class="n">a</span><span class="o">,</span> <span class="n">b</span><span class="o">)</span> <span class="o">-&gt;</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span><span class="o">);</span></code></pre></figure>
 
     <p>We could also use <code>counts.sortByKey()</code>, for example, to sort the pairs alphabetically, and finally
 <code>counts.collect()</code> to bring them back to the driver program as an array of objects.</p>
@@ -1042,9 +1042,9 @@ Simply create such tuples and then call your desired operation.</p>
     <p>For example, the following code uses the <code>reduceByKey</code> operation on key-value pairs to count how
 many times each line of text occurs in a file:</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">lines</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">&quot;data.txt&quot;</span><span class="p">)</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="n">lines</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s2">&quot;data.txt&quot;</span><span class="p">)</span>
 <span class="n">pairs</span> <span class="o">=</span> <span class="n">lines</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">s</span><span class="p">:</span> <span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
-<span class="n">counts</span> <span class="o">=</span> <span class="n">pairs</span><span class="o">.</span><span class="n">reduceByKey</span><span class="p">(</span><span class="k">lambda</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">:</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span><span class="p">)</span></code></pre></div>
+<span class="n">counts</span> <span class="o">=</span> <span class="n">pairs</span><span class="o">.</span><span class="n">reduceByKey</span><span class="p">(</span><span class="k">lambda</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">:</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span><span class="p">)</span></code></pre></figure>
 
     <p>We could also use <code>counts.sortByKey()</code>, for example, to sort the pairs alphabetically, and finally
 <code>counts.collect()</code> to bring them back to the driver program as a list of objects.</p>
@@ -1435,30 +1435,30 @@ method. The code below shows this:</p>
 
 <div data-lang="scala">
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">scala</span><span class="o">&gt;</span> <span class="k">val</span> <span class="n">broadcastVar</span> <span class="k">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">broadcast</span><span class="o">(</span><span class="nc">Array</span><span class="o">(</span><span class="mi">1</span><span class="o">,</span> <span class="mi">2</span><span class="o">,</span> <span class="mi">3</span><span class="o">))</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="n">scala</span><span class="o">&gt;</span> <span class="k">val</span> <span class="n">broadcastVar</span> <span class="k">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">broadcast</span><span class="o">(</span><span class="nc">Array</span><span class="o">(</span><span class="mi">1</span><span class="o">,</span> <span class="mi">2</span><span class="o">,</span> <span class="mi">3</span><span class="o">))</span>
 <span class="n">broadcastVar</span><span class="k">:</span> <span class="kt">org.apache.spark.broadcast.Broadcast</span><span class="o">[</span><span class="kt">Array</span><span class="o">[</span><span class="kt">Int</span><span class="o">]]</span> <span class="k">=</span> <span class="nc">Broadcast</span><span class="o">(</span><span class="mi">0</span><span class="o">)</span>
 
 <span class="n">scala</span><span class="o">&gt;</span> <span class="n">broadcastVar</span><span class="o">.</span><span class="n">value</span>
-<span class="n">res0</span><span class="k">:</span> <span class="kt">Array</span><span class="o">[</span><span class="kt">Int</span><span class="o">]</span> <span class="k">=</span> <span class="nc">Array</span><span class="o">(</span><span class="mi">1</span><span class="o">,</span> <span class="mi">2</span><span class="o">,</span> <span class="mi">3</span><span class="o">)</span></code></pre></div>
+<span class="n">res0</span><span class="k">:</span> <span class="kt">Array</span><span class="o">[</span><span class="kt">Int</span><span class="o">]</span> <span class="k">=</span> <span class="nc">Array</span><span class="o">(</span><span class="mi">1</span><span class="o">,</span> <span class="mi">2</span><span class="o">,</span> <span class="mi">3</span><span class="o">)</span></code></pre></figure>
 
   </div>
 
 <div data-lang="java">
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">Broadcast</span><span class="o">&lt;</span><span class="kt">int</span><span class="o">[]&gt;</span> <span class="n">broadcastVar</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="na">broadcast</span><span class="o">(</span><span class="k">new<

<TRUNCATED>
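
The snippets above only read broadcastVar.value on the driver; the more common pattern is reading it inside a task. A sketch, assuming sc and the Scala broadcastVar created above:

    // Executors fetch the broadcast payload once per node instead of once per task.
    val indices = sc.parallelize(Seq(0, 1, 2))
    val looked = indices.map(i => broadcastVar.value(i))   // read-only access in the closure
    println(looked.collect().mkString(", "))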



[23/25] spark-website git commit: Update 2.1.0 docs to include https://github.com/apache/spark/pull/16294

Posted by yh...@apache.org.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/hadoop-provided.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/hadoop-provided.html b/site/docs/2.1.0/hadoop-provided.html
index ff7afb7..9d77cf0 100644
--- a/site/docs/2.1.0/hadoop-provided.html
+++ b/site/docs/2.1.0/hadoop-provided.html
@@ -133,16 +133,16 @@
 <h1 id="apache-hadoop">Apache Hadoop</h1>
 <p>For Apache distributions, you can use Hadoop&#8217;s &#8216;classpath&#8217; command. For instance:</p>
 
-<div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c">### in conf/spark-env.sh ###</span>
+<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span><span class="c1">### in conf/spark-env.sh ###</span>
 
-<span class="c"># If &#39;hadoop&#39; binary is on your PATH</span>
-<span class="nb">export </span><span class="nv">SPARK_DIST_CLASSPATH</span><span class="o">=</span><span class="k">$(</span>hadoop classpath<span class="k">)</span>
+<span class="c1"># If &#39;hadoop&#39; binary is on your PATH</span>
+<span class="nb">export</span> <span class="nv">SPARK_DIST_CLASSPATH</span><span class="o">=</span><span class="k">$(</span>hadoop classpath<span class="k">)</span>
 
-<span class="c"># With explicit path to &#39;hadoop&#39; binary</span>
-<span class="nb">export </span><span class="nv">SPARK_DIST_CLASSPATH</span><span class="o">=</span><span class="k">$(</span>/path/to/hadoop/bin/hadoop classpath<span class="k">)</span>
+<span class="c1"># With explicit path to &#39;hadoop&#39; binary</span>
+<span class="nb">export</span> <span class="nv">SPARK_DIST_CLASSPATH</span><span class="o">=</span><span class="k">$(</span>/path/to/hadoop/bin/hadoop classpath<span class="k">)</span>
 
-<span class="c"># Passing a Hadoop configuration directory</span>
-<span class="nb">export </span><span class="nv">SPARK_DIST_CLASSPATH</span><span class="o">=</span><span class="k">$(</span>hadoop --config /path/to/configs classpath<span class="k">)</span></code></pre></div>
+<span class="c1"># Passing a Hadoop configuration directory</span>
+<span class="nb">export</span> <span class="nv">SPARK_DIST_CLASSPATH</span><span class="o">=</span><span class="k">$(</span>hadoop --config /path/to/configs classpath<span class="k">)</span></code></pre></figure>
 
 
 

http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/img/structured-streaming-watermark.png
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/img/structured-streaming-watermark.png b/site/docs/2.1.0/img/structured-streaming-watermark.png
new file mode 100644
index 0000000..f21fbda
Binary files /dev/null and b/site/docs/2.1.0/img/structured-streaming-watermark.png differ

http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/img/structured-streaming.pptx
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/img/structured-streaming.pptx b/site/docs/2.1.0/img/structured-streaming.pptx
index 6aad2ed..f5bdfc0 100644
Binary files a/site/docs/2.1.0/img/structured-streaming.pptx and b/site/docs/2.1.0/img/structured-streaming.pptx differ

http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/job-scheduling.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/job-scheduling.html b/site/docs/2.1.0/job-scheduling.html
index 53161c2..9651607 100644
--- a/site/docs/2.1.0/job-scheduling.html
+++ b/site/docs/2.1.0/job-scheduling.html
@@ -127,24 +127,24 @@
                     
 
                     <ul id="markdown-toc">
-  <li><a href="#overview" id="markdown-toc-overview">Overview</a></li>
-  <li><a href="#scheduling-across-applications" id="markdown-toc-scheduling-across-applications">Scheduling Across Applications</a>    <ul>
-      <li><a href="#dynamic-resource-allocation" id="markdown-toc-dynamic-resource-allocation">Dynamic Resource Allocation</a>        <ul>
-          <li><a href="#configuration-and-setup" id="markdown-toc-configuration-and-setup">Configuration and Setup</a></li>
-          <li><a href="#resource-allocation-policy" id="markdown-toc-resource-allocation-policy">Resource Allocation Policy</a>            <ul>
-              <li><a href="#request-policy" id="markdown-toc-request-policy">Request Policy</a></li>
-              <li><a href="#remove-policy" id="markdown-toc-remove-policy">Remove Policy</a></li>
+  <li><a href="#overview">Overview</a></li>
+  <li><a href="#scheduling-across-applications">Scheduling Across Applications</a>    <ul>
+      <li><a href="#dynamic-resource-allocation">Dynamic Resource Allocation</a>        <ul>
+          <li><a href="#configuration-and-setup">Configuration and Setup</a></li>
+          <li><a href="#resource-allocation-policy">Resource Allocation Policy</a>            <ul>
+              <li><a href="#request-policy">Request Policy</a></li>
+              <li><a href="#remove-policy">Remove Policy</a></li>
             </ul>
           </li>
-          <li><a href="#graceful-decommission-of-executors" id="markdown-toc-graceful-decommission-of-executors">Graceful Decommission of Executors</a></li>
+          <li><a href="#graceful-decommission-of-executors">Graceful Decommission of Executors</a></li>
         </ul>
       </li>
     </ul>
   </li>
-  <li><a href="#scheduling-within-an-application" id="markdown-toc-scheduling-within-an-application">Scheduling Within an Application</a>    <ul>
-      <li><a href="#fair-scheduler-pools" id="markdown-toc-fair-scheduler-pools">Fair Scheduler Pools</a></li>
-      <li><a href="#default-behavior-of-pools" id="markdown-toc-default-behavior-of-pools">Default Behavior of Pools</a></li>
-      <li><a href="#configuring-pool-properties" id="markdown-toc-configuring-pool-properties">Configuring Pool Properties</a></li>
+  <li><a href="#scheduling-within-an-application">Scheduling Within an Application</a>    <ul>
+      <li><a href="#fair-scheduler-pools">Fair Scheduler Pools</a></li>
+      <li><a href="#default-behavior-of-pools">Default Behavior of Pools</a></li>
+      <li><a href="#configuring-pool-properties">Configuring Pool Properties</a></li>
     </ul>
   </li>
 </ul>
@@ -321,9 +321,9 @@ mode is best for multi-user settings.</p>
 <p>To enable the fair scheduler, simply set the <code>spark.scheduler.mode</code> property to <code>FAIR</code> when configuring
 a SparkContext:</p>
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">conf</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkConf</span><span class="o">().</span><span class="n">setMaster</span><span class="o">(...).</span><span class="n">setAppName</span><span class="o">(...)</span>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">val</span> <span class="n">conf</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkConf</span><span class="o">().</span><span class="n">setMaster</span><span class="o">(...).</span><span class="n">setAppName</span><span class="o">(...)</span>
 <span class="n">conf</span><span class="o">.</span><span class="n">set</span><span class="o">(</span><span class="s">&quot;spark.scheduler.mode&quot;</span><span class="o">,</span> <span class="s">&quot;FAIR&quot;</span><span class="o">)</span>
-<span class="k">val</span> <span class="n">sc</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkContext</span><span class="o">(</span><span class="n">conf</span><span class="o">)</span></code></pre></div>
+<span class="k">val</span> <span class="n">sc</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkContext</span><span class="o">(</span><span class="n">conf</span><span class="o">)</span></code></pre></figure>
 
 <h2 id="fair-scheduler-pools">Fair Scheduler Pools</h2>
 
@@ -337,15 +337,15 @@ many concurrent jobs they have instead of giving <em>jobs</em> equal shares. Thi
 adding the <code>spark.scheduler.pool</code> &#8220;local property&#8221; to the SparkContext in the thread that&#8217;s submitting them.
 This is done as follows:</p>
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="c1">// Assuming sc is your SparkContext variable</span>
-<span class="n">sc</span><span class="o">.</span><span class="n">setLocalProperty</span><span class="o">(</span><span class="s">&quot;spark.scheduler.pool&quot;</span><span class="o">,</span> <span class="s">&quot;pool1&quot;</span><span class="o">)</span></code></pre></div>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="c1">// Assuming sc is your SparkContext variable</span>
+<span class="n">sc</span><span class="o">.</span><span class="n">setLocalProperty</span><span class="o">(</span><span class="s">&quot;spark.scheduler.pool&quot;</span><span class="o">,</span> <span class="s">&quot;pool1&quot;</span><span class="o">)</span></code></pre></figure>
 
 <p>After setting this local property, <em>all</em> jobs submitted within this thread (by calls in this thread
 to <code>RDD.save</code>, <code>count</code>, <code>collect</code>, etc) will use this pool name. The setting is per-thread to make
 it easy to have a thread run multiple jobs on behalf of the same user. If you&#8217;d like to clear the
 pool that a thread is associated with, simply call:</p>
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">sc</span><span class="o">.</span><span class="n">setLocalProperty</span><span class="o">(</span><span class="s">&quot;spark.scheduler.pool&quot;</span><span class="o">,</span> <span class="kc">null</span><span class="o">)</span></code></pre></div>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="n">sc</span><span class="o">.</span><span class="n">setLocalProperty</span><span class="o">(</span><span class="s">&quot;spark.scheduler.pool&quot;</span><span class="o">,</span> <span class="kc">null</span><span class="o">)</span></code></pre></figure>
 
 <h2 id="default-behavior-of-pools">Default Behavior of Pools</h2>
 
@@ -379,12 +379,12 @@ of the cluster. By default, each pool&#8217;s <code>minShare</code> is 0.</li>
 and setting a <code>spark.scheduler.allocation.file</code> property in your
 <a href="configuration.html#spark-properties">SparkConf</a>.</p>
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">conf</span><span class="o">.</span><span class="n">set</span><span class="o">(</span><span class="s">&quot;spark.scheduler.allocation.file&quot;</span><span class="o">,</span> <span class="s">&quot;/path/to/file&quot;</span><span class="o">)</span></code></pre></div>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="n">conf</span><span class="o">.</span><span class="n">set</span><span class="o">(</span><span class="s">&quot;spark.scheduler.allocation.file&quot;</span><span class="o">,</span> <span class="s">&quot;/path/to/file&quot;</span><span class="o">)</span></code></pre></figure>
 
 <p>The format of the XML file is simply a <code>&lt;pool&gt;</code> element for each pool, with different elements
 within it for the various settings. For example:</p>
 
-<div class="highlight"><pre><code class="language-xml" data-lang="xml"><span class="cp">&lt;?xml version=&quot;1.0&quot;?&gt;</span>
+<figure class="highlight"><pre><code class="language-xml" data-lang="xml"><span></span><span class="cp">&lt;?xml version=&quot;1.0&quot;?&gt;</span>
 <span class="nt">&lt;allocations&gt;</span>
   <span class="nt">&lt;pool</span> <span class="na">name=</span><span class="s">&quot;production&quot;</span><span class="nt">&gt;</span>
     <span class="nt">&lt;schedulingMode&gt;</span>FAIR<span class="nt">&lt;/schedulingMode&gt;</span>
@@ -396,7 +396,7 @@ within it for the various settings. For example:</p>
     <span class="nt">&lt;weight&gt;</span>2<span class="nt">&lt;/weight&gt;</span>
     <span class="nt">&lt;minShare&gt;</span>3<span class="nt">&lt;/minShare&gt;</span>
   <span class="nt">&lt;/pool&gt;</span>
-<span class="nt">&lt;/allocations&gt;</span></code></pre></div>
+<span class="nt">&lt;/allocations&gt;</span></code></pre></figure>
 
 <p>A full example is also available in <code>conf/fairscheduler.xml.template</code>. Note that any pools not
 configured in the XML file will simply get default values for all settings (scheduling mode FIFO,
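
Putting the scheduler pieces together, here is a sketch of two driver threads submitting jobs to different pools. It assumes sc is a SparkContext already configured with spark.scheduler.mode=FAIR; the pool name "adhoc" and the input paths are illustrative only:

    // Each thread tags the jobs it submits via the thread-local pool property.
    def inPool(pool: String)(body: => Unit): Thread = {
      val t = new Thread(new Runnable {
        def run(): Unit = {
          sc.setLocalProperty("spark.scheduler.pool", pool)
          body
          sc.setLocalProperty("spark.scheduler.pool", null)  // detach the pool when done
        }
      })
      t.start()
      t
    }

    val production = inPool("production") { sc.textFile("/data/large.txt").count() }
    val adhoc      = inPool("adhoc")      { sc.textFile("/data/small.txt").count() }
    production.join(); adhoc.join()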

http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/ml-advanced.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/ml-advanced.html b/site/docs/2.1.0/ml-advanced.html
index 02c95e1..84dcf43 100644
--- a/site/docs/2.1.0/ml-advanced.html
+++ b/site/docs/2.1.0/ml-advanced.html
@@ -307,10 +307,10 @@
                     
 
                     <ul id="markdown-toc">
-  <li><a href="#optimization-of-linear-methods-developer" id="markdown-toc-optimization-of-linear-methods-developer">Optimization of linear methods (developer)</a>    <ul>
-      <li><a href="#limited-memory-bfgs-l-bfgs" id="markdown-toc-limited-memory-bfgs-l-bfgs">Limited-memory BFGS (L-BFGS)</a></li>
-      <li><a href="#normal-equation-solver-for-weighted-least-squares" id="markdown-toc-normal-equation-solver-for-weighted-least-squares">Normal equation solver for weighted least squares</a></li>
-      <li><a href="#iteratively-reweighted-least-squares-irls" id="markdown-toc-iteratively-reweighted-least-squares-irls">Iteratively reweighted least squares (IRLS)</a></li>
+  <li><a href="#optimization-of-linear-methods-developer">Optimization of linear methods (developer)</a>    <ul>
+      <li><a href="#limited-memory-bfgs-l-bfgs">Limited-memory BFGS (L-BFGS)</a></li>
+      <li><a href="#normal-equation-solver-for-weighted-least-squares">Normal equation solver for weighted least squares</a></li>
+      <li><a href="#iteratively-reweighted-least-squares-irls">Iteratively reweighted least squares (IRLS)</a></li>
     </ul>
   </li>
 </ul>
@@ -385,7 +385,7 @@ Quasi-Newton methods in this case. This fallback is currently always enabled for
 
 <p><code>WeightedLeastSquares</code> supports L1, L2, and elastic-net regularization and provides options to enable or disable regularization and standardization. In the case where no 
 L1 regularization is applied (i.e. $\alpha = 0$), there exists an analytical solution and either Cholesky or Quasi-Newton solver may be used. When $\alpha &gt; 0$ no analytical 
-solution exists and we instead use the Quasi-Newton solver to find the coefficients iteratively.</p>
+solution exists and we instead use the Quasi-Newton solver to find the coefficients iteratively. </p>
 
 <p>In order to make the normal equation approach efficient, <code>WeightedLeastSquares</code> requires that the number of features be no more than 4096. For larger problems, use L-BFGS instead.</p>
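
In spark.ml this choice is exposed through the solver Param; a sketch, assuming training is a DataFrame with the usual "label" and "features" columns:

    import org.apache.spark.ml.regression.LinearRegression

    // Beyond 4096 features the normal-equation path is unavailable,
    // so request L-BFGS explicitly rather than relying on solver = "auto".
    val lr = new LinearRegression()
      .setSolver("l-bfgs")
      .setRegParam(0.1)
      .setElasticNetParam(0.0)  // alpha = 0; any alpha > 0 forces an iterative solver anyway

    val model = lr.fit(training)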
 




[05/25] spark-website git commit: Update 2.1.0 docs to include https://github.com/apache/spark/pull/16294

Posted by yh...@apache.org.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/storage-openstack-swift.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/storage-openstack-swift.html b/site/docs/2.1.0/storage-openstack-swift.html
index bbb3446..a20c67f 100644
--- a/site/docs/2.1.0/storage-openstack-swift.html
+++ b/site/docs/2.1.0/storage-openstack-swift.html
@@ -144,7 +144,7 @@ Current Swift driver requires Swift to use Keystone authentication method.</p>
 <p>The Spark application should include <code>hadoop-openstack</code> dependency.
 For example, for Maven support, add the following to the <code>pom.xml</code> file:</p>
 
-<div class="highlight"><pre><code class="language-xml" data-lang="xml"><span class="nt">&lt;dependencyManagement&gt;</span>
+<figure class="highlight"><pre><code class="language-xml" data-lang="xml"><span></span><span class="nt">&lt;dependencyManagement&gt;</span>
   ...
   <span class="nt">&lt;dependency&gt;</span>
     <span class="nt">&lt;groupId&gt;</span>org.apache.hadoop<span class="nt">&lt;/groupId&gt;</span>
@@ -152,15 +152,15 @@ For example, for Maven support, add the following to the <code>pom.xml</code> fi
     <span class="nt">&lt;version&gt;</span>2.3.0<span class="nt">&lt;/version&gt;</span>
   <span class="nt">&lt;/dependency&gt;</span>
   ...
-<span class="nt">&lt;/dependencyManagement&gt;</span></code></pre></div>
+<span class="nt">&lt;/dependencyManagement&gt;</span></code></pre></figure>
 
 <h1 id="configuration-parameters">Configuration Parameters</h1>
 
 <p>Create <code>core-site.xml</code> and place it inside Spark&#8217;s <code>conf</code> directory.
 There are two main categories of parameters that should be configured: declaration of the
-Swift driver and the parameters that are required by Keystone.</p>
+Swift driver and the parameters that are required by Keystone. </p>
 
-<p>Configuring Hadoop to use the Swift file system is achieved via</p>
+<p>Configuring Hadoop to use the Swift file system is achieved via </p>
 
 <table class="table">
 <tr><th>Property Name</th><th>Value</th></tr>
@@ -221,7 +221,7 @@ contains a list of Keystone mandatory parameters. <code>PROVIDER</code> can be a
 <p>For example, assume <code>PROVIDER=SparkTest</code> and Keystone contains user <code>tester</code> with password <code>testing</code>
 defined for tenant <code>test</code>. Then <code>core-site.xml</code> should include:</p>
 
-<div class="highlight"><pre><code class="language-xml" data-lang="xml"><span class="nt">&lt;configuration&gt;</span>
+<figure class="highlight"><pre><code class="language-xml" data-lang="xml"><span></span><span class="nt">&lt;configuration&gt;</span>
   <span class="nt">&lt;property&gt;</span>
     <span class="nt">&lt;name&gt;</span>fs.swift.impl<span class="nt">&lt;/name&gt;</span>
     <span class="nt">&lt;value&gt;</span>org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem<span class="nt">&lt;/value&gt;</span>
@@ -257,7 +257,7 @@ defined for tenant <code>test</code>. Then <code>core-site.xml</code> should inc
     <span class="nt">&lt;name&gt;</span>fs.swift.service.SparkTest.password<span class="nt">&lt;/name&gt;</span>
     <span class="nt">&lt;value&gt;</span>testing<span class="nt">&lt;/value&gt;</span>
   <span class="nt">&lt;/property&gt;</span>
-<span class="nt">&lt;/configuration&gt;</span></code></pre></div>
+<span class="nt">&lt;/configuration&gt;</span></code></pre></figure>
 
 <p>Notice that
 <code>fs.swift.service.PROVIDER.tenant</code>,
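
With the configuration above in place, Swift objects are addressed through Hadoop's swift://&lt;container&gt;.&lt;PROVIDER&gt;/&lt;path&gt; scheme. A sketch, assuming a container named logs under the SparkTest provider defined earlier:

    // Reads go through the hadoop-openstack filesystem registered in core-site.xml.
    val logs = sc.textFile("swift://logs.SparkTest/app.log")
    println(logs.count())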

http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/streaming-custom-receivers.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/streaming-custom-receivers.html b/site/docs/2.1.0/streaming-custom-receivers.html
index d31647d..846c797 100644
--- a/site/docs/2.1.0/streaming-custom-receivers.html
+++ b/site/docs/2.1.0/streaming-custom-receivers.html
@@ -171,7 +171,7 @@ has any error connecting or receiving, the receiver is restarted to make another
 <div class="codetabs">
 <div data-lang="scala">
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">class</span> <span class="nc">CustomReceiver</span><span class="o">(</span><span class="n">host</span><span class="k">:</span> <span class="kt">String</span><span class="o">,</span> <span class="n">port</span><span class="k">:</span> <span class="kt">Int</span><span class="o">)</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">class</span> <span class="nc">CustomReceiver</span><span class="o">(</span><span class="n">host</span><span class="k">:</span> <span class="kt">String</span><span class="o">,</span> <span class="n">port</span><span class="k">:</span> <span class="kt">Int</span><span class="o">)</span>
   <span class="k">extends</span> <span class="nc">Receiver</span><span class="o">[</span><span class="kt">String</span><span class="o">](</span><span class="nc">StorageLevel</span><span class="o">.</span><span class="nc">MEMORY_AND_DISK_2</span><span class="o">)</span> <span class="k">with</span> <span class="nc">Logging</span> <span class="o">{</span>
 
   <span class="k">def</span> <span class="n">onStart</span><span class="o">()</span> <span class="o">{</span>
@@ -216,12 +216,12 @@ has any error connecting or receiving, the receiver is restarted to make another
         <span class="n">restart</span><span class="o">(</span><span class="s">&quot;Error receiving data&quot;</span><span class="o">,</span> <span class="n">t</span><span class="o">)</span>
     <span class="o">}</span>
   <span class="o">}</span>
-<span class="o">}</span></code></pre></div>
+<span class="o">}</span></code></pre></figure>
 
   </div>
 <div data-lang="java">
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">public</span> <span class="kd">class</span> <span class="nc">JavaCustomReceiver</span> <span class="kd">extends</span> <span class="n">Receiver</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="o">{</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="kd">public</span> <span class="kd">class</span> <span class="nc">JavaCustomReceiver</span> <span class="kd">extends</span> <span class="n">Receiver</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="o">{</span>
 
   <span class="n">String</span> <span class="n">host</span> <span class="o">=</span> <span class="kc">null</span><span class="o">;</span>
   <span class="kt">int</span> <span class="n">port</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="o">;</span>
@@ -234,7 +234,7 @@ has any error connecting or receiving, the receiver is restarted to make another
 
   <span class="kd">public</span> <span class="kt">void</span> <span class="nf">onStart</span><span class="o">()</span> <span class="o">{</span>
     <span class="c1">// Start the thread that receives data over a connection</span>
-    <span class="k">new</span> <span class="nf">Thread</span><span class="o">()</span>  <span class="o">{</span>
+    <span class="k">new</span> <span class="n">Thread</span><span class="o">()</span>  <span class="o">{</span>
       <span class="nd">@Override</span> <span class="kd">public</span> <span class="kt">void</span> <span class="nf">run</span><span class="o">()</span> <span class="o">{</span>
         <span class="n">receive</span><span class="o">();</span>
       <span class="o">}</span>
@@ -253,10 +253,10 @@ has any error connecting or receiving, the receiver is restarted to make another
 
     <span class="k">try</span> <span class="o">{</span>
       <span class="c1">// connect to the server</span>
-      <span class="n">socket</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">Socket</span><span class="o">(</span><span class="n">host</span><span class="o">,</span> <span class="n">port</span><span class="o">);</span>
+      <span class="n">socket</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Socket</span><span class="o">(</span><span class="n">host</span><span class="o">,</span> <span class="n">port</span><span class="o">);</span>
 
-      <span class="n">BufferedReader</span> <span class="n">reader</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">BufferedReader</span><span class="o">(</span>
-        <span class="k">new</span> <span class="nf">InputStreamReader</span><span class="o">(</span><span class="n">socket</span><span class="o">.</span><span class="na">getInputStream</span><span class="o">(),</span> <span class="n">StandardCharsets</span><span class="o">.</span><span class="na">UTF_8</span><span class="o">));</span>
+      <span class="n">BufferedReader</span> <span class="n">reader</span> <span class="o">=</span> <span class="k">new</span> <span class="n">BufferedReader</span><span class="o">(</span>
+        <span class="k">new</span> <span class="n">InputStreamReader</span><span class="o">(</span><span class="n">socket</span><span class="o">.</span><span class="na">getInputStream</span><span class="o">(),</span> <span class="n">StandardCharsets</span><span class="o">.</span><span class="na">UTF_8</span><span class="o">));</span>
 
       <span class="c1">// Until stopped or connection broken continue reading</span>
       <span class="k">while</span> <span class="o">(!</span><span class="n">isStopped</span><span class="o">()</span> <span class="o">&amp;&amp;</span> <span class="o">(</span><span class="n">userInput</span> <span class="o">=</span> <span class="n">reader</span><span class="o">.</span><span class="na">readLine</span><span class="o">())</span> <span class="o">!=</span> <span class="kc">null</span><span class="o">)</span> <span class="o">{</span>
@@ -276,7 +276,7 @@ has any error connecting or receiving, the receiver is restarted to make another
       <span class="n">restart</span><span class="o">(</span><span class="s">&quot;Error receiving data&quot;</span><span class="o">,</span> <span class="n">t</span><span class="o">);</span>
     <span class="o">}</span>
   <span class="o">}</span>
-<span class="o">}</span></code></pre></div>
+<span class="o">}</span></code></pre></figure>
 
   </div>
 </div>
@@ -290,20 +290,20 @@ an input DStream using data received by the instance of custom receiver, as show
 <div class="codetabs">
 <div data-lang="scala">
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="c1">// Assuming ssc is the StreamingContext</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="c1">// Assuming ssc is the StreamingContext</span>
 <span class="k">val</span> <span class="n">customReceiverStream</span> <span class="k">=</span> <span class="n">ssc</span><span class="o">.</span><span class="n">receiverStream</span><span class="o">(</span><span class="k">new</span> <span class="nc">CustomReceiver</span><span class="o">(</span><span class="n">host</span><span class="o">,</span> <span class="n">port</span><span class="o">))</span>
 <span class="k">val</span> <span class="n">words</span> <span class="k">=</span> <span class="n">lines</span><span class="o">.</span><span class="n">flatMap</span><span class="o">(</span><span class="k">_</span><span class="o">.</span><span class="n">split</span><span class="o">(</span><span class="s">&quot; &quot;</span><span class="o">))</span>
-<span class="o">...</span></code></pre></div>
+<span class="o">...</span></code></pre></figure>
 
     <p>The full source code is in the example <a href="https://github.com/apache/spark/blob/v2.1.0/examples/src/main/scala/org/apache/spark/examples/streaming/CustomReceiver.scala">CustomReceiver.scala</a>.</p>
 
   </div>
 <div data-lang="java">
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="c1">// Assuming ssc is the JavaStreamingContext</span>
-<span class="n">JavaDStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">customReceiverStream</span> <span class="o">=</span> <span class="n">ssc</span><span class="o">.</span><span class="na">receiverStream</span><span class="o">(</span><span class="k">new</span> <span class="nf">JavaCustomReceiver</span><span class="o">(</span><span class="n">host</span><span class="o">,</span> <span class="n">port</span><span class="o">));</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="c1">// Assuming ssc is the JavaStreamingContext</span>
+<span class="n">JavaDStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">customReceiverStream</span> <span class="o">=</span> <span class="n">ssc</span><span class="o">.</span><span class="na">receiverStream</span><span class="o">(</span><span class="k">new</span> <span class="n">JavaCustomReceiver</span><span class="o">(</span><span class="n">host</span><span class="o">,</span> <span class="n">port</span><span class="o">));</span>
 <span class="n">JavaDStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">words</span> <span class="o">=</span> <span class="n">lines</span><span class="o">.</span><span class="na">flatMap</span><span class="o">(</span><span class="k">new</span> <span class="n">FlatMapFunction</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;()</span> <span class="o">{</span> <span class="o">...</span> <span class="o">});</span>
-<span class="o">...</span></code></pre></div>
+<span class="o">...</span></code></pre></figure>
 
     <p>The full source code is in the example <a href="https://github.com/apache/spark/blob/v2.1.0/examples/src/main/java/org/apache/spark/examples/streaming/JavaCustomReceiver.java">JavaCustomReceiver.java</a>.</p>
 

http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/streaming-kafka-0-10-integration.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/streaming-kafka-0-10-integration.html b/site/docs/2.1.0/streaming-kafka-0-10-integration.html
index a4f39fb..1e7fbba 100644
--- a/site/docs/2.1.0/streaming-kafka-0-10-integration.html
+++ b/site/docs/2.1.0/streaming-kafka-0-10-integration.html
@@ -142,7 +142,7 @@ version = 2.1.0
 <div class="codetabs">
 <div data-lang="scala">
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">import</span> <span class="nn">org.apache.kafka.clients.consumer.ConsumerRecord</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">import</span> <span class="nn">org.apache.kafka.clients.consumer.ConsumerRecord</span>
 <span class="k">import</span> <span class="nn">org.apache.kafka.common.serialization.StringDeserializer</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.streaming.kafka010._</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent</span>
@@ -164,13 +164,13 @@ version = 2.1.0
   <span class="nc">Subscribe</span><span class="o">[</span><span class="kt">String</span>, <span class="kt">String</span><span class="o">](</span><span class="n">topics</span><span class="o">,</span> <span class="n">kafkaParams</span><span class="o">)</span>
 <span class="o">)</span>
 
-<span class="n">stream</span><span class="o">.</span><span class="n">map</span><span class="o">(</span><span class="n">record</span> <span class="k">=&gt;</span> <span class="o">(</span><span class="n">record</span><span class="o">.</span><span class="n">key</span><span class="o">,</span> <span class="n">record</span><span class="o">.</span><span class="n">value</span><span class="o">))</span></code></pre></div>
+<span class="n">stream</span><span class="o">.</span><span class="n">map</span><span class="o">(</span><span class="n">record</span> <span class="k">=&gt;</span> <span class="o">(</span><span class="n">record</span><span class="o">.</span><span class="n">key</span><span class="o">,</span> <span class="n">record</span><span class="o">.</span><span class="n">value</span><span class="o">))</span></code></pre></figure>
 
     <p>Each item in the stream is a <a href="http://kafka.apache.org/0100/javadoc/org/apache/kafka/clients/consumer/ConsumerRecord.html">ConsumerRecord</a></p>
   </div>
 <div data-lang="java">
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kn">import</span> <span class="nn">java.util.*</span><span class="o">;</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="kn">import</span> <span class="nn">java.util.*</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.SparkConf</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.TaskContext</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.*</span><span class="o">;</span>
@@ -205,7 +205,7 @@ version = 2.1.0
     <span class="kd">public</span> <span class="n">Tuple2</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="nf">call</span><span class="o">(</span><span class="n">ConsumerRecord</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">record</span><span class="o">)</span> <span class="o">{</span>
       <span class="k">return</span> <span class="k">new</span> <span class="n">Tuple2</span><span class="o">&lt;&gt;(</span><span class="n">record</span><span class="o">.</span><span class="na">key</span><span class="o">(),</span> <span class="n">record</span><span class="o">.</span><span class="na">value</span><span class="o">());</span>
     <span class="o">}</span>
-  <span class="o">})</span></code></pre></div>
+  <span class="o">})</span></code></pre></figure>
 
   </div>
 </div>
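As a quick illustration of the ConsumerRecord mentioned above: each record carries more than just a key and a value. A minimal Scala sketch (field accessors are from the Kafka 0.10 client API; `stream` is the direct stream created above):

    // minimal sketch: other ConsumerRecord fields besides key and value
    stream.map { record =>
      (record.topic, record.partition, record.offset, record.key, record.value)
    }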
@@ -236,7 +236,7 @@ Note that the example sets enable.auto.commit to false, for discussion see <a hr
 <div class="codetabs">
 <div data-lang="scala">
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="c1">// Import dependencies and create kafka params as in Create Direct Stream above</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="c1">// Import dependencies and create kafka params as in Create Direct Stream above</span>
 
 <span class="k">val</span> <span class="n">offsetRanges</span> <span class="k">=</span> <span class="nc">Array</span><span class="o">(</span>
   <span class="c1">// topic, partition, inclusive starting offset, exclusive ending offset</span>
@@ -244,12 +244,12 @@ Note that the example sets enable.auto.commit to false, for discussion see <a hr
   <span class="nc">OffsetRange</span><span class="o">(</span><span class="s">&quot;test&quot;</span><span class="o">,</span> <span class="mi">1</span><span class="o">,</span> <span class="mi">0</span><span class="o">,</span> <span class="mi">100</span><span class="o">)</span>
 <span class="o">)</span>
 
-<span class="k">val</span> <span class="n">rdd</span> <span class="k">=</span> <span class="nc">KafkaUtils</span><span class="o">.</span><span class="n">createRDD</span><span class="o">[</span><span class="kt">String</span>, <span class="kt">String</span><span class="o">](</span><span class="n">sparkContext</span><span class="o">,</span> <span class="n">kafkaParams</span><span class="o">,</span> <span class="n">offsetRanges</span><span class="o">,</span> <span class="nc">PreferConsistent</span><span class="o">)</span></code></pre></div>
+<span class="k">val</span> <span class="n">rdd</span> <span class="k">=</span> <span class="nc">KafkaUtils</span><span class="o">.</span><span class="n">createRDD</span><span class="o">[</span><span class="kt">String</span>, <span class="kt">String</span><span class="o">](</span><span class="n">sparkContext</span><span class="o">,</span> <span class="n">kafkaParams</span><span class="o">,</span> <span class="n">offsetRanges</span><span class="o">,</span> <span class="nc">PreferConsistent</span><span class="o">)</span></code></pre></figure>
 
   </div>
 <div data-lang="java">
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="c1">// Import dependencies and create kafka params as in Create Direct Stream above</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="c1">// Import dependencies and create kafka params as in Create Direct Stream above</span>
 
 <span class="n">OffsetRange</span><span class="o">[]</span> <span class="n">offsetRanges</span> <span class="o">=</span> <span class="o">{</span>
   <span class="c1">// topic, partition, inclusive starting offset, exclusive ending offset</span>
@@ -262,7 +262,7 @@ Note that the example sets enable.auto.commit to false, for discussion see <a hr
   <span class="n">kafkaParams</span><span class="o">,</span>
   <span class="n">offsetRanges</span><span class="o">,</span>
   <span class="n">LocationStrategies</span><span class="o">.</span><span class="na">PreferConsistent</span><span class="o">()</span>
-<span class="o">);</span></code></pre></div>
+<span class="o">);</span></code></pre></figure>
 
   </div>
 </div>
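The RDD returned by createRDD is an ordinary RDD of ConsumerRecord, so the usual transformations apply. A hedged sketch, reusing the `rdd` from the example above:

    // sketch: count records per topic within the batch defined by offsetRanges
    val countsByTopic = rdd
      .map(record => (record.topic, 1L))
      .reduceByKey(_ + _)
    countsByTopic.collect().foreach(println)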
@@ -274,18 +274,18 @@ Note that the example sets enable.auto.commit to false, for discussion see <a hr
 <div class="codetabs">
 <div data-lang="scala">
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">stream</span><span class="o">.</span><span class="n">foreachRDD</span> <span class="o">{</span> <span class="n">rdd</span> <span class="k">=&gt;</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="n">stream</span><span class="o">.</span><span class="n">foreachRDD</span> <span class="o">{</span> <span class="n">rdd</span> <span class="k">=&gt;</span>
   <span class="k">val</span> <span class="n">offsetRanges</span> <span class="k">=</span> <span class="n">rdd</span><span class="o">.</span><span class="n">asInstanceOf</span><span class="o">[</span><span class="kt">HasOffsetRanges</span><span class="o">].</span><span class="n">offsetRanges</span>
   <span class="n">rdd</span><span class="o">.</span><span class="n">foreachPartition</span> <span class="o">{</span> <span class="n">iter</span> <span class="k">=&gt;</span>
     <span class="k">val</span> <span class="n">o</span><span class="k">:</span> <span class="kt">OffsetRange</span> <span class="o">=</span> <span class="n">offsetRanges</span><span class="o">(</span><span class="nc">TaskContext</span><span class="o">.</span><span class="n">get</span><span class="o">.</span><span class="n">partitionId</span><span class="o">)</span>
-    <span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;${o.topic} ${o.partition} ${o.fromOffset} ${o.untilOffset}&quot;</span><span class="o">)</span>
+    <span class="n">println</span><span class="o">(</span><span class="s">s&quot;</span><span class="si">${</span><span class="n">o</span><span class="o">.</span><span class="n">topic</span><span class="si">}</span><span class="s"> </span><span class="si">${</span><span class="n">o</span><span class="o">.</span><span class="n">partition</span><span class="si">}</span><span class="s"> </span><span class="si">${</span><span class="n">o</span><span class="o">.</span><span class="n">fromOffset</span><span class="si">}</span><span class="s"> </span><span class="si">${</span><span class="n">o</span><span class="o">.</span><span class="n">untilOffset</span><span class="si">}</span><span class="s">&quot;</span><span class="o">)</span>
   <span class="o">}</span>
-<span class="o">}</span></code></pre></div>
+<span class="o">}</span></code></pre></figure>
 
   </div>
 <div data-lang="java">
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">stream</span><span class="o">.</span><span class="na">foreachRDD</span><span class="o">(</span><span class="k">new</span> <span class="n">VoidFunction</span><span class="o">&lt;</span><span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">ConsumerRecord</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;&gt;&gt;()</span> <span class="o">{</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="n">stream</span><span class="o">.</span><span class="na">foreachRDD</span><span class="o">(</span><span class="k">new</span> <span class="n">VoidFunction</span><span class="o">&lt;</span><span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">ConsumerRecord</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;&gt;&gt;()</span> <span class="o">{</span>
   <span class="nd">@Override</span>
   <span class="kd">public</span> <span class="kt">void</span> <span class="nf">call</span><span class="o">(</span><span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">ConsumerRecord</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;&gt;</span> <span class="n">rdd</span><span class="o">)</span> <span class="o">{</span>
     <span class="kd">final</span> <span class="n">OffsetRange</span><span class="o">[]</span> <span class="n">offsetRanges</span> <span class="o">=</span> <span class="o">((</span><span class="n">HasOffsetRanges</span><span class="o">)</span> <span class="n">rdd</span><span class="o">.</span><span class="na">rdd</span><span class="o">()).</span><span class="na">offsetRanges</span><span class="o">();</span>
@@ -298,7 +298,7 @@ Note that the example sets enable.auto.commit to false, for discussion see <a hr
       <span class="o">}</span>
     <span class="o">});</span>
   <span class="o">}</span>
-<span class="o">});</span></code></pre></div>
+<span class="o">});</span></code></pre></figure>
 
   </div>
 </div>
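One caveat worth making explicit here (a sketch consistent with the HasOffsetRanges notes in these docs): the one-to-one mapping between RDD partition and Kafka partition only holds before any shuffle, so capture the offset ranges first.

    // capture offsets before any shuffle; after repartition()/reduceByKey()
    // TaskContext.get.partitionId no longer maps to a Kafka partition
    stream.foreachRDD { rdd =>
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges // ok here
      val reshuffled = rdd.repartition(10) // Kafka partition mapping is lost below
      // ... process reshuffled, but report offsets from offsetRanges captured above
    }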
@@ -317,18 +317,18 @@ Note that the example sets enable.auto.commit to false, for discussion see <a hr
 <div class="codetabs">
 <div data-lang="scala">
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">stream</span><span class="o">.</span><span class="n">foreachRDD</span> <span class="o">{</span> <span class="n">rdd</span> <span class="k">=&gt;</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="n">stream</span><span class="o">.</span><span class="n">foreachRDD</span> <span class="o">{</span> <span class="n">rdd</span> <span class="k">=&gt;</span>
   <span class="k">val</span> <span class="n">offsetRanges</span> <span class="k">=</span> <span class="n">rdd</span><span class="o">.</span><span class="n">asInstanceOf</span><span class="o">[</span><span class="kt">HasOffsetRanges</span><span class="o">].</span><span class="n">offsetRanges</span>
 
   <span class="c1">// some time later, after outputs have completed</span>
   <span class="n">stream</span><span class="o">.</span><span class="n">asInstanceOf</span><span class="o">[</span><span class="kt">CanCommitOffsets</span><span class="o">].</span><span class="n">commitAsync</span><span class="o">(</span><span class="n">offsetRanges</span><span class="o">)</span>
-<span class="o">}</span></code></pre></div>
+<span class="o">}</span></code></pre></figure>
 
    <p>As with HasOffsetRanges, the cast to CanCommitOffsets will only succeed if called on the result of createDirectStream, not after transformations. The commitAsync call is thread-safe, but must occur after outputs if you want meaningful semantics; a minimal sketch of this caveat follows these tabs.</p>
   </div>
 <div data-lang="java">
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">stream</span><span class="o">.</span><span class="na">foreachRDD</span><span class="o">(</span><span class="k">new</span> <span class="n">VoidFunction</span><span class="o">&lt;</span><span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">ConsumerRecord</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;&gt;&gt;()</span> <span class="o">{</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="n">stream</span><span class="o">.</span><span class="na">foreachRDD</span><span class="o">(</span><span class="k">new</span> <span class="n">VoidFunction</span><span class="o">&lt;</span><span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">ConsumerRecord</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;&gt;&gt;()</span> <span class="o">{</span>
   <span class="nd">@Override</span>
   <span class="kd">public</span> <span class="kt">void</span> <span class="nf">call</span><span class="o">(</span><span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">ConsumerRecord</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;&gt;</span> <span class="n">rdd</span><span class="o">)</span> <span class="o">{</span>
     <span class="n">OffsetRange</span><span class="o">[]</span> <span class="n">offsetRanges</span> <span class="o">=</span> <span class="o">((</span><span class="n">HasOffsetRanges</span><span class="o">)</span> <span class="n">rdd</span><span class="o">.</span><span class="na">rdd</span><span class="o">()).</span><span class="na">offsetRanges</span><span class="o">();</span>
@@ -336,7 +336,7 @@ Note that the example sets enable.auto.commit to false, for discussion see <a hr
     <span class="c1">// some time later, after outputs have completed</span>
     <span class="o">((</span><span class="n">CanCommitOffsets</span><span class="o">)</span> <span class="n">stream</span><span class="o">.</span><span class="na">inputDStream</span><span class="o">()).</span><span class="na">commitAsync</span><span class="o">(</span><span class="n">offsetRanges</span><span class="o">);</span>
   <span class="o">}</span>
-<span class="o">});</span></code></pre></div>
+<span class="o">});</span></code></pre></figure>
 
   </div>
 </div>
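A minimal sketch of the casting caveat above (the failing line is deliberately commented out):

    // succeeds: createDirectStream's result implements CanCommitOffsets
    stream.asInstanceOf[CanCommitOffsets]

    // would fail at runtime with ClassCastException: map() returns a new,
    // transformed DStream that does not implement CanCommitOffsets
    // stream.map(record => record.value).asInstanceOf[CanCommitOffsets]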
@@ -347,7 +347,7 @@ Note that the example sets enable.auto.commit to false, for discussion see <a hr
 <div class="codetabs">
 <div data-lang="scala">
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="c1">// The details depend on your data store, but the general idea looks like this</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="c1">// The details depend on your data store, but the general idea looks like this</span>
 
 <span class="c1">// begin from the the offsets committed to the database</span>
 <span class="k">val</span> <span class="n">fromOffsets</span> <span class="k">=</span> <span class="n">selectOffsetsFromYourDatabase</span><span class="o">.</span><span class="n">map</span> <span class="o">{</span> <span class="n">resultSet</span> <span class="k">=&gt;</span>
@@ -372,17 +372,17 @@ Note that the example sets enable.auto.commit to false, for discussion see <a hr
   <span class="c1">// assert that offsets were updated correctly</span>
 
   <span class="c1">// end your transaction</span>
-<span class="o">}</span></code></pre></div>
+<span class="o">}</span></code></pre></figure>
 
   </div>
 <div data-lang="java">
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="c1">// The details depend on your data store, but the general idea looks like this</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="c1">// The details depend on your data store, but the general idea looks like this</span>
 
 <span class="c1">// begin from the the offsets committed to the database</span>
 <span class="n">Map</span><span class="o">&lt;</span><span class="n">TopicPartition</span><span class="o">,</span> <span class="n">Long</span><span class="o">&gt;</span> <span class="n">fromOffsets</span> <span class="o">=</span> <span class="k">new</span> <span class="n">HashMap</span><span class="o">&lt;&gt;();</span>
 <span class="k">for</span> <span class="o">(</span><span class="n">resultSet</span> <span class="o">:</span> <span class="n">selectOffsetsFromYourDatabase</span><span class="o">)</span>
-  <span class="n">fromOffsets</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="k">new</span> <span class="nf">TopicPartition</span><span class="o">(</span><span class="n">resultSet</span><span class="o">.</span><span class="na">string</span><span class="o">(</span><span class="s">&quot;topic&quot;</span><span class="o">),</span> <span class="n">resultSet</span><span class="o">.</span><span class="na">int</span><span class="o">(</span><span class="s">&quot;partition&quot;</span><span class="o">)),</span> <span class="n">resultSet</span><span class="o">.</span><span class="na">long</span><span class="o">(</span><span class="s">&quot;offset&quot;</span><span class="o">));</span>
+  <span class="n">fromOffsets</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="k">new</span> <span class="n">TopicPartition</span><span class="o">(</span><span class="n">resultSet</span><span class="o">.</span><span class="na">string</span><span class="o">(</span><span class="s">&quot;topic&quot;</span><span class="o">),</span> <span class="n">resultSet</span><span class="o">.</span><span class="na">int</span><span class="o">(</span><span class="s">&quot;partition&quot;</span><span class="o">)),</span> <span class="n">resultSet</span><span class="o">.</span><span class="na">long</span><span class="o">(</span><span class="s">&quot;offset&quot;</span><span class="o">));</span>
 <span class="o">}</span>
 
 <span class="n">JavaInputDStream</span><span class="o">&lt;</span><span class="n">ConsumerRecord</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;&gt;</span> <span class="n">stream</span> <span class="o">=</span> <span class="n">KafkaUtils</span><span class="o">.</span><span class="na">createDirectStream</span><span class="o">(</span>
@@ -406,7 +406,7 @@ Note that the example sets enable.auto.commit to false, for discussion see <a hr
 
     <span class="c1">// end your transaction</span>
   <span class="o">}</span>
-<span class="o">});</span></code></pre></div>
+<span class="o">});</span></code></pre></figure>
 
   </div>
 </div>
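For concreteness, a hedged JDBC sketch of the transactional pattern above. The connection URL, table name, and column names are illustrative only, not part of the Spark or Kafka APIs; `offsetRanges` is assumed captured as in the earlier examples:

    // illustrative only: store results and offsets in one JDBC transaction
    val conn = java.sql.DriverManager.getConnection(jdbcUrl) // hypothetical URL
    conn.setAutoCommit(false)
    try {
      // 1. write your computed results here, on the same connection
      // 2. record offsets in the same transaction
      val stmt = conn.prepareStatement(
        "UPDATE kafka_offsets SET untiloffset = ? WHERE topic = ? AND part = ?")
      for (o <- offsetRanges) {
        stmt.setLong(1, o.untilOffset)
        stmt.setString(2, o.topic)
        stmt.setInt(3, o.partition)
        stmt.executeUpdate()
      }
      conn.commit() // results and offsets become visible atomically
    } catch {
      case e: Exception => conn.rollback(); throw e
    } finally {
      conn.close()
    }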
@@ -417,7 +417,7 @@ Note that the example sets enable.auto.commit to false, for discussion see <a hr
 <div class="codetabs">
 <div data-lang="scala">
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">kafkaParams</span> <span class="k">=</span> <span class="nc">Map</span><span class="o">[</span><span class="kt">String</span>, <span class="kt">Object</span><span class="o">](</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">val</span> <span class="n">kafkaParams</span> <span class="k">=</span> <span class="nc">Map</span><span class="o">[</span><span class="kt">String</span>, <span class="kt">Object</span><span class="o">](</span>
   <span class="c1">// the usual params, make sure to change the port in bootstrap.servers if 9092 is not TLS</span>
   <span class="s">&quot;security.protocol&quot;</span> <span class="o">-&gt;</span> <span class="s">&quot;SSL&quot;</span><span class="o">,</span>
   <span class="s">&quot;ssl.truststore.location&quot;</span> <span class="o">-&gt;</span> <span class="s">&quot;/some-directory/kafka.client.truststore.jks&quot;</span><span class="o">,</span>
@@ -425,19 +425,19 @@ Note that the example sets enable.auto.commit to false, for discussion see <a hr
   <span class="s">&quot;ssl.keystore.location&quot;</span> <span class="o">-&gt;</span> <span class="s">&quot;/some-directory/kafka.client.keystore.jks&quot;</span><span class="o">,</span>
   <span class="s">&quot;ssl.keystore.password&quot;</span> <span class="o">-&gt;</span> <span class="s">&quot;test1234&quot;</span><span class="o">,</span>
   <span class="s">&quot;ssl.key.password&quot;</span> <span class="o">-&gt;</span> <span class="s">&quot;test1234&quot;</span>
-<span class="o">)</span></code></pre></div>
+<span class="o">)</span></code></pre></figure>
 
   </div>
 <div data-lang="java">
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">Map</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Object</span><span class="o">&gt;</span> <span class="n">kafkaParams</span> <span class="o">=</span> <span class="k">new</span> <span class="n">HashMap</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Object</span><span class="o">&gt;();</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="n">Map</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Object</span><span class="o">&gt;</span> <span class="n">kafkaParams</span> <span class="o">=</span> <span class="k">new</span> <span class="n">HashMap</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Object</span><span class="o">&gt;();</span>
 <span class="c1">// the usual params, make sure to change the port in bootstrap.servers if 9092 is not TLS</span>
 <span class="n">kafkaParams</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">&quot;security.protocol&quot;</span><span class="o">,</span> <span class="s">&quot;SSL&quot;</span><span class="o">);</span>
 <span class="n">kafkaParams</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">&quot;ssl.truststore.location&quot;</span><span class="o">,</span> <span class="s">&quot;/some-directory/kafka.client.truststore.jks&quot;</span><span class="o">);</span>
 <span class="n">kafkaParams</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">&quot;ssl.truststore.password&quot;</span><span class="o">,</span> <span class="s">&quot;test1234&quot;</span><span class="o">);</span>
 <span class="n">kafkaParams</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">&quot;ssl.keystore.location&quot;</span><span class="o">,</span> <span class="s">&quot;/some-directory/kafka.client.keystore.jks&quot;</span><span class="o">);</span>
 <span class="n">kafkaParams</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">&quot;ssl.keystore.password&quot;</span><span class="o">,</span> <span class="s">&quot;test1234&quot;</span><span class="o">);</span>
-<span class="n">kafkaParams</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">&quot;ssl.key.password&quot;</span><span class="o">,</span> <span class="s">&quot;test1234&quot;</span><span class="o">);</span></code></pre></div>
+<span class="n">kafkaParams</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">&quot;ssl.key.password&quot;</span><span class="o">,</span> <span class="s">&quot;test1234&quot;</span><span class="o">);</span></code></pre></figure>
 
   </div>
 </div>
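These TLS-enabled kafkaParams are consumed exactly like any other parameter map; a brief sketch reusing the direct-stream API shown earlier (`streamingContext` and `topics` as defined there):

    // the SSL params above ride along with the rest of kafkaParams
    val stream = KafkaUtils.createDirectStream[String, String](
      streamingContext,
      PreferConsistent,
      Subscribe[String, String](topics, kafkaParams)
    )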



[06/25] spark-website git commit: Update 2.1.0 docs to include https://github.com/apache/spark/pull/16294

Posted by yh...@apache.org.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/sql-programming-guide.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/sql-programming-guide.html b/site/docs/2.1.0/sql-programming-guide.html
index 17f5981..4534a98 100644
--- a/site/docs/2.1.0/sql-programming-guide.html
+++ b/site/docs/2.1.0/sql-programming-guide.html
@@ -127,95 +127,95 @@
                     
 
                     <ul id="markdown-toc">
-  <li><a href="#overview" id="markdown-toc-overview">Overview</a>    <ul>
-      <li><a href="#sql" id="markdown-toc-sql">SQL</a></li>
-      <li><a href="#datasets-and-dataframes" id="markdown-toc-datasets-and-dataframes">Datasets and DataFrames</a></li>
+  <li><a href="#overview">Overview</a>    <ul>
+      <li><a href="#sql">SQL</a></li>
+      <li><a href="#datasets-and-dataframes">Datasets and DataFrames</a></li>
     </ul>
   </li>
-  <li><a href="#getting-started" id="markdown-toc-getting-started">Getting Started</a>    <ul>
-      <li><a href="#starting-point-sparksession" id="markdown-toc-starting-point-sparksession">Starting Point: SparkSession</a></li>
-      <li><a href="#creating-dataframes" id="markdown-toc-creating-dataframes">Creating DataFrames</a></li>
-      <li><a href="#untyped-dataset-operations-aka-dataframe-operations" id="markdown-toc-untyped-dataset-operations-aka-dataframe-operations">Untyped Dataset Operations (aka DataFrame Operations)</a></li>
-      <li><a href="#running-sql-queries-programmatically" id="markdown-toc-running-sql-queries-programmatically">Running SQL Queries Programmatically</a></li>
-      <li><a href="#global-temporary-view" id="markdown-toc-global-temporary-view">Global Temporary View</a></li>
-      <li><a href="#creating-datasets" id="markdown-toc-creating-datasets">Creating Datasets</a></li>
-      <li><a href="#interoperating-with-rdds" id="markdown-toc-interoperating-with-rdds">Interoperating with RDDs</a>        <ul>
-          <li><a href="#inferring-the-schema-using-reflection" id="markdown-toc-inferring-the-schema-using-reflection">Inferring the Schema Using Reflection</a></li>
-          <li><a href="#programmatically-specifying-the-schema" id="markdown-toc-programmatically-specifying-the-schema">Programmatically Specifying the Schema</a></li>
+  <li><a href="#getting-started">Getting Started</a>    <ul>
+      <li><a href="#starting-point-sparksession">Starting Point: SparkSession</a></li>
+      <li><a href="#creating-dataframes">Creating DataFrames</a></li>
+      <li><a href="#untyped-dataset-operations-aka-dataframe-operations">Untyped Dataset Operations (aka DataFrame Operations)</a></li>
+      <li><a href="#running-sql-queries-programmatically">Running SQL Queries Programmatically</a></li>
+      <li><a href="#global-temporary-view">Global Temporary View</a></li>
+      <li><a href="#creating-datasets">Creating Datasets</a></li>
+      <li><a href="#interoperating-with-rdds">Interoperating with RDDs</a>        <ul>
+          <li><a href="#inferring-the-schema-using-reflection">Inferring the Schema Using Reflection</a></li>
+          <li><a href="#programmatically-specifying-the-schema">Programmatically Specifying the Schema</a></li>
         </ul>
       </li>
     </ul>
   </li>
-  <li><a href="#data-sources" id="markdown-toc-data-sources">Data Sources</a>    <ul>
-      <li><a href="#generic-loadsave-functions" id="markdown-toc-generic-loadsave-functions">Generic Load/Save Functions</a>        <ul>
-          <li><a href="#manually-specifying-options" id="markdown-toc-manually-specifying-options">Manually Specifying Options</a></li>
-          <li><a href="#run-sql-on-files-directly" id="markdown-toc-run-sql-on-files-directly">Run SQL on files directly</a></li>
-          <li><a href="#save-modes" id="markdown-toc-save-modes">Save Modes</a></li>
-          <li><a href="#saving-to-persistent-tables" id="markdown-toc-saving-to-persistent-tables">Saving to Persistent Tables</a></li>
+  <li><a href="#data-sources">Data Sources</a>    <ul>
+      <li><a href="#generic-loadsave-functions">Generic Load/Save Functions</a>        <ul>
+          <li><a href="#manually-specifying-options">Manually Specifying Options</a></li>
+          <li><a href="#run-sql-on-files-directly">Run SQL on files directly</a></li>
+          <li><a href="#save-modes">Save Modes</a></li>
+          <li><a href="#saving-to-persistent-tables">Saving to Persistent Tables</a></li>
         </ul>
       </li>
-      <li><a href="#parquet-files" id="markdown-toc-parquet-files">Parquet Files</a>        <ul>
-          <li><a href="#loading-data-programmatically" id="markdown-toc-loading-data-programmatically">Loading Data Programmatically</a></li>
-          <li><a href="#partition-discovery" id="markdown-toc-partition-discovery">Partition Discovery</a></li>
-          <li><a href="#schema-merging" id="markdown-toc-schema-merging">Schema Merging</a></li>
-          <li><a href="#hive-metastore-parquet-table-conversion" id="markdown-toc-hive-metastore-parquet-table-conversion">Hive metastore Parquet table conversion</a>            <ul>
-              <li><a href="#hiveparquet-schema-reconciliation" id="markdown-toc-hiveparquet-schema-reconciliation">Hive/Parquet Schema Reconciliation</a></li>
-              <li><a href="#metadata-refreshing" id="markdown-toc-metadata-refreshing">Metadata Refreshing</a></li>
+      <li><a href="#parquet-files">Parquet Files</a>        <ul>
+          <li><a href="#loading-data-programmatically">Loading Data Programmatically</a></li>
+          <li><a href="#partition-discovery">Partition Discovery</a></li>
+          <li><a href="#schema-merging">Schema Merging</a></li>
+          <li><a href="#hive-metastore-parquet-table-conversion">Hive metastore Parquet table conversion</a>            <ul>
+              <li><a href="#hiveparquet-schema-reconciliation">Hive/Parquet Schema Reconciliation</a></li>
+              <li><a href="#metadata-refreshing">Metadata Refreshing</a></li>
             </ul>
           </li>
-          <li><a href="#configuration" id="markdown-toc-configuration">Configuration</a></li>
+          <li><a href="#configuration">Configuration</a></li>
         </ul>
       </li>
-      <li><a href="#json-datasets" id="markdown-toc-json-datasets">JSON Datasets</a></li>
-      <li><a href="#hive-tables" id="markdown-toc-hive-tables">Hive Tables</a>        <ul>
-          <li><a href="#interacting-with-different-versions-of-hive-metastore" id="markdown-toc-interacting-with-different-versions-of-hive-metastore">Interacting with Different Versions of Hive Metastore</a></li>
+      <li><a href="#json-datasets">JSON Datasets</a></li>
+      <li><a href="#hive-tables">Hive Tables</a>        <ul>
+          <li><a href="#interacting-with-different-versions-of-hive-metastore">Interacting with Different Versions of Hive Metastore</a></li>
         </ul>
       </li>
-      <li><a href="#jdbc-to-other-databases" id="markdown-toc-jdbc-to-other-databases">JDBC To Other Databases</a></li>
-      <li><a href="#troubleshooting" id="markdown-toc-troubleshooting">Troubleshooting</a></li>
+      <li><a href="#jdbc-to-other-databases">JDBC To Other Databases</a></li>
+      <li><a href="#troubleshooting">Troubleshooting</a></li>
     </ul>
   </li>
-  <li><a href="#performance-tuning" id="markdown-toc-performance-tuning">Performance Tuning</a>    <ul>
-      <li><a href="#caching-data-in-memory" id="markdown-toc-caching-data-in-memory">Caching Data In Memory</a></li>
-      <li><a href="#other-configuration-options" id="markdown-toc-other-configuration-options">Other Configuration Options</a></li>
+  <li><a href="#performance-tuning">Performance Tuning</a>    <ul>
+      <li><a href="#caching-data-in-memory">Caching Data In Memory</a></li>
+      <li><a href="#other-configuration-options">Other Configuration Options</a></li>
     </ul>
   </li>
-  <li><a href="#distributed-sql-engine" id="markdown-toc-distributed-sql-engine">Distributed SQL Engine</a>    <ul>
-      <li><a href="#running-the-thrift-jdbcodbc-server" id="markdown-toc-running-the-thrift-jdbcodbc-server">Running the Thrift JDBC/ODBC server</a></li>
-      <li><a href="#running-the-spark-sql-cli" id="markdown-toc-running-the-spark-sql-cli">Running the Spark SQL CLI</a></li>
+  <li><a href="#distributed-sql-engine">Distributed SQL Engine</a>    <ul>
+      <li><a href="#running-the-thrift-jdbcodbc-server">Running the Thrift JDBC/ODBC server</a></li>
+      <li><a href="#running-the-spark-sql-cli">Running the Spark SQL CLI</a></li>
     </ul>
   </li>
-  <li><a href="#migration-guide" id="markdown-toc-migration-guide">Migration Guide</a>    <ul>
-      <li><a href="#upgrading-from-spark-sql-20-to-21" id="markdown-toc-upgrading-from-spark-sql-20-to-21">Upgrading From Spark SQL 2.0 to 2.1</a></li>
-      <li><a href="#upgrading-from-spark-sql-16-to-20" id="markdown-toc-upgrading-from-spark-sql-16-to-20">Upgrading From Spark SQL 1.6 to 2.0</a></li>
-      <li><a href="#upgrading-from-spark-sql-15-to-16" id="markdown-toc-upgrading-from-spark-sql-15-to-16">Upgrading From Spark SQL 1.5 to 1.6</a></li>
-      <li><a href="#upgrading-from-spark-sql-14-to-15" id="markdown-toc-upgrading-from-spark-sql-14-to-15">Upgrading From Spark SQL 1.4 to 1.5</a></li>
-      <li><a href="#upgrading-from-spark-sql-13-to-14" id="markdown-toc-upgrading-from-spark-sql-13-to-14">Upgrading from Spark SQL 1.3 to 1.4</a>        <ul>
-          <li><a href="#dataframe-data-readerwriter-interface" id="markdown-toc-dataframe-data-readerwriter-interface">DataFrame data reader/writer interface</a></li>
-          <li><a href="#dataframegroupby-retains-grouping-columns" id="markdown-toc-dataframegroupby-retains-grouping-columns">DataFrame.groupBy retains grouping columns</a></li>
-          <li><a href="#behavior-change-on-dataframewithcolumn" id="markdown-toc-behavior-change-on-dataframewithcolumn">Behavior change on DataFrame.withColumn</a></li>
+  <li><a href="#migration-guide">Migration Guide</a>    <ul>
+      <li><a href="#upgrading-from-spark-sql-20-to-21">Upgrading From Spark SQL 2.0 to 2.1</a></li>
+      <li><a href="#upgrading-from-spark-sql-16-to-20">Upgrading From Spark SQL 1.6 to 2.0</a></li>
+      <li><a href="#upgrading-from-spark-sql-15-to-16">Upgrading From Spark SQL 1.5 to 1.6</a></li>
+      <li><a href="#upgrading-from-spark-sql-14-to-15">Upgrading From Spark SQL 1.4 to 1.5</a></li>
+      <li><a href="#upgrading-from-spark-sql-13-to-14">Upgrading from Spark SQL 1.3 to 1.4</a>        <ul>
+          <li><a href="#dataframe-data-readerwriter-interface">DataFrame data reader/writer interface</a></li>
+          <li><a href="#dataframegroupby-retains-grouping-columns">DataFrame.groupBy retains grouping columns</a></li>
+          <li><a href="#behavior-change-on-dataframewithcolumn">Behavior change on DataFrame.withColumn</a></li>
         </ul>
       </li>
-      <li><a href="#upgrading-from-spark-sql-10-12-to-13" id="markdown-toc-upgrading-from-spark-sql-10-12-to-13">Upgrading from Spark SQL 1.0-1.2 to 1.3</a>        <ul>
-          <li><a href="#rename-of-schemardd-to-dataframe" id="markdown-toc-rename-of-schemardd-to-dataframe">Rename of SchemaRDD to DataFrame</a></li>
-          <li><a href="#unification-of-the-java-and-scala-apis" id="markdown-toc-unification-of-the-java-and-scala-apis">Unification of the Java and Scala APIs</a></li>
-          <li><a href="#isolation-of-implicit-conversions-and-removal-of-dsl-package-scala-only" id="markdown-toc-isolation-of-implicit-conversions-and-removal-of-dsl-package-scala-only">Isolation of Implicit Conversions and Removal of dsl Package (Scala-only)</a></li>
-          <li><a href="#removal-of-the-type-aliases-in-orgapachesparksql-for-datatype-scala-only" id="markdown-toc-removal-of-the-type-aliases-in-orgapachesparksql-for-datatype-scala-only">Removal of the type aliases in org.apache.spark.sql for DataType (Scala-only)</a></li>
-          <li><a href="#udf-registration-moved-to-sqlcontextudf-java--scala" id="markdown-toc-udf-registration-moved-to-sqlcontextudf-java--scala">UDF Registration Moved to <code>sqlContext.udf</code> (Java &amp; Scala)</a></li>
-          <li><a href="#python-datatypes-no-longer-singletons" id="markdown-toc-python-datatypes-no-longer-singletons">Python DataTypes No Longer Singletons</a></li>
+      <li><a href="#upgrading-from-spark-sql-10-12-to-13">Upgrading from Spark SQL 1.0-1.2 to 1.3</a>        <ul>
+          <li><a href="#rename-of-schemardd-to-dataframe">Rename of SchemaRDD to DataFrame</a></li>
+          <li><a href="#unification-of-the-java-and-scala-apis">Unification of the Java and Scala APIs</a></li>
+          <li><a href="#isolation-of-implicit-conversions-and-removal-of-dsl-package-scala-only">Isolation of Implicit Conversions and Removal of dsl Package (Scala-only)</a></li>
+          <li><a href="#removal-of-the-type-aliases-in-orgapachesparksql-for-datatype-scala-only">Removal of the type aliases in org.apache.spark.sql for DataType (Scala-only)</a></li>
+          <li><a href="#udf-registration-moved-to-sqlcontextudf-java--scala">UDF Registration Moved to <code>sqlContext.udf</code> (Java &amp; Scala)</a></li>
+          <li><a href="#python-datatypes-no-longer-singletons">Python DataTypes No Longer Singletons</a></li>
         </ul>
       </li>
-      <li><a href="#compatibility-with-apache-hive" id="markdown-toc-compatibility-with-apache-hive">Compatibility with Apache Hive</a>        <ul>
-          <li><a href="#deploying-in-existing-hive-warehouses" id="markdown-toc-deploying-in-existing-hive-warehouses">Deploying in Existing Hive Warehouses</a></li>
-          <li><a href="#supported-hive-features" id="markdown-toc-supported-hive-features">Supported Hive Features</a></li>
-          <li><a href="#unsupported-hive-functionality" id="markdown-toc-unsupported-hive-functionality">Unsupported Hive Functionality</a></li>
+      <li><a href="#compatibility-with-apache-hive">Compatibility with Apache Hive</a>        <ul>
+          <li><a href="#deploying-in-existing-hive-warehouses">Deploying in Existing Hive Warehouses</a></li>
+          <li><a href="#supported-hive-features">Supported Hive Features</a></li>
+          <li><a href="#unsupported-hive-functionality">Unsupported Hive Functionality</a></li>
         </ul>
       </li>
     </ul>
   </li>
-  <li><a href="#reference" id="markdown-toc-reference">Reference</a>    <ul>
-      <li><a href="#data-types" id="markdown-toc-data-types">Data Types</a></li>
-      <li><a href="#nan-semantics" id="markdown-toc-nan-semantics">NaN Semantics</a></li>
+  <li><a href="#reference">Reference</a>    <ul>
+      <li><a href="#data-types">Data Types</a></li>
+      <li><a href="#nan-semantics">NaN Semantics</a></li>
     </ul>
   </li>
 </ul>
@@ -275,7 +275,7 @@ While, in <a href="api/java/index.html?org/apache/spark/sql/Dataset.html">Java A
 
     <p>The entry point into all functionality in Spark is the <a href="api/scala/index.html#org.apache.spark.sql.SparkSession"><code>SparkSession</code></a> class. To create a basic <code>SparkSession</code>, just use <code>SparkSession.builder()</code>:</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.sql.SparkSession</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.sql.SparkSession</span>
 
 <span class="k">val</span> <span class="n">spark</span> <span class="k">=</span> <span class="nc">SparkSession</span>
   <span class="o">.</span><span class="n">builder</span><span class="o">()</span>
@@ -293,7 +293,7 @@ While, in <a href="api/java/index.html?org/apache/spark/sql/Dataset.html">Java A
 
     <p>The entry point into all functionality in Spark is the <a href="api/java/index.html#org.apache.spark.sql.SparkSession"><code>SparkSession</code></a> class. To create a basic <code>SparkSession</code>, just use <code>SparkSession.builder()</code>:</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">org.apache.spark.sql.SparkSession</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.sql.SparkSession</span><span class="o">;</span>
 
 <span class="n">SparkSession</span> <span class="n">spark</span> <span class="o">=</span> <span class="n">SparkSession</span>
   <span class="o">.</span><span class="na">builder</span><span class="o">()</span>
@@ -308,12 +308,12 @@ While, in <a href="api/java/index.html?org/apache/spark/sql/Dataset.html">Java A
 
     <p>The entry point into all functionality in Spark is the <a href="api/python/pyspark.sql.html#pyspark.sql.SparkSession"><code>SparkSession</code></a> class. To create a basic <code>SparkSession</code>, just use <code>SparkSession.builder</code>:</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.sql</span> <span class="kn">import</span> <span class="n">SparkSession</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.sql</span> <span class="kn">import</span> <span class="n">SparkSession</span>
 
 <span class="n">spark</span> <span class="o">=</span> <span class="n">SparkSession</span> \
     <span class="o">.</span><span class="n">builder</span> \
-    <span class="o">.</span><span class="n">appName</span><span class="p">(</span><span class="s">&quot;Python Spark SQL basic example&quot;</span><span class="p">)</span> \
-    <span class="o">.</span><span class="n">config</span><span class="p">(</span><span class="s">&quot;spark.some.config.option&quot;</span><span class="p">,</span> <span class="s">&quot;some-value&quot;</span><span class="p">)</span> \
+    <span class="o">.</span><span class="n">appName</span><span class="p">(</span><span class="s2">&quot;Python Spark SQL basic example&quot;</span><span class="p">)</span> \
+    <span class="o">.</span><span class="n">config</span><span class="p">(</span><span class="s2">&quot;spark.some.config.option&quot;</span><span class="p">,</span> <span class="s2">&quot;some-value&quot;</span><span class="p">)</span> \
     <span class="o">.</span><span class="n">getOrCreate</span><span class="p">()</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/sql/basic.py" in the Spark repo.</small></div>
@@ -323,7 +323,7 @@ While, in <a href="api/java/index.html?org/apache/spark/sql/Dataset.html">Java A
 
     <p>The entry point into all functionality in Spark is the <a href="api/R/sparkR.session.html"><code>SparkSession</code></a> class. To initialize a basic <code>SparkSession</code>, just call <code>sparkR.session()</code>:</p>
 
-    <div class="highlight"><pre>sparkR.session<span class="p">(</span>appName <span class="o">=</span> <span class="s">&quot;R Spark SQL basic example&quot;</span><span class="p">,</span> sparkConfig <span class="o">=</span> <span class="kt">list</span><span class="p">(</span>spark.some.config.option <span class="o">=</span> <span class="s">&quot;some-value&quot;</span><span class="p">))</span>
+    <div class="highlight"><pre><span></span>sparkR.session<span class="p">(</span>appName <span class="o">=</span> <span class="s">&quot;R Spark SQL basic example&quot;</span><span class="p">,</span> sparkConfig <span class="o">=</span> <span class="kt">list</span><span class="p">(</span>spark.some.config.option <span class="o">=</span> <span class="s">&quot;some-value&quot;</span><span class="p">))</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/r/RSparkSQLExample.R" in the Spark repo.</small></div>
 
@@ -344,7 +344,7 @@ from a Hive table, or from <a href="#data-sources">Spark data sources</a>.</p>
 
     <p>As an example, the following creates a DataFrame based on the content of a JSON file:</p>
 
-    <div class="highlight"><pre><span class="k">val</span> <span class="n">df</span> <span class="k">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">json</span><span class="o">(</span><span class="s">&quot;examples/src/main/resources/people.json&quot;</span><span class="o">)</span>
+    <div class="highlight"><pre><span></span><span class="k">val</span> <span class="n">df</span> <span class="k">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">json</span><span class="o">(</span><span class="s">&quot;examples/src/main/resources/people.json&quot;</span><span class="o">)</span>
 
 <span class="c1">// Displays the content of the DataFrame to stdout</span>
 <span class="n">df</span><span class="o">.</span><span class="n">show</span><span class="o">()</span>
@@ -365,7 +365,7 @@ from a Hive table, or from <a href="#data-sources">Spark data sources</a>.</p>
 
     <p>As an example, the following creates a DataFrame based on the content of a JSON file:</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">org.apache.spark.sql.Dataset</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.sql.Dataset</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.sql.Row</span><span class="o">;</span>
 
 <span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">df</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="na">read</span><span class="o">().</span><span class="na">json</span><span class="o">(</span><span class="s">&quot;examples/src/main/resources/people.json&quot;</span><span class="o">);</span>
@@ -389,17 +389,17 @@ from a Hive table, or from <a href="#data-sources">Spark data sources</a>.</p>
 
     <p>As an example, the following creates a DataFrame based on the content of a JSON file:</p>
 
-    <div class="highlight"><pre><span class="c"># spark is an existing SparkSession</span>
-<span class="n">df</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">json</span><span class="p">(</span><span class="s">&quot;examples/src/main/resources/people.json&quot;</span><span class="p">)</span>
-<span class="c"># Displays the content of the DataFrame to stdout</span>
+    <div class="highlight"><pre><span></span><span class="c1"># spark is an existing SparkSession</span>
+<span class="n">df</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">json</span><span class="p">(</span><span class="s2">&quot;examples/src/main/resources/people.json&quot;</span><span class="p">)</span>
+<span class="c1"># Displays the content of the DataFrame to stdout</span>
 <span class="n">df</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
-<span class="c"># +----+-------+</span>
-<span class="c"># | age|   name|</span>
-<span class="c"># +----+-------+</span>
-<span class="c"># |null|Michael|</span>
-<span class="c"># |  30|   Andy|</span>
-<span class="c"># |  19| Justin|</span>
-<span class="c"># +----+-------+</span>
+<span class="c1"># +----+-------+</span>
+<span class="c1"># | age|   name|</span>
+<span class="c1"># +----+-------+</span>
+<span class="c1"># |null|Michael|</span>
+<span class="c1"># |  30|   Andy|</span>
+<span class="c1"># |  19| Justin|</span>
+<span class="c1"># +----+-------+</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/sql/basic.py" in the Spark repo.</small></div>
   </div>
@@ -410,7 +410,7 @@ from a Hive table, or from <a href="#data-sources">Spark data sources</a>.</p>
 
     <p>As an example, the following creates a DataFrame based on the content of a JSON file:</p>
 
-    <div class="highlight"><pre>df <span class="o">&lt;-</span> read.json<span class="p">(</span><span class="s">&quot;examples/src/main/resources/people.json&quot;</span><span class="p">)</span>
+    <div class="highlight"><pre><span></span>df <span class="o">&lt;-</span> read.json<span class="p">(</span><span class="s">&quot;examples/src/main/resources/people.json&quot;</span><span class="p">)</span>
 
 <span class="c1"># Displays the content of the DataFrame</span>
 <span class="kp">head</span><span class="p">(</span>df<span class="p">)</span>
@@ -444,7 +444,7 @@ showDF<span class="p">(</span>df<span class="p">)</span>
 
 <div class="codetabs">
 <div data-lang="scala">
-    <div class="highlight"><pre><span class="c1">// This import is needed to use the $-notation</span>
+    <div class="highlight"><pre><span></span><span class="c1">// This import is needed to use the $-notation</span>
 <span class="k">import</span> <span class="nn">spark.implicits._</span>
 <span class="c1">// Print the schema in a tree format</span>
 <span class="n">df</span><span class="o">.</span><span class="n">printSchema</span><span class="o">()</span>
@@ -499,8 +499,8 @@ showDF<span class="p">(</span>df<span class="p">)</span>
 
 <div data-lang="java">
 
-    <div class="highlight"><pre><span class="c1">// col(&quot;...&quot;) is preferable to df.col(&quot;...&quot;)</span>
-<span class="kn">import</span> <span class="nn">static</span> <span class="n">org</span><span class="o">.</span><span class="na">apache</span><span class="o">.</span><span class="na">spark</span><span class="o">.</span><span class="na">sql</span><span class="o">.</span><span class="na">functions</span><span class="o">.</span><span class="na">col</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="c1">// col(&quot;...&quot;) is preferable to df.col(&quot;...&quot;)</span>
+<span class="kn">import static</span> <span class="nn">org.apache.spark.sql.functions.col</span><span class="o">;</span>
 
 <span class="c1">// Print the schema in a tree format</span>
 <span class="n">df</span><span class="o">.</span><span class="na">printSchema</span><span class="o">();</span>
@@ -560,50 +560,50 @@ interactive data exploration, users are highly encouraged to use the
 latter form, which is future-proof and won&#8217;t break with column names that
 are also attributes on the DataFrame class.</p>
 
-    <div class="highlight"><pre><span class="c"># spark, df are from the previous example</span>
-<span class="c"># Print the schema in a tree format</span>
+    <div class="highlight"><pre><span></span><span class="c1"># spark, df are from the previous example</span>
+<span class="c1"># Print the schema in a tree format</span>
 <span class="n">df</span><span class="o">.</span><span class="n">printSchema</span><span class="p">()</span>
-<span class="c"># root</span>
-<span class="c"># |-- age: long (nullable = true)</span>
-<span class="c"># |-- name: string (nullable = true)</span>
-
-<span class="c"># Select only the &quot;name&quot; column</span>
-<span class="n">df</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s">&quot;name&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
-<span class="c"># +-------+</span>
-<span class="c"># |   name|</span>
-<span class="c"># +-------+</span>
-<span class="c"># |Michael|</span>
-<span class="c"># |   Andy|</span>
-<span class="c"># | Justin|</span>
-<span class="c"># +-------+</span>
-
-<span class="c"># Select everybody, but increment the age by 1</span>
-<span class="n">df</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="n">df</span><span class="p">[</span><span class="s">&#39;name&#39;</span><span class="p">],</span> <span class="n">df</span><span class="p">[</span><span class="s">&#39;age&#39;</span><span class="p">]</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
-<span class="c"># +-------+---------+</span>
-<span class="c"># |   name|(age + 1)|</span>
-<span class="c"># +-------+---------+</span>
-<span class="c"># |Michael|     null|</span>
-<span class="c"># |   Andy|       31|</span>
-<span class="c"># | Justin|       20|</span>
-<span class="c"># +-------+---------+</span>
-
-<span class="c"># Select people older than 21</span>
-<span class="n">df</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">df</span><span class="p">[</span><span class="s">&#39;age&#39;</span><span class="p">]</span> <span class="o">&gt;</span> <span class="mi">21</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
-<span class="c"># +---+----+</span>
-<span class="c"># |age|name|</span>
-<span class="c"># +---+----+</span>
-<span class="c"># | 30|Andy|</span>
-<span class="c"># +---+----+</span>
-
-<span class="c"># Count people by age</span>
-<span class="n">df</span><span class="o">.</span><span class="n">groupBy</span><span class="p">(</span><span class="s">&quot;age&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">count</span><span class="p">()</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
-<span class="c"># +----+-----+</span>
-<span class="c"># | age|count|</span>
-<span class="c"># +----+-----+</span>
-<span class="c"># |  19|    1|</span>
-<span class="c"># |null|    1|</span>
-<span class="c"># |  30|    1|</span>
-<span class="c"># +----+-----+</span>
+<span class="c1"># root</span>
+<span class="c1"># |-- age: long (nullable = true)</span>
+<span class="c1"># |-- name: string (nullable = true)</span>
+
+<span class="c1"># Select only the &quot;name&quot; column</span>
+<span class="n">df</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s2">&quot;name&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
+<span class="c1"># +-------+</span>
+<span class="c1"># |   name|</span>
+<span class="c1"># +-------+</span>
+<span class="c1"># |Michael|</span>
+<span class="c1"># |   Andy|</span>
+<span class="c1"># | Justin|</span>
+<span class="c1"># +-------+</span>
+
+<span class="c1"># Select everybody, but increment the age by 1</span>
+<span class="n">df</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="n">df</span><span class="p">[</span><span class="s1">&#39;name&#39;</span><span class="p">],</span> <span class="n">df</span><span class="p">[</span><span class="s1">&#39;age&#39;</span><span class="p">]</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
+<span class="c1"># +-------+---------+</span>
+<span class="c1"># |   name|(age + 1)|</span>
+<span class="c1"># +-------+---------+</span>
+<span class="c1"># |Michael|     null|</span>
+<span class="c1"># |   Andy|       31|</span>
+<span class="c1"># | Justin|       20|</span>
+<span class="c1"># +-------+---------+</span>
+
+<span class="c1"># Select people older than 21</span>
+<span class="n">df</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="n">df</span><span class="p">[</span><span class="s1">&#39;age&#39;</span><span class="p">]</span> <span class="o">&gt;</span> <span class="mi">21</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
+<span class="c1"># +---+----+</span>
+<span class="c1"># |age|name|</span>
+<span class="c1"># +---+----+</span>
+<span class="c1"># | 30|Andy|</span>
+<span class="c1"># +---+----+</span>
+
+<span class="c1"># Count people by age</span>
+<span class="n">df</span><span class="o">.</span><span class="n">groupBy</span><span class="p">(</span><span class="s2">&quot;age&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">count</span><span class="p">()</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
+<span class="c1"># +----+-----+</span>
+<span class="c1"># | age|count|</span>
+<span class="c1"># +----+-----+</span>
+<span class="c1"># |  19|    1|</span>
+<span class="c1"># |null|    1|</span>
+<span class="c1"># |  30|    1|</span>
+<span class="c1"># +----+-----+</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/sql/basic.py" in the Spark repo.</small></div>
     <p>For a complete list of the types of operations that can be performed on a DataFrame refer to the <a href="api/python/pyspark.sql.html#pyspark.sql.DataFrame">API Documentation</a>.</p>
@@ -614,7 +614,7 @@ are also attributes on the DataFrame class.</p>
 
 <div data-lang="r">
 
-    <div class="highlight"><pre><span class="c1"># Create the DataFrame</span>
+    <div class="highlight"><pre><span></span><span class="c1"># Create the DataFrame</span>
 df <span class="o">&lt;-</span> read.json<span class="p">(</span><span class="s">&quot;examples/src/main/resources/people.json&quot;</span><span class="p">)</span>
 
 <span class="c1"># Show the content of the DataFrame</span>
@@ -673,7 +673,7 @@ printSchema<span class="p">(</span>df<span class="p">)</span>
 <div data-lang="scala">
     <p>The <code>sql</code> function on a <code>SparkSession</code> enables applications to run SQL queries programmatically and returns the result as a <code>DataFrame</code>.</p>
 
-    <div class="highlight"><pre><span class="c1">// Register the DataFrame as a SQL temporary view</span>
+    <div class="highlight"><pre><span></span><span class="c1">// Register the DataFrame as a SQL temporary view</span>
 <span class="n">df</span><span class="o">.</span><span class="n">createOrReplaceTempView</span><span class="o">(</span><span class="s">&quot;people&quot;</span><span class="o">)</span>
 
 <span class="k">val</span> <span class="n">sqlDF</span> <span class="k">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">sql</span><span class="o">(</span><span class="s">&quot;SELECT * FROM people&quot;</span><span class="o">)</span>
@@ -692,7 +692,7 @@ printSchema<span class="p">(</span>df<span class="p">)</span>
 <div data-lang="java">
     <p>The <code>sql</code> function on a <code>SparkSession</code> enables applications to run SQL queries programmatically and returns the result as a <code>Dataset&lt;Row&gt;</code>.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">org.apache.spark.sql.Dataset</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.sql.Dataset</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.sql.Row</span><span class="o">;</span>
 
 <span class="c1">// Register the DataFrame as a SQL temporary view</span>
@@ -714,18 +714,18 @@ printSchema<span class="p">(</span>df<span class="p">)</span>
 <div data-lang="python">
     <p>The <code>sql</code> function on a <code>SparkSession</code> enables applications to run SQL queries programmatically and returns the result as a <code>DataFrame</code>.</p>
 
-    <div class="highlight"><pre><span class="c"># Register the DataFrame as a SQL temporary view</span>
-<span class="n">df</span><span class="o">.</span><span class="n">createOrReplaceTempView</span><span class="p">(</span><span class="s">&quot;people&quot;</span><span class="p">)</span>
+    <div class="highlight"><pre><span></span><span class="c1"># Register the DataFrame as a SQL temporary view</span>
+<span class="n">df</span><span class="o">.</span><span class="n">createOrReplaceTempView</span><span class="p">(</span><span class="s2">&quot;people&quot;</span><span class="p">)</span>
 
-<span class="n">sqlDF</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s">&quot;SELECT * FROM people&quot;</span><span class="p">)</span>
+<span class="n">sqlDF</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">&quot;SELECT * FROM people&quot;</span><span class="p">)</span>
 <span class="n">sqlDF</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
-<span class="c"># +----+-------+</span>
-<span class="c"># | age|   name|</span>
-<span class="c"># +----+-------+</span>
-<span class="c"># |null|Michael|</span>
-<span class="c"># |  30|   Andy|</span>
-<span class="c"># |  19| Justin|</span>
-<span class="c"># +----+-------+</span>
+<span class="c1"># +----+-------+</span>
+<span class="c1"># | age|   name|</span>
+<span class="c1"># +----+-------+</span>
+<span class="c1"># |null|Michael|</span>
+<span class="c1"># |  30|   Andy|</span>
+<span class="c1"># |  19| Justin|</span>
+<span class="c1"># +----+-------+</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/sql/basic.py" in the Spark repo.</small></div>
   </div>
@@ -733,7 +733,7 @@ printSchema<span class="p">(</span>df<span class="p">)</span>
 <div data-lang="r">
     <p>The <code>sql</code> function enables applications to run SQL queries programmatically and returns the result as a <code>SparkDataFrame</code>.</p>
 
-    <div class="highlight"><pre>df <span class="o">&lt;-</span> sql<span class="p">(</span><span class="s">&quot;SELECT * FROM table&quot;</span><span class="p">)</span>
+    <div class="highlight"><pre><span></span>df <span class="o">&lt;-</span> sql<span class="p">(</span><span class="s">&quot;SELECT * FROM table&quot;</span><span class="p">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/r/RSparkSQLExample.R" in the Spark repo.</small></div>
 
@@ -750,7 +750,7 @@ refer it, e.g. <code>SELECT * FROM global_temp.view1</code>.</p>
 
 <div class="codetabs">
 <div data-lang="scala">
-    <div class="highlight"><pre><span class="c1">// Register the DataFrame as a global temporary view</span>
+    <div class="highlight"><pre><span></span><span class="c1">// Register the DataFrame as a global temporary view</span>
 <span class="n">df</span><span class="o">.</span><span class="n">createGlobalTempView</span><span class="o">(</span><span class="s">&quot;people&quot;</span><span class="o">)</span>
 
 <span class="c1">// Global temporary view is tied to a system preserved database `global_temp`</span>
@@ -777,7 +777,7 @@ refer it, e.g. <code>SELECT * FROM global_temp.view1</code>.</p>
   </div>
 
 <div data-lang="java">
-    <div class="highlight"><pre><span class="c1">// Register the DataFrame as a global temporary view</span>
+    <div class="highlight"><pre><span></span><span class="c1">// Register the DataFrame as a global temporary view</span>
 <span class="n">df</span><span class="o">.</span><span class="na">createGlobalTempView</span><span class="o">(</span><span class="s">&quot;people&quot;</span><span class="o">);</span>
 
 <span class="c1">// Global temporary view is tied to a system preserved database `global_temp`</span>
@@ -804,37 +804,37 @@ refer it, e.g. <code>SELECT * FROM global_temp.view1</code>.</p>
   </div>
 
 <div data-lang="python">
-    <div class="highlight"><pre><span class="c"># Register the DataFrame as a global temporary view</span>
-<span class="n">df</span><span class="o">.</span><span class="n">createGlobalTempView</span><span class="p">(</span><span class="s">&quot;people&quot;</span><span class="p">)</span>
-
-<span class="c"># Global temporary view is tied to a system preserved database `global_temp`</span>
-<span class="n">spark</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s">&quot;SELECT * FROM global_temp.people&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
-<span class="c"># +----+-------+</span>
-<span class="c"># | age|   name|</span>
-<span class="c"># +----+-------+</span>
-<span class="c"># |null|Michael|</span>
-<span class="c"># |  30|   Andy|</span>
-<span class="c"># |  19| Justin|</span>
-<span class="c"># +----+-------+</span>
-
-<span class="c"># Global temporary view is cross-session</span>
-<span class="n">spark</span><span class="o">.</span><span class="n">newSession</span><span class="p">()</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s">&quot;SELECT * FROM global_temp.people&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
-<span class="c"># +----+-------+</span>
-<span class="c"># | age|   name|</span>
-<span class="c"># +----+-------+</span>
-<span class="c"># |null|Michael|</span>
-<span class="c"># |  30|   Andy|</span>
-<span class="c"># |  19| Justin|</span>
-<span class="c"># +----+-------+</span>
+    <div class="highlight"><pre><span></span><span class="c1"># Register the DataFrame as a global temporary view</span>
+<span class="n">df</span><span class="o">.</span><span class="n">createGlobalTempView</span><span class="p">(</span><span class="s2">&quot;people&quot;</span><span class="p">)</span>
+
+<span class="c1"># Global temporary view is tied to a system preserved database `global_temp`</span>
+<span class="n">spark</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">&quot;SELECT * FROM global_temp.people&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
+<span class="c1"># +----+-------+</span>
+<span class="c1"># | age|   name|</span>
+<span class="c1"># +----+-------+</span>
+<span class="c1"># |null|Michael|</span>
+<span class="c1"># |  30|   Andy|</span>
+<span class="c1"># |  19| Justin|</span>
+<span class="c1"># +----+-------+</span>
+
+<span class="c1"># Global temporary view is cross-session</span>
+<span class="n">spark</span><span class="o">.</span><span class="n">newSession</span><span class="p">()</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">&quot;SELECT * FROM global_temp.people&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
+<span class="c1"># +----+-------+</span>
+<span class="c1"># | age|   name|</span>
+<span class="c1"># +----+-------+</span>
+<span class="c1"># |null|Michael|</span>
+<span class="c1"># |  30|   Andy|</span>
+<span class="c1"># |  19| Justin|</span>
+<span class="c1"># +----+-------+</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/sql/basic.py" in the Spark repo.</small></div>
   </div>
 
 <div data-lang="sql">
 
-    <div class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="k">CREATE</span> <span class="k">GLOBAL</span> <span class="k">TEMPORARY</span> <span class="k">VIEW</span> <span class="n">temp_view</span> <span class="k">AS</span> <span class="k">SELECT</span> <span class="n">a</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="n">b</span> <span class="o">*</span> <span class="mi">2</span> <span class="k">FROM</span> <span class="n">tbl</span>
+    <figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span></span><span class="k">CREATE</span> <span class="k">GLOBAL</span> <span class="k">TEMPORARY</span> <span class="k">VIEW</span> <span class="n">temp_view</span> <span class="k">AS</span> <span class="k">SELECT</span> <span class="n">a</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="n">b</span> <span class="o">*</span> <span class="mi">2</span> <span class="k">FROM</span> <span class="n">tbl</span>
 
-<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">global_temp</span><span class="p">.</span><span class="n">temp_view</span></code></pre></div>
+<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">global_temp</span><span class="p">.</span><span class="n">temp_view</span></code></pre></figure>
 
   </div>
 </div>
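
Global temporary views persist until the Spark application terminates, but
they can also be dropped explicitly through the session catalog. A minimal
PySpark sketch, assuming the `spark` session and the global `people` view
registered above:

    # Global temp views always resolve against the `global_temp` database
    spark.sql("SELECT count(*) FROM global_temp.people").show()

    # Drop the view instead of waiting for the application to end
    spark.catalog.dropGlobalTempView("people")
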
@@ -850,7 +850,7 @@ the bytes back into an object.</p>
 
 <div class="codetabs">
 <div data-lang="scala">
-    <div class="highlight"><pre><span class="c1">// Note: Case classes in Scala 2.10 can support only up to 22 fields. To work around this limit,</span>
+    <div class="highlight"><pre><span></span><span class="c1">// Note: Case classes in Scala 2.10 can support only up to 22 fields. To work around this limit,</span>
 <span class="c1">// you can use custom classes that implement the Product interface</span>
 <span class="k">case</span> <span class="k">class</span> <span class="nc">Person</span><span class="o">(</span><span class="n">name</span><span class="k">:</span> <span class="kt">String</span><span class="o">,</span> <span class="n">age</span><span class="k">:</span> <span class="kt">Long</span><span class="o">)</span>
 
@@ -883,7 +883,7 @@ the bytes back into an object.</p>
   </div>
 
 <div data-lang="java">
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">java.util.Collections</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">java.io.Serializable</span><span class="o">;</span>
 
@@ -915,7 +915,7 @@ the bytes back into an object.</p>
 <span class="o">}</span>
 
 <span class="c1">// Create an instance of a Bean class</span>
-<span class="n">Person</span> <span class="n">person</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">Person</span><span class="o">();</span>
+<span class="n">Person</span> <span class="n">person</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Person</span><span class="o">();</span>
 <span class="n">person</span><span class="o">.</span><span class="na">setName</span><span class="o">(</span><span class="s">&quot;Andy&quot;</span><span class="o">);</span>
 <span class="n">person</span><span class="o">.</span><span class="na">setAge</span><span class="o">(</span><span class="mi">32</span><span class="o">);</span>
 
@@ -982,7 +982,7 @@ reflection and become the names of the columns. Case classes can also be nested
 types such as <code>Seq</code>s or <code>Array</code>s. This RDD can be implicitly converted to a DataFrame and then be
 registered as a table. Tables can be used in subsequent SQL statements.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.sql.catalyst.encoders.ExpressionEncoder</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.sql.catalyst.encoders.ExpressionEncoder</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.sql.Encoder</span>
 
 <span class="c1">// For implicit conversions from RDDs to DataFrames</span>
@@ -1037,7 +1037,7 @@ does not support JavaBeans that contain <code>Map</code> field(s). Nested JavaBe
 fields are supported though. You can create a JavaBean by creating a class that implements
 Serializable and has getters and setters for all of its fields.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.function.Function</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.function.MapFunction</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.sql.Dataset</span><span class="o">;</span>
@@ -1053,7 +1053,7 @@ Serializable and has getters and setters for all of its fields.</p>
     <span class="nd">@Override</span>
     <span class="kd">public</span> <span class="n">Person</span> <span class="nf">call</span><span class="o">(</span><span class="n">String</span> <span class="n">line</span><span class="o">)</span> <span class="kd">throws</span> <span class="n">Exception</span> <span class="o">{</span>
       <span class="n">String</span><span class="o">[]</span> <span class="n">parts</span> <span class="o">=</span> <span class="n">line</span><span class="o">.</span><span class="na">split</span><span class="o">(</span><span class="s">&quot;,&quot;</span><span class="o">);</span>
-      <span class="n">Person</span> <span class="n">person</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">Person</span><span class="o">();</span>
+      <span class="n">Person</span> <span class="n">person</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Person</span><span class="o">();</span>
       <span class="n">person</span><span class="o">.</span><span class="na">setName</span><span class="o">(</span><span class="n">parts</span><span class="o">[</span><span class="mi">0</span><span class="o">]);</span>
       <span class="n">person</span><span class="o">.</span><span class="na">setAge</span><span class="o">(</span><span class="n">Integer</span><span class="o">.</span><span class="na">parseInt</span><span class="o">(</span><span class="n">parts</span><span class="o">[</span><span class="mi">1</span><span class="o">].</span><span class="na">trim</span><span class="o">()));</span>
       <span class="k">return</span> <span class="n">person</span><span class="o">;</span>
@@ -1106,28 +1106,28 @@ Serializable and has getters and setters for all of its fields.</p>
 key/value pairs as kwargs to the Row class. The keys of this list define the column names of the table,
 and the types are inferred by sampling the whole dataset, similar to the inference that is performed on JSON files.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.sql</span> <span class="kn">import</span> <span class="n">Row</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.sql</span> <span class="kn">import</span> <span class="n">Row</span>
 
 <span class="n">sc</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">sparkContext</span>
 
-<span class="c"># Load a text file and convert each line to a Row.</span>
-<span class="n">lines</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">&quot;examples/src/main/resources/people.txt&quot;</span><span class="p">)</span>
-<span class="n">parts</span> <span class="o">=</span> <span class="n">lines</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">l</span><span class="p">:</span> <span class="n">l</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&quot;,&quot;</span><span class="p">))</span>
+<span class="c1"># Load a text file and convert each line to a Row.</span>
+<span class="n">lines</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s2">&quot;examples/src/main/resources/people.txt&quot;</span><span class="p">)</span>
+<span class="n">parts</span> <span class="o">=</span> <span class="n">lines</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">l</span><span class="p">:</span> <span class="n">l</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">&quot;,&quot;</span><span class="p">))</span>
 <span class="n">people</span> <span class="o">=</span> <span class="n">parts</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">p</span><span class="p">:</span> <span class="n">Row</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="n">p</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">age</span><span class="o">=</span><span class="nb">int</span><span class="p">(</span><span class="n">p</span><span class="p">[</span><span class="mi">1</span><span class="p">])))</span>
 
-<span class="c"># Infer the schema, and register the DataFrame as a table.</span>
+<span class="c1"># Infer the schema, and register the DataFrame as a table.</span>
 <span class="n">schemaPeople</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">createDataFrame</span><span class="p">(</span><span class="n">people</span><span class="p">)</span>
-<span class="n">schemaPeople</span><span class="o">.</span><span class="n">createOrReplaceTempView</span><span class="p">(</span><span class="s">&quot;people&quot;</span><span class="p">)</span>
+<span class="n">schemaPeople</span><span class="o">.</span><span class="n">createOrReplaceTempView</span><span class="p">(</span><span class="s2">&quot;people&quot;</span><span class="p">)</span>
 
-<span class="c"># SQL can be run over DataFrames that have been registered as a table.</span>
-<span class="n">teenagers</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s">&quot;SELECT name FROM people WHERE age &gt;= 13 AND age &lt;= 19&quot;</span><span class="p">)</span>
+<span class="c1"># SQL can be run over DataFrames that have been registered as a table.</span>
+<span class="n">teenagers</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">&quot;SELECT name FROM people WHERE age &gt;= 13 AND age &lt;= 19&quot;</span><span class="p">)</span>
 
-<span class="c"># The results of SQL queries are Dataframe objects.</span>
-<span class="c"># rdd returns the content as an :class:`pyspark.RDD` of :class:`Row`.</span>
-<span class="n">teenNames</span> <span class="o">=</span> <span class="n">teenagers</span><span class="o">.</span><span class="n">rdd</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">p</span><span class="p">:</span> <span class="s">&quot;Name: &quot;</span> <span class="o">+</span> <span class="n">p</span><span class="o">.</span><span class="n">name</span><span class="p">)</span><span class="o">.</span><span class="n">collect</span><span class="p">()</span>
+<span class="c1"># The results of SQL queries are Dataframe objects.</span>
+<span class="c1"># rdd returns the content as an :class:`pyspark.RDD` of :class:`Row`.</span>
+<span class="n">teenNames</span> <span class="o">=</span> <span class="n">teenagers</span><span class="o">.</span><span class="n">rdd</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">p</span><span class="p">:</span> <span class="s2">&quot;Name: &quot;</span> <span class="o">+</span> <span class="n">p</span><span class="o">.</span><span class="n">name</span><span class="p">)</span><span class="o">.</span><span class="n">collect</span><span class="p">()</span>
 <span class="k">for</span> <span class="n">name</span> <span class="ow">in</span> <span class="n">teenNames</span><span class="p">:</span>
     <span class="k">print</span><span class="p">(</span><span class="n">name</span><span class="p">)</span>
-<span class="c"># Name: Justin</span>
+<span class="c1"># Name: Justin</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/sql/basic.py" in the Spark repo.</small></div>
   </div>
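
Since the schema here comes from type inference over the Row objects, it can
be worth confirming what was inferred. A small sketch, assuming the
`schemaPeople` DataFrame created above; the integer `age` values are
typically inferred as long, with output along these lines:

    # Inspect the inferred column types
    schemaPeople.printSchema()
    # root
    #  |-- age: long (nullable = true)
    #  |-- name: string (nullable = true)
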
@@ -1155,7 +1155,7 @@ by <code>SparkSession</code>.</li>
 
     <p>For example:</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.sql.types._</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.sql.types._</span>
 
 <span class="c1">// Create an RDD</span>
 <span class="k">val</span> <span class="n">peopleRDD</span> <span class="k">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">sparkContext</span><span class="o">.</span><span class="n">textFile</span><span class="o">(</span><span class="s">&quot;examples/src/main/resources/people.txt&quot;</span><span class="o">)</span>
@@ -1213,7 +1213,7 @@ by <code>SparkSession</code>.</li>
 
     <p>For example:</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.ArrayList</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.ArrayList</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">java.util.List</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
@@ -1296,43 +1296,43 @@ tuples or lists in the RDD created in the step 1.</li>
 
     <p>For example:</p>
 
-    <div class="highlight"><pre><span class="c"># Import data types</span>
+    <div class="highlight"><pre><span></span><span class="c1"># Import data types</span>
 <span class="kn">from</span> <span class="nn">pyspark.sql.types</span> <span class="kn">import</span> <span class="o">*</span>
 
 <span class="n">sc</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">sparkContext</span>
 
-<span class="c"># Load a text file and convert each line to a Row.</span>
-<span class="n">lines</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">&quot;examples/src/main/resources/people.txt&quot;</span><span class="p">)</span>
-<span class="n">parts</span> <span class="o">=</span> <span class="n">lines</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">l</span><span class="p">:</span> <span class="n">l</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&quot;,&quot;</span><span class="p">))</span>
-<span class="c"># Each line is converted to a tuple.</span>
+<span class="c1"># Load a text file and convert each line to a Row.</span>
+<span class="n">lines</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s2">&quot;examples/src/main/resources/people.txt&quot;</span><span class="p">)</span>
+<span class="n">parts</span> <span class="o">=</span> <span class="n">lines</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">l</span><span class="p">:</span> <span class="n">l</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">&quot;,&quot;</span><span class="p">))</span>
+<span class="c1"># Each line is converted to a tuple.</span>
 <span class="n">people</span> <span class="o">=</span> <span class="n">parts</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">p</span><span class="p">:</span> <span class="p">(</span><span class="n">p</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">p</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">.</span><span class="n">strip</span><span class="p">()))</span>
 
-<span class="c"># The schema is encoded in a string.</span>
-<span class="n">schemaString</span> <span class="o">=</span> <span class="s">&quot;name age&quot;</span>
+<span class="c1"># The schema is encoded in a string.</span>
+<span class="n">schemaString</span> <span class="o">=</span> <span class="s2">&quot;name age&quot;</span>
 
 <span class="n">fields</span> <span class="o">=</span> <span class="p">[</span><span class="n">StructField</span><span class="p">(</span><span class="n">field_name</span><span class="p">,</span> <span class="n">StringType</span><span class="p">(),</span> <span class="bp">True</span><span class="p">)</span> <span class="k">for</span> <span class="n">field_name</span> <span class="ow">in</span> <span class="n">schemaString</span><span class="o">.</span><span class="n">split</span><span class="p">()]</span>
 <span class="n">schema</span> <span class="o">=</span> <span class="n">StructType</span><span class="p">(</span><span class="n">fields</span><span class="p">)</span>
 
-<span class="c"># Apply the schema to the RDD.</span>
+<span class="c1"># Apply the schema to the RDD.</span>
 <span class="n">schemaPeople</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">createDataFrame</span><span class="p">(</span><span class="n">people</span><span class="p">,</span> <span class="n">schema</span><span class="p">)</span>
 
-<span class="c"># Creates a temporary view using the DataFrame</span>
-<span class="n">schemaPeople</span><span class="o">.</span><span class="n">createOrReplaceTempView</span><span class="p">(</span><span class="s">&quot;people&quot;</span><span class="p">)</span>
+<span class="c1"># Creates a temporary view using the DataFrame</span>
+<span class="n">schemaPeople</span><span class="o">.</span><span class="n">createOrReplaceTempView</span><span class="p">(</span><span class="s2">&quot;people&quot;</span><span class="p">)</span>
 
-<span class="c"># Creates a temporary view using the DataFrame</span>
-<span class="n">schemaPeople</span><span class="o">.</span><span class="n">createOrReplaceTempView</span><span class="p">(</span><span class="s">&quot;people&quot;</span><span class="p">)</span>
+<span class="c1"># Creates a temporary view using the DataFrame</span>
+<span class="n">schemaPeople</span><span class="o">.</span><span class="n">createOrReplaceTempView</span><span class="p">(</span><span class="s2">&quot;people&quot;</span><span class="p">)</span>
 
-<span class="c"># SQL can be run over DataFrames that have been registered as a table.</span>
-<span class="n">results</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s">&quot;SELECT name FROM people&quot;</span><span class="p">)</span>
+<span class="c1"># SQL can be run over DataFrames that have been registered as a table.</span>
+<span class="n">results</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">&quot;SELECT name FROM people&quot;</span><span class="p">)</span>
 
 <span class="n">results</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
-<span class="c"># +-------+</span>
-<span class="c"># |   name|</span>
-<span class="c"># +-------+</span>
-<span class="c"># |Michael|</span>
-<span class="c"># |   Andy|</span>
-<span class="c"># | Justin|</span>
-<span class="c"># +-------+</span>
+<span class="c1"># +-------+</span>
+<span class="c1"># |   name|</span>
+<span class="c1"># +-------+</span>
+<span class="c1"># |Michael|</span>
+<span class="c1"># |   Andy|</span>
+<span class="c1"># | Justin|</span>
+<span class="c1"># +-------+</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/sql/basic.py" in the Spark repo.</small></div>
   </div>
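
The example above declares every field as StringType for simplicity. When the
column types differ, the StructFields can be written out explicitly instead
of being generated from a schema string. A sketch under that assumption,
reusing the `parts` RDD from the example:

    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    # Explicit per-column types; the RDD rows must match them
    schema = StructType([
        StructField("name", StringType(), True),
        StructField("age", IntegerType(), True),
    ])
    typedPeople = parts.map(lambda p: (p[0], int(p[1].strip())))
    spark.createDataFrame(typedPeople, schema).printSchema()
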
@@ -1354,14 +1354,14 @@ goes into specific options that are available for the built-in data sources.</p>
 
 <div class="codetabs">
 <div data-lang="scala">
-    <div class="highlight"><pre><span class="k">val</span> <span class="n">usersDF</span> <span class="k">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">load</span><span class="o">(</span><span class="s">&quot;examples/src/main/resources/users.parquet&quot;</span><span class="o">)</span>
+    <div class="highlight"><pre><span></span><span class="k">val</span> <span class="n">usersDF</span> <span class="k">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">load</span><span class="o">(</span><span class="s">&quot;examples/src/main/resources/users.parquet&quot;</span><span class="o">)</span>
 <span class="n">usersDF</span><span class="o">.</span><span class="n">select</span><span class="o">(</span><span class="s">&quot;name&quot;</span><span class="o">,</span> <span class="s">&quot;favorite_color&quot;</span><span class="o">).</span><span class="n">write</span><span class="o">.</span><span class="n">save</span><span class="o">(</span><span class="s">&quot;namesAndFavColors.parquet&quot;</span><span class="o">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala" in the Spark repo.</small></div>
   </div>
 
 <div data-lang="java">
-    <div class="highlight"><pre><span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">usersDF</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="na">read</span><span class="o">().</span><span class="na">load</span><span class="o">(</span><span class="s">&quot;examples/src/main/resources/users.parquet&quot;</span><span class="o">);</span>
+    <div class="highlight"><pre><span></span><span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">usersDF</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="na">read</span><span class="o">().</span><span class="na">load</span><span class="o">(</span><span class="s">&quot;examples/src/main/resources/users.parquet&quot;</span><span class="o">);</span>
 <span class="n">usersDF</span><span class="o">.</span><span class="na">select</span><span class="o">(</span><span class="s">&quot;name&quot;</span><span class="o">,</span> <span class="s">&quot;favorite_color&quot;</span><span class="o">).</span><span class="na">write</span><span class="o">().</span><span class="na">save</span><span class="o">(</span><span class="s">&quot;namesAndFavColors.parquet&quot;</span><span class="o">);</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java" in the Spark repo.</small></div>
@@ -1369,15 +1369,15 @@ goes into specific options that are available for the built-in data sources.</p>
 
 <div data-lang="python">
 
-    <div class="highlight"><pre><span class="n">df</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s">&quot;examples/src/main/resources/users.parquet&quot;</span><span class="p">)</span>
-<span class="n">df</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s">&quot;name&quot;</span><span class="p">,</span> <span class="s">&quot;favorite_color&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">write</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="s">&quot;namesAndFavColors.parquet&quot;</span><span class="p">)</span>
+    <div class="highlight"><pre><span></span><span class="n">df</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">&quot;examples/src/main/resources/users.parquet&quot;</span><span class="p">)</span>
+<span class="n">df</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s2">&quot;name&quot;</span><span class="p">,</span> <span class="s2">&quot;favorite_color&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">write</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="s2">&quot;namesAndFavColors.parquet&quot;</span><span class="p">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/sql/datasource.py" in the Spark repo.</small></div>
   </div>
 
 <div data-lang="r">
 
-    <div class="highlight"><pre>df <span class="o">&lt;-</span> read.df<span class="p">(</span><span class="s">&quot;examples/src/main/resources/users.parquet&quot;</span><span class="p">)</span>
+    <div class="highlight"><pre><span></span>df <span class="o">&lt;-</span> read.df<span class="p">(</span><span class="s">&quot;examples/src/main/resources/users.parquet&quot;</span><span class="p">)</span>
 write.df<span class="p">(</span>select<span class="p">(</span>df<span class="p">,</span> <span class="s">&quot;name&quot;</span><span class="p">,</span> <span class="s">&quot;favorite_color&quot;</span><span class="p">),</span> <span class="s">&quot;namesAndFavColors.parquet&quot;</span><span class="p">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/r/RSparkSQLExample.R" in the Spark repo.</small></div>
@@ -1395,14 +1395,14 @@ source type can be converted into other types using this syntax.</p>
 
 <div class="codetabs">
 <div data-lang="scala">
-    <div class="highlight"><pre><span class="k">val</span> <span class="n">peopleDF</span> <span class="k">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">format</span><span class="o">(</span><span class="s">&quot;json&quot;</span><span class="o">).</span><span class="n">load</span><span class="o">(</span><span class="s">&quot;examples/src/main/resources/people.json&quot;</span><span class="o">)</span>
+    <div class="highlight"><pre><span></span><span class="k">val</span> <span class="n">peopleDF</span> <span class="k">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">format</span><span class="o">(</span><span class="s">&quot;json&quot;</span><span class="o">).</span><span class="n">load</span><span class="o">(</span><span class="s">&quot;examples/src/main/resources/people.json&quot;</span><span class="o">)</span>
 <span class="n">peopleDF</span><span class="o">.</span><span class="n">select</span><span class="o">(</span><span class="s">&quot;name&quot;</span><span class="o">,</span> <span class="s">&quot;age&quot;</span><span class="o">).</span><span class="n">write</span><span class="o">.</span><span class="n">format</span><span class="o">(</span><span class="s">&quot;parquet&quot;</span><span class="o">).</span><span class="n">save</span><span class="o">(</span><span class="s">&quot;namesAndAges.parquet&quot;</span><span class="o">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala" in the Spark repo.</small></div>
   </div>
 
 <div data-lang="java">
-    <div class="highlight"><pre><span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">peopleDF</span> <span class="o">=</span>
+    <div class="highlight"><pre><span></span><span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">peopleDF</span> <span class="o">=</span>
   <span class="n">spark</span><span class="o">.</span><span class="na">read</span><span class="o">().</span><span class="na">format</span><span class="o">(</span><span class="s">&quot;json&quot;</span><span class="o">).</span><span class="na">load</span><span class="o">(</span><span class="s">&quot;examples/src/main/resources/people.json&quot;</span><span class="o">);</span>
 <span class="n">peopleDF</span><span class="o">.</span><span class="na">select</span><span class="o">(</span><span class="s">&quot;name&quot;</span><span class="o">,</span> <span class="s">&quot;age&quot;</span><span class="o">).</span><span class="na">write</span><span class="o">().</span><span class="na">format</span><span class="o">(</span><span class="s">&quot;parquet&quot;</span><span class="o">).</span><span class="na">save</span><span class="o">(</span><span class="s">&quot;namesAndAges.parquet&quot;</span><span class="o">);</span>
 </pre></div>
@@ -1410,14 +1410,14 @@ source type can be converted into other types using this syntax.</p>
   </div>
 
 <div data-lang="python">
-    <div class="highlight"><pre><span class="n">df</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s">&quot;examples/src/main/resources/people.json&quot;</span><span class="p">,</span> <span class="n">format</span><span class="o">=</span><span class="s">&quot;json&quot;</span><span class="p">)</span>
-<span class="n">df</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s">&quot;name&quot;</span><span class="p">,</span> <span class="s">&quot;age&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">write</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="s">&quot;namesAndAges.parquet&quot;</span><span class="p">,</span> <span class="n">format</span><span class="o">=</span><span class="s">&quot;parquet&quot;</span><span class="p">)</span>
+    <div class="highlight"><pre><span></span><span class="n">df</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">&quot;examples/src/main/resources/people.json&quot;</span><span class="p">,</span> <span class="n">format</span><span class="o">=</span><span class="s2">&quot;json&quot;</span><span class="p">)</span>
+<span class="n">df</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s2">&quot;name&quot;</span><span class="p">,</span> <span class="s2">&quot;age&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">write</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="s2">&quot;namesAndAges.parquet&quot;</span><span class="p">,</span> <span class="n">format</span><span class="o">=</span><span class="s2">&quot;parquet&quot;</span><span class="p">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/sql/datasource.py" in the Spark repo.</small></div>
   </div>
 
 <div data-lang="r">
-    <div class="highlight"><pre>df <span class="o">&lt;-</span> read.df<span class="p">(</span><span class="s">&quot;examples/src/main/resources/people.json&quot;</span><span class="p">,</span> <span class="s">&quot;json&quot;</span><span class="p">)</span>
+    <div class="highlight"><pre><span></span>df <span class="o">&lt;-</span> read.df<span class="p">(</span><span class="s">&quot;examples/src/main/resources/people.json&quot;</span><span class="p">,</span> <span class="s">&quot;json&quot;</span><span class="p">)</span>
 namesAndAges <span class="o">&lt;-</span> select<span class="p">(</span>df<span class="p">,</span> <span class="s">&quot;name&quot;</span><span class="p">,</span> <span class="s">&quot;age&quot;</span><span class="p">)</span>
 write.df<span class="p">(</span>namesAndAges<span class="p">,</span> <span class="s">&quot;namesAndAges.parquet&quot;</span><span class="p">,</span> <span class="s">&quot;parquet&quot;</span><span class="p">)</span>
 </pre></div>
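
Format-specific options can be passed alongside the format name. As an
illustration, a sketch reading CSV (a built-in source in Spark 2.x), with a
hypothetical headered file path:

    # `header` and `inferSchema` are options understood by the csv source
    df = spark.read.format("csv") \
        .option("header", "true") \
        .option("inferSchema", "true") \
        .load("examples/src/main/resources/people.csv")  # hypothetical path
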
@@ -1432,26 +1432,26 @@ file directly with SQL.</p>
 
 <div class="codetabs">
 <div data-lang="scala">
-    <div class="highlight"><pre><span class="k">val</span> <span class="n">sqlDF</span> <span class="k">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">sql</span><span class="o">(</span><span class="s">&quot;SELECT * FROM parquet.`examples/src/main/resources/users.parquet`&quot;</span><span class="o">)</span>
+    <div class="highlight"><pre><span></span><span class="k">val</span> <span class="n">sqlDF</span> <span class="k">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">sql</span><span class="o">(</span><span class="s">&quot;SELECT * FROM parquet.`examples/src/main/resources/users.parquet`&quot;</span><span class="o">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala" in the Spark repo.</small></div>
   </div>
 
 <div data-lang="java">
-    <div class="highlight"><pre><span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">sqlDF</span> <span class="o">=</span>
+    <div class="highlight"><pre><span></span><span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">sqlDF</span> <span class="o">=</span>
   <span class="n">spark</span><span class="o">.</span><span class="na">sql</span><span class="o">(</span><span class="s">&quot;SELECT * FROM parquet.`examples/src/main/resources/users.parquet`&quot;</span><span class="o">);</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java" in the Spark repo.</small></div>
   </div>
 
 <div data-lang="python">
-    <div class="highlight"><pre><span class="n">df</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s">&quot;SELECT * FROM parquet.`examples/src/main/resources/users.parquet`&quot;</span><span class="p">)</span>
+    <div class="highlight"><pre><span></span><span class="n">df</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">&quot;SELECT * FROM parquet.`examples/src/main/resources/users.parquet`&quot;</span><span class="p">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/sql/datasource.py" in the Spark repo.</small></div>
   </div>
 
 <div data-lang="r">
-    <div class="highlight"><pre>df <span class="o">&lt;-</span> sql<span class="p">(</span><span class="s">&quot;SELECT * FROM parquet.`examples/src/main/resources/users.parquet`&quot;</span><span class="p">)</span>
+    <div class="highlight"><pre><span></span>df <span class="o">&lt;-</span> sql<span class="p">(</span><span class="s">&quot;SELECT * FROM parquet.`examples/src/main/resources/users.parquet`&quot;</span><span class="p">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/r/RSparkSQLExample.R" in the Spark repo.</small></div>
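
The same back-quoted path syntax works for other built-in sources by swapping
the format prefix. A sketch for JSON, assuming the people.json file used
throughout these examples:

    # The path is prefixed by the data source name
    df = spark.sql("SELECT * FROM json.`examples/src/main/resources/people.json`")
    df.show()
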
 
@@ -1531,7 +1531,7 @@ compatibility reasons.</p>
 <div class="codetabs">
 
 <div data-lang="scala">
-    <div class="highlight"><pre><span class="c1">// Encoders for most common types are automatically provided by importing spark.implicits._</span>
+    <div class="highlight"><pre><span></span><span class="c1">// Encoders for most common types are automatically provided by importing spark.implicits._</span>
 <span class="k">import</span> <span class="nn">spark.implicits._</span>
 
 <span class="k">val</span> <span class="n">peopleDF</span> <span class="k">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">json</span><span class="o">(</span><span class="s">&quot;examples/src/main/resources/people.json&quot;</span><span class="o">)</span>
@@ -1558,7 +1558,7 @@ compatibility reasons.</p>
   </div>
 
 <div data-lang="java">
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaSparkContext</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.function.MapFunction</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.sql.Encoders</span><span class="o">;</span>
@@ -1595,32 +1595,32 @@ compatibility reasons.</p>
 
 <div data-lang="python">
 
-    <div class="highlight"><pre><span class="n">peopleDF</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">json</span><span class="p">(</span><span class="s">&quot;examples/src/main/resources/people.json&quot;</span><span class="p">)</span>
+    <div class="highlight"><pre><span></span><span class="n">peopleDF</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">json</span><span class="p">(</span><span class="s2">&quot;examples/src/main/resources/people.json&quot;</span><span class="p">)</span>
 
-<span class="c"># DataFrames can be saved as Parquet files, maintaining the schema information.</span>
-<span class="n">peopleDF</span><span class="o">.</span><span class="n">write</span><span class="o">.</span><span class="n">parquet</span><span class="p">(</span><span class="s">&quot;people.parquet&quot;</span><span class="p">)</span>
+<span class="c1"># DataFrames can be saved as Parquet files, maintaining the schema information.</span>
+<span class="n">peopleDF</span><span class="o">.</span><span class="n">write</span><span class="o">.</span><span class="n">parquet</span><span class="p">(</span><span class="s2">&quot;people.parquet&quot;</span><span class="p">)</span>
 
-<span class="c"># Read in the Parquet file created above.</span>
-<span class="c"># Parquet files are self-describing so the schema is preserved.</span>
-<span class="c"># The result of loading a parquet file is also a DataFrame.</span>
-<span class="n">parquetFile</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">parquet</span><span class="p">(</span><span class="s">&quot;people.parquet&quot;</span><span class="p">)</span>
+<span class="c1"># Read in the Parquet file created above.</span>
+<span class="c1"># Parquet files are self-describing so the schema is preserved.</span>
+<span class="c1"># The result of loading a parquet file is also a DataFrame.</span>
+<span class="n">parquetFile</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">parquet</span><span class="p">(</span><span class="s2">&quot;people.parquet&quot;</span><span class="p">)</span>
 
-<span class="c"># Parquet files can also be used to create a temporary view and then used in SQL statements.</span>
-<span class="n">parquetFile</span><span class="o">.</span><span class="n">createOrReplaceTempView</span><span class="p">(</span><span class="s">&quot;parquetFile&quot;</span><span class="p">)</span>
-<span class="n">teenagers</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s">&quot;SELECT name FROM parquetFile WHERE age &gt;= 13 AND age &lt;= 19&quot;</span><span class="p">)</span>
+<span class="c1"># Parquet files can also be used to create a temporary view and then used in SQL statements.</span>
+<span class="n">parquetFile</span><span class="o">.</span><span class="n">createOrReplaceTempView</span><span class="p">(</span><span class="s2">&quot;parquetFile&quot;</span><span class="p">)</span>
+<span class="n">teenagers</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">&quot;SELECT name FROM parquetFile WHERE age &gt;= 13 AND age &lt;= 19&quot;</span><span class="p">)</span>
 <span class="n">teenagers</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
-<span class="c"># +------+</span>
-<span class="c"># |  name|</span>
-<span class="c"># +------+</span>
-<span class="c"># |Justin|</span>
-<span class="c"># +------+</span>
+<span class="c1"># +------+</span>
+<span class="c1"># |  name|</span>
+<span class="c1"># +------+</span>
+<span class="c1"># |Justin|</span>
+<span class="c1"># +------+</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/sql/datasource.py" in the Spark repo.</small></div>
   </div>
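
Because Parquet files are self-describing, the schema written by peopleDF
comes back intact on read rather than being re-inferred. A quick sketch,
assuming the `parquetFile` DataFrame from the example above:

    # The schema is read from the Parquet metadata
    parquetFile.printSchema()
    # root
    #  |-- age: long (nullable = true)
    #  |-- name: string (nullable = true)
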
 
 <div data-lang="r">
 
-    <div class="highlight"><pre>df <span class="o">&lt;-</span> read.df<span class="p">(</span><span class="s">&quot;examples/src/main/resources/people.json&quot;</span><span class="p">,</span> <span class="s">&quot;json&quot;</span><span class="p">)</span>
+    <div class="highlight"><pre><span></span>df <span class="o">&lt;-</span> read.df<span class="p">(</span><span class="s">&quot;examples/src/main/resources/people.json&quot;</span><span class="p">,</span> <span class="s">&quot;json&quot;</span><span class="p">)</span>
 
 <span class="c1"># SparkDataFrame can be saved as Parquet files, maintaining the schema information.</span>
 write.parquet<span class="p">(</span>df<span class="p">,</span> <span class="s">&quot;people.parquet&quot;</span><span class="p">)</span>
@@ -1652,13 +1652,13 @@ teenNames <span class="o">&lt;-</span> dapply<span class="p">(</span>df<span cla
 
 <div data-lang="sql">
 
-    <div class="highlight"><pre><code class="language-sql" data-lang="sql"><span class="k">CREATE</span> <span class="k">TEMPORARY</span> <span class="k">VIEW</span> <span class="n">parquetTable</span>
+    <figure class="highlight"><pre><code class="language-sql" data-lang="sql"><span></span><span class="k">CREATE</span> <span class="k">TEMPORARY</span> <span class="k">VIEW</span> <span class="n">parquetTable</span>
 <span class="k">USING</span> <span class="n">org</span><span class="p">.</span><span class="n">apache</span><span class="p">.</span><span class="n">spark</span><span class="p">.</span><span class="k">sql</span><span class="p">.</span><span class="n">parquet</span>
 <span class="k">OPTIONS</span> <span class="p">(</span>
   <span class="n">path</span> <span class="ss">&quot;examples/src/main/resources/people.parquet&quot;</span>
 <span class="p">)</span>
 
-<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">parquetTable</span></code></pre></div>
+<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">parquetTable</span></code></pre></figure>
 
   </div>
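
The same temporary view can be created from Python by passing the DDL through
spark.sql. A sketch, assuming the people.parquet path from the statement
above:

    spark.sql("""
      CREATE TEMPORARY VIEW parquetTable
      USING org.apache.spark.sql.parquet
      OPTIONS (path "examples/src/main/resources/people.parquet")
    """)
    spark.sql("SELECT * FROM parquetTable").show()
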
 
@@ -1673,7 +1673,7 @@ partitioning information automatically. For example, we can store all our previo
 population data into a partitioned table using the following directory structure, with two extra
 columns, <code>gender</code> and <code>country</code> as partitioning columns:</p>
 
-<div class="highlight"><pre><code class="language-text" data-lang="text">path
+<figure class="highlight"><pre><code class="language-text" data-lang="text"><span></span>path
 \u2514\u2500\u2500 to
     \u2514\u2500\u2500 table
         \u251c\u2500\u2500 gender=male
@@ -1691,17 +1691,17 @@ columns, <code>gender</code> and <code>country</code> as partitioning columns:</
          �� \u2502�� \u2514\u2500\u2500 data.parquet
          �� \u251c\u2500\u2500 country=CN
          �� \u2502�� \u2514\u2500\u2500 data.parquet
-         �� \u2514\u2500\u2500 ...</code></pre></div>
+         �� \u2514\u2500\u2500 ...</code></pre></figure>
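
A directory layout like this is what DataFrameWriter.partitionBy produces. A
minimal sketch of writing such a table, assuming a DataFrame `df` that has
`gender` and `country` columns:

    # Each distinct (gender, country) pair becomes a nested directory
    df.write.partitionBy("gender", "country").parquet("path/to/table")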
 
 <p>By passing <code>path/to/table</code> to either <code>SparkSession.read.parquet</code> or <code>SparkSession.read.load</code>, Spark SQL
 will automatically extract the partitioning information from the paths.
 Now the schema of the returned DataFrame becomes:</p>
 
-<div class="highlight"><pre><code class="language-text" data-lang="text">root
+<figure class="highlight"><pre><code class="language-text" data-lang="text"><span></span>root
 |-- name: string (nullable = true)
 |-- age: long (nullable = true)
 |-- gender: strin

<TRUNCATED>


[14/25] spark-website git commit: Update 2.1.0 docs to include https://github.com/apache/spark/pull/16294

Posted by yh...@apache.org.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/mllib-decision-tree.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/mllib-decision-tree.html b/site/docs/2.1.0/mllib-decision-tree.html
index 1a3d865..991610e 100644
--- a/site/docs/2.1.0/mllib-decision-tree.html
+++ b/site/docs/2.1.0/mllib-decision-tree.html
@@ -307,23 +307,23 @@
                     
 
                     <ul id="markdown-toc">
-  <li><a href="#basic-algorithm" id="markdown-toc-basic-algorithm">Basic algorithm</a>    <ul>
-      <li><a href="#node-impurity-and-information-gain" id="markdown-toc-node-impurity-and-information-gain">Node impurity and information gain</a></li>
-      <li><a href="#split-candidates" id="markdown-toc-split-candidates">Split candidates</a></li>
-      <li><a href="#stopping-rule" id="markdown-toc-stopping-rule">Stopping rule</a></li>
+  <li><a href="#basic-algorithm">Basic algorithm</a>    <ul>
+      <li><a href="#node-impurity-and-information-gain">Node impurity and information gain</a></li>
+      <li><a href="#split-candidates">Split candidates</a></li>
+      <li><a href="#stopping-rule">Stopping rule</a></li>
     </ul>
   </li>
-  <li><a href="#usage-tips" id="markdown-toc-usage-tips">Usage tips</a>    <ul>
-      <li><a href="#problem-specification-parameters" id="markdown-toc-problem-specification-parameters">Problem specification parameters</a></li>
-      <li><a href="#stopping-criteria" id="markdown-toc-stopping-criteria">Stopping criteria</a></li>
-      <li><a href="#tunable-parameters" id="markdown-toc-tunable-parameters">Tunable parameters</a></li>
-      <li><a href="#caching-and-checkpointing" id="markdown-toc-caching-and-checkpointing">Caching and checkpointing</a></li>
+  <li><a href="#usage-tips">Usage tips</a>    <ul>
+      <li><a href="#problem-specification-parameters">Problem specification parameters</a></li>
+      <li><a href="#stopping-criteria">Stopping criteria</a></li>
+      <li><a href="#tunable-parameters">Tunable parameters</a></li>
+      <li><a href="#caching-and-checkpointing">Caching and checkpointing</a></li>
     </ul>
   </li>
-  <li><a href="#scaling" id="markdown-toc-scaling">Scaling</a></li>
-  <li><a href="#examples" id="markdown-toc-examples">Examples</a>    <ul>
-      <li><a href="#classification" id="markdown-toc-classification">Classification</a></li>
-      <li><a href="#regression" id="markdown-toc-regression">Regression</a></li>
+  <li><a href="#scaling">Scaling</a></li>
+  <li><a href="#examples">Examples</a>    <ul>
+      <li><a href="#classification">Classification</a></li>
+      <li><a href="#regression">Regression</a></li>
     </ul>
   </li>
 </ul>
@@ -548,7 +548,7 @@ maximum tree depth of 5. The test error is calculated to measure the algorithm a
 <div data-lang="scala">
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.tree.DecisionTree"><code>DecisionTree</code> Scala docs</a> and <a href="api/scala/index.html#org.apache.spark.mllib.tree.model.DecisionTreeModel"><code>DecisionTreeModel</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.tree.DecisionTree</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.tree.DecisionTree</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.tree.model.DecisionTreeModel</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.util.MLUtils</span>
 
@@ -588,7 +588,7 @@ maximum tree depth of 5. The test error is calculated to measure the algorithm a
 <div data-lang="java">
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/tree/DecisionTree.html"><code>DecisionTree</code> Java docs</a> and <a href="api/java/org/apache/spark/mllib/tree/model/DecisionTreeModel.html"><code>DecisionTreeModel</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.HashMap</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.HashMap</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">java.util.Map</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span>
@@ -604,8 +604,8 @@ maximum tree depth of 5. The test error is calculated to measure the algorithm a
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.tree.model.DecisionTreeModel</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.util.MLUtils</span><span class="o">;</span>
 
-<span class="n">SparkConf</span> <span class="n">sparkConf</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">SparkConf</span><span class="o">().</span><span class="na">setAppName</span><span class="o">(</span><span class="s">&quot;JavaDecisionTreeClassificationExample&quot;</span><span class="o">);</span>
-<span class="n">JavaSparkContext</span> <span class="n">jsc</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">JavaSparkContext</span><span class="o">(</span><span class="n">sparkConf</span><span class="o">);</span>
+<span class="n">SparkConf</span> <span class="n">sparkConf</span> <span class="o">=</span> <span class="k">new</span> <span class="n">SparkConf</span><span class="o">().</span><span class="na">setAppName</span><span class="o">(</span><span class="s">&quot;JavaDecisionTreeClassificationExample&quot;</span><span class="o">);</span>
+<span class="n">JavaSparkContext</span> <span class="n">jsc</span> <span class="o">=</span> <span class="k">new</span> <span class="n">JavaSparkContext</span><span class="o">(</span><span class="n">sparkConf</span><span class="o">);</span>
 
 <span class="c1">// Load and parse the data file.</span>
 <span class="n">String</span> <span class="n">datapath</span> <span class="o">=</span> <span class="s">&quot;data/mllib/sample_libsvm_data.txt&quot;</span><span class="o">;</span>
@@ -657,30 +657,30 @@ maximum tree depth of 5. The test error is calculated to measure the algorithm a
 <div data-lang="python">
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.tree.DecisionTree"><code>DecisionTree</code> Python docs</a> and <a href="api/python/pyspark.mllib.html#pyspark.mllib.tree.DecisionTreeModel"><code>DecisionTreeModel</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.mllib.tree</span> <span class="kn">import</span> <span class="n">DecisionTree</span><span class="p">,</span> <span class="n">DecisionTreeModel</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.tree</span> <span class="kn">import</span> <span class="n">DecisionTree</span><span class="p">,</span> <span class="n">DecisionTreeModel</span>
 <span class="kn">from</span> <span class="nn">pyspark.mllib.util</span> <span class="kn">import</span> <span class="n">MLUtils</span>
 
-<span class="c"># Load and parse the data file into an RDD of LabeledPoint.</span>
-<span class="n">data</span> <span class="o">=</span> <span class="n">MLUtils</span><span class="o">.</span><span class="n">loadLibSVMFile</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&#39;data/mllib/sample_libsvm_data.txt&#39;</span><span class="p">)</span>
-<span class="c"># Split the data into training and test sets (30% held out for testing)</span>
+<span class="c1"># Load and parse the data file into an RDD of LabeledPoint.</span>
+<span class="n">data</span> <span class="o">=</span> <span class="n">MLUtils</span><span class="o">.</span><span class="n">loadLibSVMFile</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s1">&#39;data/mllib/sample_libsvm_data.txt&#39;</span><span class="p">)</span>
+<span class="c1"># Split the data into training and test sets (30% held out for testing)</span>
 <span class="p">(</span><span class="n">trainingData</span><span class="p">,</span> <span class="n">testData</span><span class="p">)</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">randomSplit</span><span class="p">([</span><span class="mf">0.7</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">])</span>
 
-<span class="c"># Train a DecisionTree model.</span>
-<span class="c">#  Empty categoricalFeaturesInfo indicates all features are continuous.</span>
+<span class="c1"># Train a DecisionTree model.</span>
+<span class="c1">#  Empty categoricalFeaturesInfo indicates all features are continuous.</span>
 <span class="n">model</span> <span class="o">=</span> <span class="n">DecisionTree</span><span class="o">.</span><span class="n">trainClassifier</span><span class="p">(</span><span class="n">trainingData</span><span class="p">,</span> <span class="n">numClasses</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">categoricalFeaturesInfo</span><span class="o">=</span><span class="p">{},</span>
-                                     <span class="n">impurity</span><span class="o">=</span><span class="s">&#39;gini&#39;</span><span class="p">,</span> <span class="n">maxDepth</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">maxBins</span><span class="o">=</span><span class="mi">32</span><span class="p">)</span>
+                                     <span class="n">impurity</span><span class="o">=</span><span class="s1">&#39;gini&#39;</span><span class="p">,</span> <span class="n">maxDepth</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">maxBins</span><span class="o">=</span><span class="mi">32</span><span class="p">)</span>
 
-<span class="c"># Evaluate model on test instances and compute test error</span>
+<span class="c1"># Evaluate model on test instances and compute test error</span>
 <span class="n">predictions</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">testData</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="o">.</span><span class="n">features</span><span class="p">))</span>
 <span class="n">labelsAndPredictions</span> <span class="o">=</span> <span class="n">testData</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">lp</span><span class="p">:</span> <span class="n">lp</span><span class="o">.</span><span class="n">label</span><span class="p">)</span><span class="o">.</span><span class="n">zip</span><span class="p">(</span><span class="n">predictions</span><span class="p">)</span>
 <span class="n">testErr</span> <span class="o">=</span> <span class="n">labelsAndPredictions</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="n">v</span><span class="p">,</span> <span class="n">p</span><span class="p">):</span> <span class="n">v</span> <span class="o">!=</span> <span class="n">p</span><span class="p">)</span><span class="o">.</span><span class="n">count</span><span class="p">()</span> <span class="o">/</span> <span class="nb">float</span><span class="p">(</span><span class="n">testData</span><span class="o">.</span><span class="n">count</span><span class="p">())</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&#39;Test Error = &#39;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">testErr</span><span class="p">))</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&#39;Learned classification tree model:&#39;</span><span class="p">)</span>
+<span class="k">print</span><span class="p">(</span><span class="s1">&#39;Test Error = &#39;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">testErr</span><span class="p">))</span>
+<span class="k">print</span><span class="p">(</span><span class="s1">&#39;Learned classification tree model:&#39;</span><span class="p">)</span>
 <span class="k">print</span><span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="n">toDebugString</span><span class="p">())</span>
 
-<span class="c"># Save and load model</span>
-<span class="n">model</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;target/tmp/myDecisionTreeClassificationModel&quot;</span><span class="p">)</span>
-<span class="n">sameModel</span> <span class="o">=</span> <span class="n">DecisionTreeModel</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;target/tmp/myDecisionTreeClassificationModel&quot;</span><span class="p">)</span>
+<span class="c1"># Save and load model</span>
+<span class="n">model</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;target/tmp/myDecisionTreeClassificationModel&quot;</span><span class="p">)</span>
+<span class="n">sameModel</span> <span class="o">=</span> <span class="n">DecisionTreeModel</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;target/tmp/myDecisionTreeClassificationModel&quot;</span><span class="p">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/mllib/decision_tree_classification_example.py" in the Spark repo.</small></div>
   </div>
@@ -701,7 +701,7 @@ depth of 5. The Mean Squared Error (MSE) is computed at the end to evaluate
 <div data-lang="scala">
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.tree.DecisionTree"><code>DecisionTree</code> Scala docs</a> and <a href="api/scala/index.html#org.apache.spark.mllib.tree.model.DecisionTreeModel"><code>DecisionTreeModel</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.tree.DecisionTree</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.tree.DecisionTree</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.tree.model.DecisionTreeModel</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.util.MLUtils</span>
 
@@ -740,7 +740,7 @@ depth of 5. The Mean Squared Error (MSE) is computed at the end to evaluate
 <div data-lang="java">
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/tree/DecisionTree.html"><code>DecisionTree</code> Java docs</a> and <a href="api/java/org/apache/spark/mllib/tree/model/DecisionTreeModel.html"><code>DecisionTreeModel</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.HashMap</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.HashMap</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">java.util.Map</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span>
@@ -757,8 +757,8 @@ depth of 5. The Mean Squared Error (MSE) is computed at the end to evaluate
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.tree.model.DecisionTreeModel</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.util.MLUtils</span><span class="o">;</span>
 
-<span class="n">SparkConf</span> <span class="n">sparkConf</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">SparkConf</span><span class="o">().</span><span class="na">setAppName</span><span class="o">(</span><span class="s">&quot;JavaDecisionTreeRegressionExample&quot;</span><span class="o">);</span>
-<span class="n">JavaSparkContext</span> <span class="n">jsc</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">JavaSparkContext</span><span class="o">(</span><span class="n">sparkConf</span><span class="o">);</span>
+<span class="n">SparkConf</span> <span class="n">sparkConf</span> <span class="o">=</span> <span class="k">new</span> <span class="n">SparkConf</span><span class="o">().</span><span class="na">setAppName</span><span class="o">(</span><span class="s">&quot;JavaDecisionTreeRegressionExample&quot;</span><span class="o">);</span>
+<span class="n">JavaSparkContext</span> <span class="n">jsc</span> <span class="o">=</span> <span class="k">new</span> <span class="n">JavaSparkContext</span><span class="o">(</span><span class="n">sparkConf</span><span class="o">);</span>
 
 <span class="c1">// Load and parse the data file.</span>
 <span class="n">String</span> <span class="n">datapath</span> <span class="o">=</span> <span class="s">&quot;data/mllib/sample_libsvm_data.txt&quot;</span><span class="o">;</span>
@@ -814,31 +814,31 @@ depth of 5. The Mean Squared Error (MSE) is computed at the end to evaluate
 <div data-lang="python">
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.tree.DecisionTree"><code>DecisionTree</code> Python docs</a> and <a href="api/python/pyspark.mllib.html#pyspark.mllib.tree.DecisionTreeModel"><code>DecisionTreeModel</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.mllib.tree</span> <span class="kn">import</span> <span class="n">DecisionTree</span><span class="p">,</span> <span class="n">DecisionTreeModel</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.tree</span> <span class="kn">import</span> <span class="n">DecisionTree</span><span class="p">,</span> <span class="n">DecisionTreeModel</span>
 <span class="kn">from</span> <span class="nn">pyspark.mllib.util</span> <span class="kn">import</span> <span class="n">MLUtils</span>
 
-<span class="c"># Load and parse the data file into an RDD of LabeledPoint.</span>
-<span class="n">data</span> <span class="o">=</span> <span class="n">MLUtils</span><span class="o">.</span><span class="n">loadLibSVMFile</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&#39;data/mllib/sample_libsvm_data.txt&#39;</span><span class="p">)</span>
-<span class="c"># Split the data into training and test sets (30% held out for testing)</span>
+<span class="c1"># Load and parse the data file into an RDD of LabeledPoint.</span>
+<span class="n">data</span> <span class="o">=</span> <span class="n">MLUtils</span><span class="o">.</span><span class="n">loadLibSVMFile</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s1">&#39;data/mllib/sample_libsvm_data.txt&#39;</span><span class="p">)</span>
+<span class="c1"># Split the data into training and test sets (30% held out for testing)</span>
 <span class="p">(</span><span class="n">trainingData</span><span class="p">,</span> <span class="n">testData</span><span class="p">)</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">randomSplit</span><span class="p">([</span><span class="mf">0.7</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">])</span>
 
-<span class="c"># Train a DecisionTree model.</span>
-<span class="c">#  Empty categoricalFeaturesInfo indicates all features are continuous.</span>
+<span class="c1"># Train a DecisionTree model.</span>
+<span class="c1">#  Empty categoricalFeaturesInfo indicates all features are continuous.</span>
 <span class="n">model</span> <span class="o">=</span> <span class="n">DecisionTree</span><span class="o">.</span><span class="n">trainRegressor</span><span class="p">(</span><span class="n">trainingData</span><span class="p">,</span> <span class="n">categoricalFeaturesInfo</span><span class="o">=</span><span class="p">{},</span>
-                                    <span class="n">impurity</span><span class="o">=</span><span class="s">&#39;variance&#39;</span><span class="p">,</span> <span class="n">maxDepth</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">maxBins</span><span class="o">=</span><span class="mi">32</span><span class="p">)</span>
+                                    <span class="n">impurity</span><span class="o">=</span><span class="s1">&#39;variance&#39;</span><span class="p">,</span> <span class="n">maxDepth</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">maxBins</span><span class="o">=</span><span class="mi">32</span><span class="p">)</span>
 
-<span class="c"># Evaluate model on test instances and compute test error</span>
+<span class="c1"># Evaluate model on test instances and compute test error</span>
 <span class="n">predictions</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">testData</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="o">.</span><span class="n">features</span><span class="p">))</span>
 <span class="n">labelsAndPredictions</span> <span class="o">=</span> <span class="n">testData</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">lp</span><span class="p">:</span> <span class="n">lp</span><span class="o">.</span><span class="n">label</span><span class="p">)</span><span class="o">.</span><span class="n">zip</span><span class="p">(</span><span class="n">predictions</span><span class="p">)</span>
 <span class="n">testMSE</span> <span class="o">=</span> <span class="n">labelsAndPredictions</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="n">v</span><span class="p">,</span> <span class="n">p</span><span class="p">):</span> <span class="p">(</span><span class="n">v</span> <span class="o">-</span> <span class="n">p</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="n">v</span> <span class="o">-</span> <span class="n">p</span><span class="p">))</span><span class="o">.</span><span class="n">sum</span><span class="p">()</span> <span class="o">/</span>\
     <span class="nb">float</span><span class="p">(</span><span class="n">testData</span><span class="o">.</span><span class="n">count</span><span class="p">())</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&#39;Test Mean Squared Error = &#39;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">testMSE</span><span class="p">))</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&#39;Learned regression tree model:&#39;</span><span class="p">)</span>
+<span class="k">print</span><span class="p">(</span><span class="s1">&#39;Test Mean Squared Error = &#39;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">testMSE</span><span class="p">))</span>
+<span class="k">print</span><span class="p">(</span><span class="s1">&#39;Learned regression tree model:&#39;</span><span class="p">)</span>
 <span class="k">print</span><span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="n">toDebugString</span><span class="p">())</span>
 
-<span class="c"># Save and load model</span>
-<span class="n">model</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;target/tmp/myDecisionTreeRegressionModel&quot;</span><span class="p">)</span>
-<span class="n">sameModel</span> <span class="o">=</span> <span class="n">DecisionTreeModel</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;target/tmp/myDecisionTreeRegressionModel&quot;</span><span class="p">)</span>
+<span class="c1"># Save and load model</span>
+<span class="n">model</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;target/tmp/myDecisionTreeRegressionModel&quot;</span><span class="p">)</span>
+<span class="n">sameModel</span> <span class="o">=</span> <span class="n">DecisionTreeModel</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;target/tmp/myDecisionTreeRegressionModel&quot;</span><span class="p">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/mllib/decision_tree_regression_example.py" in the Spark repo.</small></div>
   </div>

http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/mllib-dimensionality-reduction.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/mllib-dimensionality-reduction.html b/site/docs/2.1.0/mllib-dimensionality-reduction.html
index 239d2c1..0d67e32 100644
--- a/site/docs/2.1.0/mllib-dimensionality-reduction.html
+++ b/site/docs/2.1.0/mllib-dimensionality-reduction.html
@@ -331,12 +331,12 @@
                     
 
                     <ul id="markdown-toc">
-  <li><a href="#singular-value-decomposition-svd" id="markdown-toc-singular-value-decomposition-svd">Singular value decomposition (SVD)</a>    <ul>
-      <li><a href="#performance" id="markdown-toc-performance">Performance</a></li>
-      <li><a href="#svd-example" id="markdown-toc-svd-example">SVD Example</a></li>
+  <li><a href="#singular-value-decomposition-svd">Singular value decomposition (SVD)</a>    <ul>
+      <li><a href="#performance">Performance</a></li>
+      <li><a href="#svd-example">SVD Example</a></li>
     </ul>
   </li>
-  <li><a href="#principal-component-analysis-pca" id="markdown-toc-principal-component-analysis-pca">Principal component analysis (PCA)</a></li>
+  <li><a href="#principal-component-analysis-pca">Principal component analysis (PCA)</a></li>
 </ul>
 
 <p><a href="http://en.wikipedia.org/wiki/Dimensionality_reduction">Dimensionality reduction</a> is the process 
@@ -354,7 +354,7 @@ factorizes a matrix into three matrices: $U$, $\Sigma$, and $V$ such that</p>
 A = U \Sigma V^T,
 \]</code></p>
 
-<p>where</p>
+<p>where </p>
 
 <ul>
   <li>$U$ is an orthonormal matrix, whose columns are called left singular vectors,</li>
@@ -396,13 +396,13 @@ passes, $O(n)$ storage on each executor, and $O(n k)$ storage on the driver.</li
 <h3 id="svd-example">SVD Example</h3>
 
 <p><code>spark.mllib</code> provides SVD functionality to row-oriented matrices, provided in the
-<a href="mllib-data-types.html#rowmatrix">RowMatrix</a> class.</p>
+<a href="mllib-data-types.html#rowmatrix">RowMatrix</a> class. </p>
 
 <div class="codetabs">
 <div data-lang="scala">
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.linalg.SingularValueDecomposition"><code>SingularValueDecomposition</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.Matrix</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.Matrix</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.SingularValueDecomposition</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vector</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vectors</span>
@@ -431,7 +431,7 @@ passes, $O(n)$ storage on each executor, and $O(n k)$ storage on the driver.</li
 <div data-lang="java">
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/linalg/SingularValueDecomposition.html"><code>SingularValueDecomposition</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.LinkedList</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.LinkedList</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaSparkContext</span><span class="o">;</span>
@@ -450,10 +450,10 @@ passes, $O(n)$ storage on each executor, and $O(n k)$ storage on the driver.</li
 <span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">Vector</span><span class="o">&gt;</span> <span class="n">rows</span> <span class="o">=</span> <span class="n">jsc</span><span class="o">.</span><span class="na">parallelize</span><span class="o">(</span><span class="n">rowsList</span><span class="o">);</span>
 
 <span class="c1">// Create a RowMatrix from JavaRDD&lt;Vector&gt;.</span>
-<span class="n">RowMatrix</span> <span class="n">mat</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">RowMatrix</span><span class="o">(</span><span class="n">rows</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span>
+<span class="n">RowMatrix</span> <span class="n">mat</span> <span class="o">=</span> <span class="k">new</span> <span class="n">RowMatrix</span><span class="o">(</span><span class="n">rows</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span>
 
 <span class="c1">// Compute the top 3 singular values and corresponding singular vectors.</span>
-<span class="n">SingularValueDecomposition</span><span class="o">&lt;</span><span class="n">RowMatrix</span><span class="o">,</span> <span class="n">Matrix</span><span class="o">&gt;</span> <span class="n">svd</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="na">computeSVD</span><span class="o">(</span><span class="mi">3</span><span class="o">,</span> <span class="kc">true</span><span class="o">,</span> <span class="mf">1.0</span><span class="n">E</span><span class="o">-</span><span class="mi">9</span><span class="n">d</span><span class="o">);</span>
+<span class="n">SingularValueDecomposition</span><span class="o">&lt;</span><span class="n">RowMatrix</span><span class="o">,</span> <span class="n">Matrix</span><span class="o">&gt;</span> <span class="n">svd</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="na">computeSVD</span><span class="o">(</span><span class="mi">3</span><span class="o">,</span> <span class="kc">true</span><span class="o">,</span> <span class="mf">1.0E-9d</span><span class="o">);</span>
 <span class="n">RowMatrix</span> <span class="n">U</span> <span class="o">=</span> <span class="n">svd</span><span class="o">.</span><span class="na">U</span><span class="o">();</span>
 <span class="n">Vector</span> <span class="n">s</span> <span class="o">=</span> <span class="n">svd</span><span class="o">.</span><span class="na">s</span><span class="o">();</span>
 <span class="n">Matrix</span> <span class="n">V</span> <span class="o">=</span> <span class="n">svd</span><span class="o">.</span><span class="na">V</span><span class="o">();</span>
@@ -489,7 +489,7 @@ and use them to project the vectors into a low-dimensional space.</p>
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.linalg.distributed.RowMatrix"><code>RowMatrix</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.Matrix</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.Matrix</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vectors</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.distributed.RowMatrix</span>
 
@@ -516,7 +516,7 @@ and use them to project the vectors into a low-dimensional space while keeping a
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.feature.PCA"><code>PCA</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.feature.PCA</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.feature.PCA</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vectors</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.regression.LabeledPoint</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.rdd.RDD</span>
@@ -547,7 +547,7 @@ The number of columns should be small, e.g, less than 1000.</p>
 
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/linalg/distributed/RowMatrix.html"><code>RowMatrix</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.LinkedList</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.LinkedList</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaSparkContext</span><span class="o">;</span>
@@ -565,7 +565,7 @@ The number of columns should be small, e.g, less than 1000.</p>
 <span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">Vector</span><span class="o">&gt;</span> <span class="n">rows</span> <span class="o">=</span> <span class="n">JavaSparkContext</span><span class="o">.</span><span class="na">fromSparkContext</span><span class="o">(</span><span class="n">sc</span><span class="o">).</span><span class="na">parallelize</span><span class="o">(</span><span class="n">rowsList</span><span class="o">);</span>
 
 <span class="c1">// Create a RowMatrix from JavaRDD&lt;Vector&gt;.</span>
-<span class="n">RowMatrix</span> <span class="n">mat</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">RowMatrix</span><span class="o">(</span><span class="n">rows</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span>
+<span class="n">RowMatrix</span> <span class="n">mat</span> <span class="o">=</span> <span class="k">new</span> <span class="n">RowMatrix</span><span class="o">(</span><span class="n">rows</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span>
 
 <span class="c1">// Compute the top 3 principal components.</span>
 <span class="n">Matrix</span> <span class="n">pc</span> <span class="o">=</span> <span class="n">mat</span><span class="o">.</span><span class="na">computePrincipalComponents</span><span class="o">(</span><span class="mi">3</span><span class="o">);</span>

http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/mllib-ensembles.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/mllib-ensembles.html b/site/docs/2.1.0/mllib-ensembles.html
index ab17ce5..604c546 100644
--- a/site/docs/2.1.0/mllib-ensembles.html
+++ b/site/docs/2.1.0/mllib-ensembles.html
@@ -307,33 +307,33 @@
                     
 
                     <ul id="markdown-toc">
-  <li><a href="#gradient-boosted-trees-vs-random-forests" id="markdown-toc-gradient-boosted-trees-vs-random-forests">Gradient-Boosted Trees vs. Random Forests</a></li>
-  <li><a href="#random-forests" id="markdown-toc-random-forests">Random Forests</a>    <ul>
-      <li><a href="#basic-algorithm" id="markdown-toc-basic-algorithm">Basic algorithm</a>        <ul>
-          <li><a href="#training" id="markdown-toc-training">Training</a></li>
-          <li><a href="#prediction" id="markdown-toc-prediction">Prediction</a></li>
+  <li><a href="#gradient-boosted-trees-vs-random-forests">Gradient-Boosted Trees vs. Random Forests</a></li>
+  <li><a href="#random-forests">Random Forests</a>    <ul>
+      <li><a href="#basic-algorithm">Basic algorithm</a>        <ul>
+          <li><a href="#training">Training</a></li>
+          <li><a href="#prediction">Prediction</a></li>
         </ul>
       </li>
-      <li><a href="#usage-tips" id="markdown-toc-usage-tips">Usage tips</a></li>
-      <li><a href="#examples" id="markdown-toc-examples">Examples</a>        <ul>
-          <li><a href="#classification" id="markdown-toc-classification">Classification</a></li>
-          <li><a href="#regression" id="markdown-toc-regression">Regression</a></li>
+      <li><a href="#usage-tips">Usage tips</a></li>
+      <li><a href="#examples">Examples</a>        <ul>
+          <li><a href="#classification">Classification</a></li>
+          <li><a href="#regression">Regression</a></li>
         </ul>
       </li>
     </ul>
   </li>
-  <li><a href="#gradient-boosted-trees-gbts" id="markdown-toc-gradient-boosted-trees-gbts">Gradient-Boosted Trees (GBTs)</a>    <ul>
-      <li><a href="#basic-algorithm-1" id="markdown-toc-basic-algorithm-1">Basic algorithm</a>        <ul>
-          <li><a href="#losses" id="markdown-toc-losses">Losses</a></li>
+  <li><a href="#gradient-boosted-trees-gbts">Gradient-Boosted Trees (GBTs)</a>    <ul>
+      <li><a href="#basic-algorithm-1">Basic algorithm</a>        <ul>
+          <li><a href="#losses">Losses</a></li>
         </ul>
       </li>
-      <li><a href="#usage-tips-1" id="markdown-toc-usage-tips-1">Usage tips</a>        <ul>
-          <li><a href="#validation-while-training" id="markdown-toc-validation-while-training">Validation while training</a></li>
+      <li><a href="#usage-tips-1">Usage tips</a>        <ul>
+          <li><a href="#validation-while-training">Validation while training</a></li>
         </ul>
       </li>
-      <li><a href="#examples-1" id="markdown-toc-examples-1">Examples</a>        <ul>
-          <li><a href="#classification-1" id="markdown-toc-classification-1">Classification</a></li>
-          <li><a href="#regression-1" id="markdown-toc-regression-1">Regression</a></li>
+      <li><a href="#examples-1">Examples</a>        <ul>
+          <li><a href="#classification-1">Classification</a></li>
+          <li><a href="#regression-1">Regression</a></li>
         </ul>
       </li>
     </ul>
@@ -450,7 +450,7 @@ The test error is calculated to measure the algorithm accuracy.</p>
 <div data-lang="scala">
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.tree.RandomForest$"><code>RandomForest</code> Scala docs</a> and <a href="api/scala/index.html#org.apache.spark.mllib.tree.model.RandomForestModel"><code>RandomForestModel</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.tree.RandomForest</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.tree.RandomForest</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.tree.model.RandomForestModel</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.util.MLUtils</span>
 
@@ -492,7 +492,7 @@ The test error is calculated to measure the algorithm accuracy.</p>
 <div data-lang="java">
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/tree/RandomForest.html"><code>RandomForest</code> Java docs</a> and <a href="api/java/org/apache/spark/mllib/tree/model/RandomForestModel.html"><code>RandomForestModel</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.HashMap</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.HashMap</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span>
 
@@ -507,8 +507,8 @@ The test error is calculated to measure the algorithm accuracy.</p>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.tree.model.RandomForestModel</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.util.MLUtils</span><span class="o">;</span>
 
-<span class="n">SparkConf</span> <span class="n">sparkConf</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">SparkConf</span><span class="o">().</span><span class="na">setAppName</span><span class="o">(</span><span class="s">&quot;JavaRandomForestClassificationExample&quot;</span><span class="o">);</span>
-<span class="n">JavaSparkContext</span> <span class="n">jsc</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">JavaSparkContext</span><span class="o">(</span><span class="n">sparkConf</span><span class="o">);</span>
+<span class="n">SparkConf</span> <span class="n">sparkConf</span> <span class="o">=</span> <span class="k">new</span> <span class="n">SparkConf</span><span class="o">().</span><span class="na">setAppName</span><span class="o">(</span><span class="s">&quot;JavaRandomForestClassificationExample&quot;</span><span class="o">);</span>
+<span class="n">JavaSparkContext</span> <span class="n">jsc</span> <span class="o">=</span> <span class="k">new</span> <span class="n">JavaSparkContext</span><span class="o">(</span><span class="n">sparkConf</span><span class="o">);</span>
 <span class="c1">// Load and parse the data file.</span>
 <span class="n">String</span> <span class="n">datapath</span> <span class="o">=</span> <span class="s">&quot;data/mllib/sample_libsvm_data.txt&quot;</span><span class="o">;</span>
 <span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">LabeledPoint</span><span class="o">&gt;</span> <span class="n">data</span> <span class="o">=</span> <span class="n">MLUtils</span><span class="o">.</span><span class="na">loadLibSVMFile</span><span class="o">(</span><span class="n">jsc</span><span class="o">.</span><span class="na">sc</span><span class="o">(),</span> <span class="n">datapath</span><span class="o">).</span><span class="na">toJavaRDD</span><span class="o">();</span>
@@ -561,33 +561,33 @@ The test error is calculated to measure the algorithm accuracy.</p>
 <div data-lang="python">
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.tree.RandomForest"><code>RandomForest</code> Python docs</a> and <a href="api/python/pyspark.mllib.html#pyspark.mllib.tree.RandomForestModel"><code>RandomForest</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.mllib.tree</span> <span class="kn">import</span> <span class="n">RandomForest</span><span class="p">,</span> <span class="n">RandomForestModel</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.tree</span> <span class="kn">import</span> <span class="n">RandomForest</span><span class="p">,</span> <span class="n">RandomForestModel</span>
 <span class="kn">from</span> <span class="nn">pyspark.mllib.util</span> <span class="kn">import</span> <span class="n">MLUtils</span>
 
-<span class="c"># Load and parse the data file into an RDD of LabeledPoint.</span>
-<span class="n">data</span> <span class="o">=</span> <span class="n">MLUtils</span><span class="o">.</span><span class="n">loadLibSVMFile</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&#39;data/mllib/sample_libsvm_data.txt&#39;</span><span class="p">)</span>
-<span class="c"># Split the data into training and test sets (30% held out for testing)</span>
+<span class="c1"># Load and parse the data file into an RDD of LabeledPoint.</span>
+<span class="n">data</span> <span class="o">=</span> <span class="n">MLUtils</span><span class="o">.</span><span class="n">loadLibSVMFile</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s1">&#39;data/mllib/sample_libsvm_data.txt&#39;</span><span class="p">)</span>
+<span class="c1"># Split the data into training and test sets (30% held out for testing)</span>
 <span class="p">(</span><span class="n">trainingData</span><span class="p">,</span> <span class="n">testData</span><span class="p">)</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">randomSplit</span><span class="p">([</span><span class="mf">0.7</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">])</span>
 
-<span class="c"># Train a RandomForest model.</span>
-<span class="c">#  Empty categoricalFeaturesInfo indicates all features are continuous.</span>
-<span class="c">#  Note: Use larger numTrees in practice.</span>
-<span class="c">#  Setting featureSubsetStrategy=&quot;auto&quot; lets the algorithm choose.</span>
+<span class="c1"># Train a RandomForest model.</span>
+<span class="c1">#  Empty categoricalFeaturesInfo indicates all features are continuous.</span>
+<span class="c1">#  Note: Use larger numTrees in practice.</span>
+<span class="c1">#  Setting featureSubsetStrategy=&quot;auto&quot; lets the algorithm choose.</span>
 <span class="n">model</span> <span class="o">=</span> <span class="n">RandomForest</span><span class="o">.</span><span class="n">trainClassifier</span><span class="p">(</span><span class="n">trainingData</span><span class="p">,</span> <span class="n">numClasses</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">categoricalFeaturesInfo</span><span class="o">=</span><span class="p">{},</span>
-                                     <span class="n">numTrees</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">featureSubsetStrategy</span><span class="o">=</span><span class="s">&quot;auto&quot;</span><span class="p">,</span>
-                                     <span class="n">impurity</span><span class="o">=</span><span class="s">&#39;gini&#39;</span><span class="p">,</span> <span class="n">maxDepth</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">maxBins</span><span class="o">=</span><span class="mi">32</span><span class="p">)</span>
+                                     <span class="n">numTrees</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">featureSubsetStrategy</span><span class="o">=</span><span class="s2">&quot;auto&quot;</span><span class="p">,</span>
+                                     <span class="n">impurity</span><span class="o">=</span><span class="s1">&#39;gini&#39;</span><span class="p">,</span> <span class="n">maxDepth</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">maxBins</span><span class="o">=</span><span class="mi">32</span><span class="p">)</span>
 
-<span class="c"># Evaluate model on test instances and compute test error</span>
+<span class="c1"># Evaluate model on test instances and compute test error</span>
 <span class="n">predictions</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">testData</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="o">.</span><span class="n">features</span><span class="p">))</span>
 <span class="n">labelsAndPredictions</span> <span class="o">=</span> <span class="n">testData</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">lp</span><span class="p">:</span> <span class="n">lp</span><span class="o">.</span><span class="n">label</span><span class="p">)</span><span class="o">.</span><span class="n">zip</span><span class="p">(</span><span class="n">predictions</span><span class="p">)</span>
 <span class="n">testErr</span> <span class="o">=</span> <span class="n">labelsAndPredictions</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="n">v</span><span class="p">,</span> <span class="n">p</span><span class="p">):</span> <span class="n">v</span> <span class="o">!=</span> <span class="n">p</span><span class="p">)</span><span class="o">.</span><span class="n">count</span><span class="p">()</span> <span class="o">/</span> <span class="nb">float</span><span class="p">(</span><span class="n">testData</span><span class="o">.</span><span class="n">count</span><span class="p">())</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&#39;Test Error = &#39;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">testErr</span><span class="p">))</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&#39;Learned classification forest model:&#39;</span><span class="p">)</span>
+<span class="k">print</span><span class="p">(</span><span class="s1">&#39;Test Error = &#39;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">testErr</span><span class="p">))</span>
+<span class="k">print</span><span class="p">(</span><span class="s1">&#39;Learned classification forest model:&#39;</span><span class="p">)</span>
 <span class="k">print</span><span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="n">toDebugString</span><span class="p">())</span>
 
-<span class="c"># Save and load model</span>
-<span class="n">model</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;target/tmp/myRandomForestClassificationModel&quot;</span><span class="p">)</span>
-<span class="n">sameModel</span> <span class="o">=</span> <span class="n">RandomForestModel</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;target/tmp/myRandomForestClassificationModel&quot;</span><span class="p">)</span>
+<span class="c1"># Save and load model</span>
+<span class="n">model</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;target/tmp/myRandomForestClassificationModel&quot;</span><span class="p">)</span>
+<span class="n">sameModel</span> <span class="o">=</span> <span class="n">RandomForestModel</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;target/tmp/myRandomForestClassificationModel&quot;</span><span class="p">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/mllib/random_forest_classification_example.py" in the Spark repo.</small></div>
   </div>
@@ -608,7 +608,7 @@ The Mean Squared Error (MSE) is computed at the end to evaluate
 <div data-lang="scala">
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.tree.RandomForest$"><code>RandomForest</code> Scala docs</a> and <a href="api/scala/index.html#org.apache.spark.mllib.tree.model.RandomForestModel"><code>RandomForestModel</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.tree.RandomForest</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.tree.RandomForest</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.tree.model.RandomForestModel</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.util.MLUtils</span>
 
@@ -650,7 +650,7 @@ The Mean Squared Error (MSE) is computed at the end to evaluate
 <div data-lang="java">
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/tree/RandomForest.html"><code>RandomForest</code> Java docs</a> and <a href="api/java/org/apache/spark/mllib/tree/model/RandomForestModel.html"><code>RandomForestModel</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.HashMap</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.HashMap</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">java.util.Map</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span>
@@ -667,8 +667,8 @@ The Mean Squared Error (MSE) is computed at the end to evaluate
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.util.MLUtils</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.SparkConf</span><span class="o">;</span>
 
-<span class="n">SparkConf</span> <span class="n">sparkConf</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">SparkConf</span><span class="o">().</span><span class="na">setAppName</span><span class="o">(</span><span class="s">&quot;JavaRandomForestRegressionExample&quot;</span><span class="o">);</span>
-<span class="n">JavaSparkContext</span> <span class="n">jsc</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">JavaSparkContext</span><span class="o">(</span><span class="n">sparkConf</span><span class="o">);</span>
+<span class="n">SparkConf</span> <span class="n">sparkConf</span> <span class="o">=</span> <span class="k">new</span> <span class="n">SparkConf</span><span class="o">().</span><span class="na">setAppName</span><span class="o">(</span><span class="s">&quot;JavaRandomForestRegressionExample&quot;</span><span class="o">);</span>
+<span class="n">JavaSparkContext</span> <span class="n">jsc</span> <span class="o">=</span> <span class="k">new</span> <span class="n">JavaSparkContext</span><span class="o">(</span><span class="n">sparkConf</span><span class="o">);</span>
 <span class="c1">// Load and parse the data file.</span>
 <span class="n">String</span> <span class="n">datapath</span> <span class="o">=</span> <span class="s">&quot;data/mllib/sample_libsvm_data.txt&quot;</span><span class="o">;</span>
 <span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">LabeledPoint</span><span class="o">&gt;</span> <span class="n">data</span> <span class="o">=</span> <span class="n">MLUtils</span><span class="o">.</span><span class="na">loadLibSVMFile</span><span class="o">(</span><span class="n">jsc</span><span class="o">.</span><span class="na">sc</span><span class="o">(),</span> <span class="n">datapath</span><span class="o">).</span><span class="na">toJavaRDD</span><span class="o">();</span>
@@ -725,34 +725,34 @@ The Mean Squared Error (MSE) is computed at the end to evaluate
 <div data-lang="python">
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.tree.RandomForest"><code>RandomForest</code> Python docs</a> and <a href="api/python/pyspark.mllib.html#pyspark.mllib.tree.RandomForestModel"><code>RandomForest</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.mllib.tree</span> <span class="kn">import</span> <span class="n">RandomForest</span><span class="p">,</span> <span class="n">RandomForestModel</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.tree</span> <span class="kn">import</span> <span class="n">RandomForest</span><span class="p">,</span> <span class="n">RandomForestModel</span>
 <span class="kn">from</span> <span class="nn">pyspark.mllib.util</span> <span class="kn">import</span> <span class="n">MLUtils</span>
 
-<span class="c"># Load and parse the data file into an RDD of LabeledPoint.</span>
-<span class="n">data</span> <span class="o">=</span> <span class="n">MLUtils</span><span class="o">.</span><span class="n">loadLibSVMFile</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&#39;data/mllib/sample_libsvm_data.txt&#39;</span><span class="p">)</span>
-<span class="c"># Split the data into training and test sets (30% held out for testing)</span>
+<span class="c1"># Load and parse the data file into an RDD of LabeledPoint.</span>
+<span class="n">data</span> <span class="o">=</span> <span class="n">MLUtils</span><span class="o">.</span><span class="n">loadLibSVMFile</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s1">&#39;data/mllib/sample_libsvm_data.txt&#39;</span><span class="p">)</span>
+<span class="c1"># Split the data into training and test sets (30% held out for testing)</span>
 <span class="p">(</span><span class="n">trainingData</span><span class="p">,</span> <span class="n">testData</span><span class="p">)</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">randomSplit</span><span class="p">([</span><span class="mf">0.7</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">])</span>
 
-<span class="c"># Train a RandomForest model.</span>
-<span class="c">#  Empty categoricalFeaturesInfo indicates all features are continuous.</span>
-<span class="c">#  Note: Use larger numTrees in practice.</span>
-<span class="c">#  Setting featureSubsetStrategy=&quot;auto&quot; lets the algorithm choose.</span>
+<span class="c1"># Train a RandomForest model.</span>
+<span class="c1">#  Empty categoricalFeaturesInfo indicates all features are continuous.</span>
+<span class="c1">#  Note: Use larger numTrees in practice.</span>
+<span class="c1">#  Setting featureSubsetStrategy=&quot;auto&quot; lets the algorithm choose.</span>
 <span class="n">model</span> <span class="o">=</span> <span class="n">RandomForest</span><span class="o">.</span><span class="n">trainRegressor</span><span class="p">(</span><span class="n">trainingData</span><span class="p">,</span> <span class="n">categoricalFeaturesInfo</span><span class="o">=</span><span class="p">{},</span>
-                                    <span class="n">numTrees</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">featureSubsetStrategy</span><span class="o">=</span><span class="s">&quot;auto&quot;</span><span class="p">,</span>
-                                    <span class="n">impurity</span><span class="o">=</span><span class="s">&#39;variance&#39;</span><span class="p">,</span> <span class="n">maxDepth</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">maxBins</span><span class="o">=</span><span class="mi">32</span><span class="p">)</span>
+                                    <span class="n">numTrees</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">featureSubsetStrategy</span><span class="o">=</span><span class="s2">&quot;auto&quot;</span><span class="p">,</span>
+                                    <span class="n">impurity</span><span class="o">=</span><span class="s1">&#39;variance&#39;</span><span class="p">,</span> <span class="n">maxDepth</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">maxBins</span><span class="o">=</span><span class="mi">32</span><span class="p">)</span>
 
-<span class="c"># Evaluate model on test instances and compute test error</span>
+<span class="c1"># Evaluate model on test instances and compute test error</span>
 <span class="n">predictions</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">testData</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="o">.</span><span class="n">features</span><span class="p">))</span>
 <span class="n">labelsAndPredictions</span> <span class="o">=</span> <span class="n">testData</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">lp</span><span class="p">:</span> <span class="n">lp</span><span class="o">.</span><span class="n">label</span><span class="p">)</span><span class="o">.</span><span class="n">zip</span><span class="p">(</span><span class="n">predictions</span><span class="p">)</span>
 <span class="n">testMSE</span> <span class="o">=</span> <span class="n">labelsAndPredictions</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="n">v</span><span class="p">,</span> <span class="n">p</span><span class="p">):</span> <span class="p">(</span><span class="n">v</span> <span class="o">-</span> <span class="n">p</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="n">v</span> <span class="o">-</span> <span class="n">p</span><span class="p">))</span><span class="o">.</span><span class="n">sum</span><span class="p">()</span> <span class="o">/</span>\
     <span class="nb">float</span><span class="p">(</span><span class="n">testData</span><span class="o">.</span><span class="n">count</span><span class="p">())</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&#39;Test Mean Squared Error = &#39;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">testMSE</span><span class="p">))</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&#39;Learned regression forest model:&#39;</span><span class="p">)</span>
+<span class="k">print</span><span class="p">(</span><span class="s1">&#39;Test Mean Squared Error = &#39;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">testMSE</span><span class="p">))</span>
+<span class="k">print</span><span class="p">(</span><span class="s1">&#39;Learned regression forest model:&#39;</span><span class="p">)</span>
 <span class="k">print</span><span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="n">toDebugString</span><span class="p">())</span>
 
-<span class="c"># Save and load model</span>
-<span class="n">model</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;target/tmp/myRandomForestRegressionModel&quot;</span><span class="p">)</span>
-<span class="n">sameModel</span> <span class="o">=</span> <span class="n">RandomForestModel</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;target/tmp/myRandomForestRegressionModel&quot;</span><span class="p">)</span>
+<span class="c1"># Save and load model</span>
+<span class="n">model</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;target/tmp/myRandomForestRegressionModel&quot;</span><span class="p">)</span>
+<span class="n">sameModel</span> <span class="o">=</span> <span class="n">RandomForestModel</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;target/tmp/myRandomForestRegressionModel&quot;</span><span class="p">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/mllib/random_forest_regression_example.py" in the Spark repo.</small></div>
   </div>
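
For reference, the highlighted block above de-escapes to the following plain Python. This is a transcription of the rendered example, not new code; it assumes an existing SparkContext sc and keeps the Python 2 tuple-unpacking lambdas used by the 2.1-era examples:

    from pyspark.mllib.tree import RandomForest, RandomForestModel
    from pyspark.mllib.util import MLUtils

    # Load and parse the data file into an RDD of LabeledPoint.
    data = MLUtils.loadLibSVMFile(sc, 'data/mllib/sample_libsvm_data.txt')
    # Split the data into training and test sets (30% held out for testing)
    (trainingData, testData) = data.randomSplit([0.7, 0.3])

    # Train a RandomForest model.
    #  Empty categoricalFeaturesInfo indicates all features are continuous.
    #  Note: Use larger numTrees in practice.
    #  Setting featureSubsetStrategy="auto" lets the algorithm choose.
    model = RandomForest.trainRegressor(trainingData, categoricalFeaturesInfo={},
                                        numTrees=3, featureSubsetStrategy="auto",
                                        impurity='variance', maxDepth=4, maxBins=32)

    # Evaluate model on test instances and compute test error
    predictions = model.predict(testData.map(lambda x: x.features))
    labelsAndPredictions = testData.map(lambda lp: lp.label).zip(predictions)
    testMSE = labelsAndPredictions.map(lambda (v, p): (v - p) * (v - p)).sum() /\
        float(testData.count())
    print('Test Mean Squared Error = ' + str(testMSE))
    print('Learned regression forest model:')
    print(model.toDebugString())

    # Save and load model
    model.save(sc, "target/tmp/myRandomForestRegressionModel")
    sameModel = RandomForestModel.load(sc, "target/tmp/myRandomForestRegressionModel")
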
@@ -859,7 +859,7 @@ The test error is calculated to measure the algorithm accuracy.</p>
 <div data-lang="scala">
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.tree.GradientBoostedTrees"><code>GradientBoostedTrees</code> Scala docs</a> and <a href="api/scala/index.html#org.apache.spark.mllib.tree.model.GradientBoostedTreesModel"><code>GradientBoostedTreesModel</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.tree.GradientBoostedTrees</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.tree.GradientBoostedTrees</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.tree.configuration.BoostingStrategy</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.tree.model.GradientBoostedTreesModel</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.util.MLUtils</span>
@@ -901,7 +901,7 @@ The test error is calculated to measure the algorithm accuracy.</p>
 <div data-lang="java">
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/tree/GradientBoostedTrees.html"><code>GradientBoostedTrees</code> Java docs</a> and <a href="api/java/org/apache/spark/mllib/tree/model/GradientBoostedTreesModel.html"><code>GradientBoostedTreesModel</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.HashMap</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.HashMap</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">java.util.Map</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span>
@@ -918,9 +918,9 @@ The test error is calculated to measure the algorithm accuracy.</p>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.tree.model.GradientBoostedTreesModel</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.util.MLUtils</span><span class="o">;</span>
 
-<span class="n">SparkConf</span> <span class="n">sparkConf</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">SparkConf</span><span class="o">()</span>
+<span class="n">SparkConf</span> <span class="n">sparkConf</span> <span class="o">=</span> <span class="k">new</span> <span class="n">SparkConf</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setAppName</span><span class="o">(</span><span class="s">&quot;JavaGradientBoostedTreesClassificationExample&quot;</span><span class="o">);</span>
-<span class="n">JavaSparkContext</span> <span class="n">jsc</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">JavaSparkContext</span><span class="o">(</span><span class="n">sparkConf</span><span class="o">);</span>
+<span class="n">JavaSparkContext</span> <span class="n">jsc</span> <span class="o">=</span> <span class="k">new</span> <span class="n">JavaSparkContext</span><span class="o">(</span><span class="n">sparkConf</span><span class="o">);</span>
 
 <span class="c1">// Load and parse the data file.</span>
 <span class="n">String</span> <span class="n">datapath</span> <span class="o">=</span> <span class="s">&quot;data/mllib/sample_libsvm_data.txt&quot;</span><span class="o">;</span>
@@ -972,32 +972,32 @@ The test error is calculated to measure the algorithm accuracy.</p>
 <div data-lang="python">
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.tree.GradientBoostedTrees"><code>GradientBoostedTrees</code> Python docs</a> and <a href="api/python/pyspark.mllib.html#pyspark.mllib.tree.GradientBoostedTreesModel"><code>GradientBoostedTreesModel</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.mllib.tree</span> <span class="kn">import</span> <span class="n">GradientBoostedTrees</span><span class="p">,</span> <span class="n">GradientBoostedTreesModel</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.tree</span> <span class="kn">import</span> <span class="n">GradientBoostedTrees</span><span class="p">,</span> <span class="n">GradientBoostedTreesModel</span>
 <span class="kn">from</span> <span class="nn">pyspark.mllib.util</span> <span class="kn">import</span> <span class="n">MLUtils</span>
 
-<span class="c"># Load and parse the data file.</span>
-<span class="n">data</span> <span class="o">=</span> <span class="n">MLUtils</span><span class="o">.</span><span class="n">loadLibSVMFile</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;data/mllib/sample_libsvm_data.txt&quot;</span><span class="p">)</span>
-<span class="c"># Split the data into training and test sets (30% held out for testing)</span>
+<span class="c1"># Load and parse the data file.</span>
+<span class="n">data</span> <span class="o">=</span> <span class="n">MLUtils</span><span class="o">.</span><span class="n">loadLibSVMFile</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;data/mllib/sample_libsvm_data.txt&quot;</span><span class="p">)</span>
+<span class="c1"># Split the data into training and test sets (30% held out for testing)</span>
 <span class="p">(</span><span class="n">trainingData</span><span class="p">,</span> <span class="n">testData</span><span class="p">)</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">randomSplit</span><span class="p">([</span><span class="mf">0.7</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">])</span>
 
-<span class="c"># Train a GradientBoostedTrees model.</span>
-<span class="c">#  Notes: (a) Empty categoricalFeaturesInfo indicates all features are continuous.</span>
-<span class="c">#         (b) Use more iterations in practice.</span>
+<span class="c1"># Train a GradientBoostedTrees model.</span>
+<span class="c1">#  Notes: (a) Empty categoricalFeaturesInfo indicates all features are continuous.</span>
+<span class="c1">#         (b) Use more iterations in practice.</span>
 <span class="n">model</span> <span class="o">=</span> <span class="n">GradientBoostedTrees</span><span class="o">.</span><span class="n">trainClassifier</span><span class="p">(</span><span class="n">trainingData</span><span class="p">,</span>
                                              <span class="n">categoricalFeaturesInfo</span><span class="o">=</span><span class="p">{},</span> <span class="n">numIterations</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
 
-<span class="c"># Evaluate model on test instances and compute test error</span>
+<span class="c1"># Evaluate model on test instances and compute test error</span>
 <span class="n">predictions</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">testData</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="o">.</span><span class="n">features</span><span class="p">))</span>
 <span class="n">labelsAndPredictions</span> <span class="o">=</span> <span class="n">testData</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">lp</span><span class="p">:</span> <span class="n">lp</span><span class="o">.</span><span class="n">label</span><span class="p">)</span><span class="o">.</span><span class="n">zip</span><span class="p">(</span><span class="n">predictions</span><span class="p">)</span>
 <span class="n">testErr</span> <span class="o">=</span> <span class="n">labelsAndPredictions</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="n">v</span><span class="p">,</span> <span class="n">p</span><span class="p">):</span> <span class="n">v</span> <span class="o">!=</span> <span class="n">p</span><span class="p">)</span><span class="o">.</span><span class="n">count</span><span class="p">()</span> <span class="o">/</span> <span class="nb">float</span><span class="p">(</span><span class="n">testData</span><span class="o">.</span><span class="n">count</span><span class="p">())</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&#39;Test Error = &#39;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">testErr</span><span class="p">))</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&#39;Learned classification GBT model:&#39;</span><span class="p">)</span>
+<span class="k">print</span><span class="p">(</span><span class="s1">&#39;Test Error = &#39;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">testErr</span><span class="p">))</span>
+<span class="k">print</span><span class="p">(</span><span class="s1">&#39;Learned classification GBT model:&#39;</span><span class="p">)</span>
 <span class="k">print</span><span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="n">toDebugString</span><span class="p">())</span>
 
-<span class="c"># Save and load model</span>
-<span class="n">model</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;target/tmp/myGradientBoostingClassificationModel&quot;</span><span class="p">)</span>
+<span class="c1"># Save and load model</span>
+<span class="n">model</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;target/tmp/myGradientBoostingClassificationModel&quot;</span><span class="p">)</span>
 <span class="n">sameModel</span> <span class="o">=</span> <span class="n">GradientBoostedTreesModel</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span>
-                                           <span class="s">&quot;target/tmp/myGradientBoostingClassificationModel&quot;</span><span class="p">)</span>
+                                           <span class="s2">&quot;target/tmp/myGradientBoostingClassificationModel&quot;</span><span class="p">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/mllib/gradient_boosting_classification_example.py" in the Spark repo.</small></div>
   </div>
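
Likewise, the plain-Python form of the classification block above (same assumptions: an existing SparkContext sc, Python 2 lambda syntax as in the original):

    from pyspark.mllib.tree import GradientBoostedTrees, GradientBoostedTreesModel
    from pyspark.mllib.util import MLUtils

    # Load and parse the data file.
    data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
    # Split the data into training and test sets (30% held out for testing)
    (trainingData, testData) = data.randomSplit([0.7, 0.3])

    # Train a GradientBoostedTrees model.
    #  Notes: (a) Empty categoricalFeaturesInfo indicates all features are continuous.
    #         (b) Use more iterations in practice.
    model = GradientBoostedTrees.trainClassifier(trainingData,
                                                 categoricalFeaturesInfo={}, numIterations=3)

    # Evaluate model on test instances and compute test error
    predictions = model.predict(testData.map(lambda x: x.features))
    labelsAndPredictions = testData.map(lambda lp: lp.label).zip(predictions)
    testErr = labelsAndPredictions.filter(lambda (v, p): v != p).count() / float(testData.count())
    print('Test Error = ' + str(testErr))
    print('Learned classification GBT model:')
    print(model.toDebugString())

    # Save and load model
    model.save(sc, "target/tmp/myGradientBoostingClassificationModel")
    sameModel = GradientBoostedTreesModel.load(sc,
                                               "target/tmp/myGradientBoostingClassificationModel")
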
@@ -1018,7 +1018,7 @@ The Mean Squared Error (MSE) is computed at the end to evaluate
 <div data-lang="scala">
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.tree.GradientBoostedTrees"><code>GradientBoostedTrees</code> Scala docs</a> and <a href="api/scala/index.html#org.apache.spark.mllib.tree.model.GradientBoostedTreesModel"><code>GradientBoostedTreesModel</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.tree.GradientBoostedTrees</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.tree.GradientBoostedTrees</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.tree.configuration.BoostingStrategy</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.tree.model.GradientBoostedTreesModel</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.util.MLUtils</span>
@@ -1059,7 +1059,7 @@ The Mean Squared Error (MSE) is computed at the end to evaluate
 <div data-lang="java">
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/tree/GradientBoostedTrees.html"><code>GradientBoostedTrees</code> Java docs</a> and <a href="api/java/org/apache/spark/mllib/tree/model/GradientBoostedTreesModel.html"><code>GradientBoostedTreesModel</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.HashMap</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.HashMap</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">java.util.Map</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span>
@@ -1077,9 +1077,9 @@ The Mean Squared Error (MSE) is computed at the end to evaluate
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.tree.model.GradientBoostedTreesModel</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.util.MLUtils</span><span class="o">;</span>
 
-<span class="n">SparkConf</span> <span class="n">sparkConf</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">SparkConf</span><span class="o">()</span>
+<span class="n">SparkConf</span> <span class="n">sparkConf</span> <span class="o">=</span> <span class="k">new</span> <span class="n">SparkConf</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setAppName</span><span class="o">(</span><span class="s">&quot;JavaGradientBoostedTreesRegressionExample&quot;</span><span class="o">);</span>
-<span class="n">JavaSparkContext</span> <span class="n">jsc</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">JavaSparkContext</span><span class="o">(</span><span class="n">sparkConf</span><span class="o">);</span>
+<span class="n">JavaSparkContext</span> <span class="n">jsc</span> <span class="o">=</span> <span class="k">new</span> <span class="n">JavaSparkContext</span><span class="o">(</span><span class="n">sparkConf</span><span class="o">);</span>
 <span class="c1">// Load and parse the data file.</span>
 <span class="n">String</span> <span class="n">datapath</span> <span class="o">=</span> <span class="s">&quot;data/mllib/sample_libsvm_data.txt&quot;</span><span class="o">;</span>
 <span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">LabeledPoint</span><span class="o">&gt;</span> <span class="n">data</span> <span class="o">=</span> <span class="n">MLUtils</span><span class="o">.</span><span class="na">loadLibSVMFile</span><span class="o">(</span><span class="n">jsc</span><span class="o">.</span><span class="na">sc</span><span class="o">(),</span> <span class="n">datapath</span><span class="o">).</span><span class="na">toJavaRDD</span><span class="o">();</span>
@@ -1135,32 +1135,32 @@ The Mean Squared Error (MSE) is computed at the end to evaluate
 <div data-lang="python">
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.tree.GradientBoostedTrees"><code>GradientBoostedTrees</code> Python docs</a> and <a href="api/python/pyspark.mllib.html#pyspark.mllib.tree.GradientBoostedTreesModel"><code>GradientBoostedTreesModel</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.mllib.tree</span> <span class="kn">import</span> <span class="n">GradientBoostedTrees</span><span class="p">,</span> <span class="n">GradientBoostedTreesModel</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.tree</span> <span class="kn">import</span> <span class="n">GradientBoostedTrees</span><span class="p">,</span> <span class="n">GradientBoostedTreesModel</span>
 <span class="kn">from</span> <span class="nn">pyspark.mllib.util</span> <span class="kn">import</span> <span class="n">MLUtils</span>
 
-<span class="c"># Load and parse the data file.</span>
-<span class="n">data</span> <span class="o">=</span> <span class="n">MLUtils</span><span class="o">.</span><span class="n">loadLibSVMFile</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;data/mllib/sample_libsvm_data.txt&quot;</span><span class="p">)</span>
-<span class="c"># Split the data into training and test sets (30% held out for testing)</span>
+<span class="c1"># Load and parse the data file.</span>
+<span class="n">data</span> <span class="o">=</span> <span class="n">MLUtils</span><span class="o">.</span><span class="n">loadLibSVMFile</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;data/mllib/sample_libsvm_data.txt&quot;</span><span class="p">)</span>
+<span class="c1"># Split the data into training and test sets (30% held out for testing)</span>
 <span class="p">(</span><span class="n">trainingData</span><span class="p">,</span> <span class="n">testData</span><span class="p">)</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">randomSplit</span><span class="p">([</span><span class="mf">0.7</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">])</span>
 
-<span class="c"># Train a GradientBoostedTrees model.</span>
-<span class="c">#  Notes: (a) Empty categoricalFeaturesInfo indicates all features are continuous.</span>
-<span class="c">#         (b) Use more iterations in practice.</span>
+<span class="c1"># Train a GradientBoostedTrees model.</span>
+<span class="c1">#  Notes: (a) Empty categoricalFeaturesInfo indicates all features are continuous.</span>
+<span class="c1">#         (b) Use more iterations in practice.</span>
 <span class="n">model</span> <span class="o">=</span> <span class="n">GradientBoostedTrees</span><span class="o">.</span><span class="n">trainRegressor</span><span class="p">(</span><span class="n">trainingData</span><span class="p">,</span>
                                             <span class="n">categoricalFeaturesInfo</span><span class="o">=</span><span class="p">{},</span> <span class="n">numIterations</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
 
-<span class="c"># Evaluate model on test instances and compute test error</span>
+<span class="c1"># Evaluate model on test instances and compute test error</span>
 <span class="n">predictions</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">testData</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="o">.</span><span class="n">features</span><span class="p">))</span>
 <span class="n">labelsAndPredictions</span> <span class="o">=</span> <span class="n">testData</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">lp</span><span class="p">:</span> <span class="n">lp</span><span class="o">.</span><span class="n">label</span><span class="p">)</span><span class="o">.</span><span class="n">zip</span><span class="p">(</span><span class="n">predictions</span><span class="p">)</span>
 <span class="n">testMSE</span> <span class="o">=</span> <span class="n">labelsAndPredictions</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="n">v</span><span class="p">,</span> <span class="n">p</span><span class="p">):</span> <span class="p">(</span><span class="n">v</span> <span class="o">-</span> <span class="n">p</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="n">v</span> <span class="o">-</span> <span class="n">p</span><span class="p">))</span><span class="o">.</span><span class="n">sum</span><span class="p">()</span> <span class="o">/</span>\
     <span class="nb">float</span><span class="p">(</span><span class="n">testData</span><span class="o">.</span><span class="n">count</span><span class="p">())</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&#39;Test Mean Squared Error = &#39;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">testMSE</span><span class="p">))</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&#39;Learned regression GBT model:&#39;</span><span class="p">)</span>
+<span class="k">print</span><span class="p">(</span><span class="s1">&#39;Test Mean Squared Error = &#39;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">testMSE</span><span class="p">))</span>
+<span class="k">print</span><span class="p">(</span><span class="s1">&#39;Learned regression GBT model:&#39;</span><span class="p">)</span>
 <span class="k">print</span><span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="n">toDebugString</span><span class="p">())</span>
 
-<span class="c"># Save and load model</span>
-<span class="n">model</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;target/tmp/myGradientBoostingRegressionModel&quot;</span><span class="p">)</span>
-<span class="n">sameModel</span> <span class="o">=</span> <span class="n">GradientBoostedTreesModel</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;target/tmp/myGradientBoostingRegressionModel&quot;</span><span class="p">)</span>
+<span class="c1"># Save and load model</span>
+<span class="n">model</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;target/tmp/myGradientBoostingRegressionModel&quot;</span><span class="p">)</span>
+<span class="n">sameModel</span> <span class="o">=</span> <span class="n">GradientBoostedTreesModel</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;target/tmp/myGradientBoostingRegressionModel&quot;</span><span class="p">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/mllib/gradient_boosting_regression_example.py" in the Spark repo.</small></div>
   </div>
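
And the regression variant above, de-escaped to plain Python under the same assumptions (existing SparkContext sc, Python 2 lambdas):

    from pyspark.mllib.tree import GradientBoostedTrees, GradientBoostedTreesModel
    from pyspark.mllib.util import MLUtils

    # Load and parse the data file.
    data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
    # Split the data into training and test sets (30% held out for testing)
    (trainingData, testData) = data.randomSplit([0.7, 0.3])

    # Train a GradientBoostedTrees model.
    #  Notes: (a) Empty categoricalFeaturesInfo indicates all features are continuous.
    #         (b) Use more iterations in practice.
    model = GradientBoostedTrees.trainRegressor(trainingData,
                                                categoricalFeaturesInfo={}, numIterations=3)

    # Evaluate model on test instances and compute test error
    predictions = model.predict(testData.map(lambda x: x.features))
    labelsAndPredictions = testData.map(lambda lp: lp.label).zip(predictions)
    testMSE = labelsAndPredictions.map(lambda (v, p): (v - p) * (v - p)).sum() /\
        float(testData.count())
    print('Test Mean Squared Error = ' + str(testMSE))
    print('Learned regression GBT model:')
    print(model.toDebugString())

    # Save and load model
    model.save(sc, "target/tmp/myGradientBoostingRegressionModel")
    sameModel = GradientBoostedTreesModel.load(sc, "target/tmp/myGradientBoostingRegressionModel")
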




[13/25] spark-website git commit: Update 2.1.0 docs to include https://github.com/apache/spark/pull/16294

Posted by yh...@apache.org.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/mllib-evaluation-metrics.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/mllib-evaluation-metrics.html b/site/docs/2.1.0/mllib-evaluation-metrics.html
index 4bc636d..0d5bb3b 100644
--- a/site/docs/2.1.0/mllib-evaluation-metrics.html
+++ b/site/docs/2.1.0/mllib-evaluation-metrics.html
@@ -307,20 +307,20 @@
                     
 
                     <ul id="markdown-toc">
-  <li><a href="#classification-model-evaluation" id="markdown-toc-classification-model-evaluation">Classification model evaluation</a>    <ul>
-      <li><a href="#binary-classification" id="markdown-toc-binary-classification">Binary classification</a>        <ul>
-          <li><a href="#threshold-tuning" id="markdown-toc-threshold-tuning">Threshold tuning</a></li>
+  <li><a href="#classification-model-evaluation">Classification model evaluation</a>    <ul>
+      <li><a href="#binary-classification">Binary classification</a>        <ul>
+          <li><a href="#threshold-tuning">Threshold tuning</a></li>
         </ul>
       </li>
-      <li><a href="#multiclass-classification" id="markdown-toc-multiclass-classification">Multiclass classification</a>        <ul>
-          <li><a href="#label-based-metrics" id="markdown-toc-label-based-metrics">Label based metrics</a></li>
+      <li><a href="#multiclass-classification">Multiclass classification</a>        <ul>
+          <li><a href="#label-based-metrics">Label based metrics</a></li>
         </ul>
       </li>
-      <li><a href="#multilabel-classification" id="markdown-toc-multilabel-classification">Multilabel classification</a></li>
-      <li><a href="#ranking-systems" id="markdown-toc-ranking-systems">Ranking systems</a></li>
+      <li><a href="#multilabel-classification">Multilabel classification</a></li>
+      <li><a href="#ranking-systems">Ranking systems</a></li>
     </ul>
   </li>
-  <li><a href="#regression-model-evaluation" id="markdown-toc-regression-model-evaluation">Regression model evaluation</a></li>
+  <li><a href="#regression-model-evaluation">Regression model evaluation</a></li>
 </ul>
 
 <p><code>spark.mllib</code> comes with a number of machine learning algorithms that can be used to learn from and make predictions
@@ -421,7 +421,7 @@ data, and evaluate the performance of the algorithm by several binary evaluation
 <div data-lang="scala">
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS"><code>LogisticRegressionWithLBFGS</code> Scala docs</a> and <a href="api/scala/index.html#org.apache.spark.mllib.evaluation.BinaryClassificationMetrics"><code>BinaryClassificationMetrics</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.evaluation.BinaryClassificationMetrics</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.regression.LabeledPoint</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.util.MLUtils</span>
@@ -453,13 +453,13 @@ data, and evaluate the performance of the algorithm by several binary evaluation
 <span class="c1">// Precision by threshold</span>
 <span class="k">val</span> <span class="n">precision</span> <span class="k">=</span> <span class="n">metrics</span><span class="o">.</span><span class="n">precisionByThreshold</span>
 <span class="n">precision</span><span class="o">.</span><span class="n">foreach</span> <span class="o">{</span> <span class="k">case</span> <span class="o">(</span><span class="n">t</span><span class="o">,</span> <span class="n">p</span><span class="o">)</span> <span class="k">=&gt;</span>
-  <span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Threshold: $t, Precision: $p&quot;</span><span class="o">)</span>
+  <span class="n">println</span><span class="o">(</span><span class="s">s&quot;Threshold: </span><span class="si">$t</span><span class="s">, Precision: </span><span class="si">$p</span><span class="s">&quot;</span><span class="o">)</span>
 <span class="o">}</span>
 
 <span class="c1">// Recall by threshold</span>
 <span class="k">val</span> <span class="n">recall</span> <span class="k">=</span> <span class="n">metrics</span><span class="o">.</span><span class="n">recallByThreshold</span>
 <span class="n">recall</span><span class="o">.</span><span class="n">foreach</span> <span class="o">{</span> <span class="k">case</span> <span class="o">(</span><span class="n">t</span><span class="o">,</span> <span class="n">r</span><span class="o">)</span> <span class="k">=&gt;</span>
-  <span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Threshold: $t, Recall: $r&quot;</span><span class="o">)</span>
+  <span class="n">println</span><span class="o">(</span><span class="s">s&quot;Threshold: </span><span class="si">$t</span><span class="s">, Recall: </span><span class="si">$r</span><span class="s">&quot;</span><span class="o">)</span>
 <span class="o">}</span>
 
 <span class="c1">// Precision-Recall Curve</span>
@@ -468,13 +468,13 @@ data, and evaluate the performance of the algorithm by several binary evaluation
 <span class="c1">// F-measure</span>
 <span class="k">val</span> <span class="n">f1Score</span> <span class="k">=</span> <span class="n">metrics</span><span class="o">.</span><span class="n">fMeasureByThreshold</span>
 <span class="n">f1Score</span><span class="o">.</span><span class="n">foreach</span> <span class="o">{</span> <span class="k">case</span> <span class="o">(</span><span class="n">t</span><span class="o">,</span> <span class="n">f</span><span class="o">)</span> <span class="k">=&gt;</span>
-  <span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Threshold: $t, F-score: $f, Beta = 1&quot;</span><span class="o">)</span>
+  <span class="n">println</span><span class="o">(</span><span class="s">s&quot;Threshold: </span><span class="si">$t</span><span class="s">, F-score: </span><span class="si">$f</span><span class="s">, Beta = 1&quot;</span><span class="o">)</span>
 <span class="o">}</span>
 
 <span class="k">val</span> <span class="n">beta</span> <span class="k">=</span> <span class="mf">0.5</span>
 <span class="k">val</span> <span class="n">fScore</span> <span class="k">=</span> <span class="n">metrics</span><span class="o">.</span><span class="n">fMeasureByThreshold</span><span class="o">(</span><span class="n">beta</span><span class="o">)</span>
 <span class="n">f1Score</span><span class="o">.</span><span class="n">foreach</span> <span class="o">{</span> <span class="k">case</span> <span class="o">(</span><span class="n">t</span><span class="o">,</span> <span class="n">f</span><span class="o">)</span> <span class="k">=&gt;</span>
-  <span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Threshold: $t, F-score: $f, Beta = 0.5&quot;</span><span class="o">)</span>
+  <span class="n">println</span><span class="o">(</span><span class="s">s&quot;Threshold: </span><span class="si">$t</span><span class="s">, F-score: </span><span class="si">$f</span><span class="s">, Beta = 0.5&quot;</span><span class="o">)</span>
 <span class="o">}</span>
 
 <span class="c1">// AUPRC</span>
@@ -498,7 +498,7 @@ data, and evaluate the performance of the algorithm by several binary evaluation
 <div data-lang="java">
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/classification/LogisticRegressionModel.html"><code>LogisticRegressionModel</code> Java docs</a> and <a href="api/java/org/apache/spark/mllib/classification/LogisticRegressionWithLBFGS.html"><code>LogisticRegressionWithLBFGS</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.*</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.function.Function</span><span class="o">;</span>
@@ -518,7 +518,7 @@ data, and evaluate the performance of the algorithm by several binary evaluation
 <span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">LabeledPoint</span><span class="o">&gt;</span> <span class="n">test</span> <span class="o">=</span> <span class="n">splits</span><span class="o">[</span><span class="mi">1</span><span class="o">];</span>
 
 <span class="c1">// Run training algorithm to build the model.</span>
-<span class="kd">final</span> <span class="n">LogisticRegressionModel</span> <span class="n">model</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">LogisticRegressionWithLBFGS</span><span class="o">()</span>
+<span class="kd">final</span> <span class="n">LogisticRegressionModel</span> <span class="n">model</span> <span class="o">=</span> <span class="k">new</span> <span class="n">LogisticRegressionWithLBFGS</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setNumClasses</span><span class="o">(</span><span class="mi">2</span><span class="o">)</span>
   <span class="o">.</span><span class="na">run</span><span class="o">(</span><span class="n">training</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span>
 
@@ -538,7 +538,7 @@ data, and evaluate the performance of the algorithm by several binary evaluation
 
 <span class="c1">// Get evaluation metrics.</span>
 <span class="n">BinaryClassificationMetrics</span> <span class="n">metrics</span> <span class="o">=</span>
-  <span class="k">new</span> <span class="nf">BinaryClassificationMetrics</span><span class="o">(</span><span class="n">predictionAndLabels</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span>
+  <span class="k">new</span> <span class="n">BinaryClassificationMetrics</span><span class="o">(</span><span class="n">predictionAndLabels</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span>
 
 <span class="c1">// Precision by threshold</span>
 <span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">Tuple2</span><span class="o">&lt;</span><span class="n">Object</span><span class="o">,</span> <span class="n">Object</span><span class="o">&gt;&gt;</span> <span class="n">precision</span> <span class="o">=</span> <span class="n">metrics</span><span class="o">.</span><span class="na">precisionByThreshold</span><span class="o">().</span><span class="na">toJavaRDD</span><span class="o">();</span>
@@ -564,7 +564,7 @@ data, and evaluate the performance of the algorithm by several binary evaluation
   <span class="k">new</span> <span class="n">Function</span><span class="o">&lt;</span><span class="n">Tuple2</span><span class="o">&lt;</span><span class="n">Object</span><span class="o">,</span> <span class="n">Object</span><span class="o">&gt;,</span> <span class="n">Double</span><span class="o">&gt;()</span> <span class="o">{</span>
     <span class="nd">@Override</span>
     <span class="kd">public</span> <span class="n">Double</span> <span class="nf">call</span><span class="o">(</span><span class="n">Tuple2</span><span class="o">&lt;</span><span class="n">Object</span><span class="o">,</span> <span class="n">Object</span><span class="o">&gt;</span> <span class="n">t</span><span class="o">)</span> <span class="o">{</span>
-      <span class="k">return</span> <span class="k">new</span> <span class="nf">Double</span><span class="o">(</span><span class="n">t</span><span class="o">.</span><span class="na">_1</span><span class="o">().</span><span class="na">toString</span><span class="o">());</span>
+      <span class="k">return</span> <span class="k">new</span> <span class="n">Double</span><span class="o">(</span><span class="n">t</span><span class="o">.</span><span class="na">_1</span><span class="o">().</span><span class="na">toString</span><span class="o">());</span>
     <span class="o">}</span>
   <span class="o">}</span>
 <span class="o">);</span>
@@ -590,34 +590,34 @@ data, and evaluate the performance of the algorithm by several binary evaluation
 <div data-lang="python">
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.evaluation.BinaryClassificationMetrics"><code>BinaryClassificationMetrics</code> Python docs</a> and <a href="api/python/pyspark.mllib.html#pyspark.mllib.classification.LogisticRegressionWithLBFGS"><code>LogisticRegressionWithLBFGS</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.mllib.classification</span> <span class="kn">import</span> <span class="n">LogisticRegressionWithLBFGS</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.classification</span> <span class="kn">import</span> <span class="n">LogisticRegressionWithLBFGS</span>
 <span class="kn">from</span> <span class="nn">pyspark.mllib.evaluation</span> <span class="kn">import</span> <span class="n">BinaryClassificationMetrics</span>
 <span class="kn">from</span> <span class="nn">pyspark.mllib.regression</span> <span class="kn">import</span> <span class="n">LabeledPoint</span>
 
-<span class="c"># Several of the methods available in scala are currently missing from pyspark</span>
-<span class="c"># Load training data in LIBSVM format</span>
+<span class="c1"># Several of the methods available in scala are currently missing from pyspark</span>
+<span class="c1"># Load training data in LIBSVM format</span>
 <span class="n">data</span> <span class="o">=</span> <span class="n">spark</span>\
-    <span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s">&quot;libsvm&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s">&quot;data/mllib/sample_binary_classification_data.txt&quot;</span><span class="p">)</span>\
+    <span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s2">&quot;libsvm&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="s2">&quot;data/mllib/sample_binary_classification_data.txt&quot;</span><span class="p">)</span>\
     <span class="o">.</span><span class="n">rdd</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">row</span><span class="p">:</span> <span class="n">LabeledPoint</span><span class="p">(</span><span class="n">row</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">row</span><span class="p">[</span><span class="mi">1</span><span class="p">]))</span>
 
-<span class="c"># Split data into training (60%) and test (40%)</span>
+<span class="c1"># Split data into training (60%) and test (40%)</span>
 <span class="n">training</span><span class="p">,</span> <span class="n">test</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">randomSplit</span><span class="p">([</span><span class="mf">0.6</span><span class="p">,</span> <span class="mf">0.4</span><span class="p">],</span> <span class="n">seed</span><span class="o">=</span><span class="mi">11</span><span class="p">)</span>
 <span class="n">training</span><span class="o">.</span><span class="n">cache</span><span class="p">()</span>
 
-<span class="c"># Run training algorithm to build the model</span>
+<span class="c1"># Run training algorithm to build the model</span>
 <span class="n">model</span> <span class="o">=</span> <span class="n">LogisticRegressionWithLBFGS</span><span class="o">.</span><span class="n">train</span><span class="p">(</span><span class="n">training</span><span class="p">)</span>
 
-<span class="c"># Compute raw scores on the test set</span>
+<span class="c1"># Compute raw scores on the test set</span>
 <span class="n">predictionAndLabels</span> <span class="o">=</span> <span class="n">test</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">lp</span><span class="p">:</span> <span class="p">(</span><span class="nb">float</span><span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">lp</span><span class="o">.</span><span class="n">features</span><span class="p">)),</span> <span class="n">lp</span><span class="o">.</span><span class="n">label</span><span class="p">))</span>
 
-<span class="c"># Instantiate metrics object</span>
+<span class="c1"># Instantiate metrics object</span>
 <span class="n">metrics</span> <span class="o">=</span> <span class="n">BinaryClassificationMetrics</span><span class="p">(</span><span class="n">predictionAndLabels</span><span class="p">)</span>
 
-<span class="c"># Area under precision-recall curve</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Area under PR = </span><span class="si">%s</span><span class="s">&quot;</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">areaUnderPR</span><span class="p">)</span>
+<span class="c1"># Area under precision-recall curve</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Area under PR = </span><span class="si">%s</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">areaUnderPR</span><span class="p">)</span>
 
-<span class="c"># Area under ROC curve</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Area under ROC = </span><span class="si">%s</span><span class="s">&quot;</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">areaUnderROC</span><span class="p">)</span>
+<span class="c1"># Area under ROC curve</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Area under ROC = </span><span class="si">%s</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">areaUnderROC</span><span class="p">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/mllib/binary_classification_metrics_example.py" in the Spark repo.</small></div>
   </div>
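
De-escaped, the Python block above reads as follows. It is a transcription of the rendered example and assumes an existing SparkSession spark:

    from pyspark.mllib.classification import LogisticRegressionWithLBFGS
    from pyspark.mllib.evaluation import BinaryClassificationMetrics
    from pyspark.mllib.regression import LabeledPoint

    # Several of the methods available in scala are currently missing from pyspark
    # Load training data in LIBSVM format
    data = spark\
        .read.format("libsvm").load("data/mllib/sample_binary_classification_data.txt")\
        .rdd.map(lambda row: LabeledPoint(row[0], row[1]))

    # Split data into training (60%) and test (40%)
    training, test = data.randomSplit([0.6, 0.4], seed=11)
    training.cache()

    # Run training algorithm to build the model
    model = LogisticRegressionWithLBFGS.train(training)

    # Compute raw scores on the test set
    predictionAndLabels = test.map(lambda lp: (float(model.predict(lp.features)), lp.label))

    # Instantiate metrics object
    metrics = BinaryClassificationMetrics(predictionAndLabels)

    # Area under precision-recall curve
    print("Area under PR = %s" % metrics.areaUnderPR)

    # Area under ROC curve
    print("Area under ROC = %s" % metrics.areaUnderROC)
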
@@ -649,15 +649,15 @@ correctly normalized by the number of times that label appears in the output.</p
 
 <p>Define the class, or label, set as</p>
 
-<script type="math/tex; mode=display">L = \{\ell_0, \ell_1, \ldots, \ell_{M-1} \}</script>
+<script type="math/tex; mode=display">L = \{\ell_0, \ell_1, \ldots, \ell_{M-1} \} </script>
 
 <p>The true output vector $\mathbf{y}$ consists of $N$ elements</p>
 
-<script type="math/tex; mode=display">\mathbf{y}_0, \mathbf{y}_1, \ldots, \mathbf{y}_{N-1} \in L</script>
+<script type="math/tex; mode=display">\mathbf{y}_0, \mathbf{y}_1, \ldots, \mathbf{y}_{N-1} \in L </script>
 
 <p>A multiclass prediction algorithm generates a prediction vector $\hat{\mathbf{y}}$ of $N$ elements</p>
 
-<script type="math/tex; mode=display">\hat{\mathbf{y}}_0, \hat{\mathbf{y}}_1, \ldots, \hat{\mathbf{y}}_{N-1} \in L</script>
+<script type="math/tex; mode=display">\hat{\mathbf{y}}_0, \hat{\mathbf{y}}_1, \ldots, \hat{\mathbf{y}}_{N-1} \in L </script>
 
 <p>For this section, a modified delta function $\hat{\delta}(x)$ will prove useful</p>
 
@@ -731,7 +731,7 @@ the data, and evaluate the performance of the algorithm by several multiclass cl
 <div data-lang="scala">
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.evaluation.MulticlassMetrics"><code>MulticlassMetrics</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.evaluation.MulticlassMetrics</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.regression.LabeledPoint</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.util.MLUtils</span>
@@ -764,34 +764,34 @@ the data, and evaluate the performance of the algorithm by several multiclass cl
 <span class="c1">// Overall Statistics</span>
 <span class="k">val</span> <span class="n">accuracy</span> <span class="k">=</span> <span class="n">metrics</span><span class="o">.</span><span class="n">accuracy</span>
 <span class="n">println</span><span class="o">(</span><span class="s">&quot;Summary Statistics&quot;</span><span class="o">)</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Accuracy = $accuracy&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;Accuracy = </span><span class="si">$accuracy</span><span class="s">&quot;</span><span class="o">)</span>
 
 <span class="c1">// Precision by label</span>
 <span class="k">val</span> <span class="n">labels</span> <span class="k">=</span> <span class="n">metrics</span><span class="o">.</span><span class="n">labels</span>
 <span class="n">labels</span><span class="o">.</span><span class="n">foreach</span> <span class="o">{</span> <span class="n">l</span> <span class="k">=&gt;</span>
-  <span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Precision($l) = &quot;</span> <span class="o">+</span> <span class="n">metrics</span><span class="o">.</span><span class="n">precision</span><span class="o">(</span><span class="n">l</span><span class="o">))</span>
+  <span class="n">println</span><span class="o">(</span><span class="s">s&quot;Precision(</span><span class="si">$l</span><span class="s">) = &quot;</span> <span class="o">+</span> <span class="n">metrics</span><span class="o">.</span><span class="n">precision</span><span class="o">(</span><span class="n">l</span><span class="o">))</span>
 <span class="o">}</span>
 
 <span class="c1">// Recall by label</span>
 <span class="n">labels</span><span class="o">.</span><span class="n">foreach</span> <span class="o">{</span> <span class="n">l</span> <span class="k">=&gt;</span>
-  <span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Recall($l) = &quot;</span> <span class="o">+</span> <span class="n">metrics</span><span class="o">.</span><span class="n">recall</span><span class="o">(</span><span class="n">l</span><span class="o">))</span>
+  <span class="n">println</span><span class="o">(</span><span class="s">s&quot;Recall(</span><span class="si">$l</span><span class="s">) = &quot;</span> <span class="o">+</span> <span class="n">metrics</span><span class="o">.</span><span class="n">recall</span><span class="o">(</span><span class="n">l</span><span class="o">))</span>
 <span class="o">}</span>
 
 <span class="c1">// False positive rate by label</span>
 <span class="n">labels</span><span class="o">.</span><span class="n">foreach</span> <span class="o">{</span> <span class="n">l</span> <span class="k">=&gt;</span>
-  <span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;FPR($l) = &quot;</span> <span class="o">+</span> <span class="n">metrics</span><span class="o">.</span><span class="n">falsePositiveRate</span><span class="o">(</span><span class="n">l</span><span class="o">))</span>
+  <span class="n">println</span><span class="o">(</span><span class="s">s&quot;FPR(</span><span class="si">$l</span><span class="s">) = &quot;</span> <span class="o">+</span> <span class="n">metrics</span><span class="o">.</span><span class="n">falsePositiveRate</span><span class="o">(</span><span class="n">l</span><span class="o">))</span>
 <span class="o">}</span>
 
 <span class="c1">// F-measure by label</span>
 <span class="n">labels</span><span class="o">.</span><span class="n">foreach</span> <span class="o">{</span> <span class="n">l</span> <span class="k">=&gt;</span>
-  <span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;F1-Score($l) = &quot;</span> <span class="o">+</span> <span class="n">metrics</span><span class="o">.</span><span class="n">fMeasure</span><span class="o">(</span><span class="n">l</span><span class="o">))</span>
+  <span class="n">println</span><span class="o">(</span><span class="s">s&quot;F1-Score(</span><span class="si">$l</span><span class="s">) = &quot;</span> <span class="o">+</span> <span class="n">metrics</span><span class="o">.</span><span class="n">fMeasure</span><span class="o">(</span><span class="n">l</span><span class="o">))</span>
 <span class="o">}</span>
 
 <span class="c1">// Weighted stats</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Weighted precision: ${metrics.weightedPrecision}&quot;</span><span class="o">)</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Weighted recall: ${metrics.weightedRecall}&quot;</span><span class="o">)</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Weighted F1 score: ${metrics.weightedFMeasure}&quot;</span><span class="o">)</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Weighted false positive rate: ${metrics.weightedFalsePositiveRate}&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;Weighted precision: </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">weightedPrecision</span><span class="si">}</span><span class="s">&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;Weighted recall: </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">weightedRecall</span><span class="si">}</span><span class="s">&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;Weighted F1 score: </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">weightedFMeasure</span><span class="si">}</span><span class="s">&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;Weighted false positive rate: </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">weightedFalsePositiveRate</span><span class="si">}</span><span class="s">&quot;</span><span class="o">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/scala/org/apache/spark/examples/mllib/MulticlassMetricsExample.scala" in the Spark repo.</small></div>
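
For a hand-check of the weighted statistics printed in the Scala example above, here is a minimal plain-Scala sketch (no Spark required; the (prediction, label) pairs are invented for illustration) of how weighted precision is formed: each label's precision is weighted by that label's share of the true labels.

    // Invented (prediction, label) pairs for illustration only
    val predictionAndLabels = Seq((0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 2.0), (2.0, 1.0))
    val n = predictionAndLabels.size.toDouble
    val labels = predictionAndLabels.map(_._2).distinct

    val weightedPrecision = labels.map { l =>
      val predicted = predictionAndLabels.count(_._1 == l).toDouble            // times l was predicted
      val truePos = predictionAndLabels.count(p => p._1 == l && p._2 == l).toDouble
      val weight = predictionAndLabels.count(_._2 == l) / n                    // label frequency
      if (predicted == 0) 0.0 else weight * (truePos / predicted)
    }.sum

    println(s"Weighted precision: $weightedPrecision")  // 0.8 for the toy pairs above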
 
@@ -800,7 +800,7 @@ the data, and evaluate the performance of the algorithm by several multiclass cl
 <div data-lang="java">
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/evaluation/MulticlassMetrics.html"><code>MulticlassMetrics</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.*</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.function.Function</span><span class="o">;</span>
@@ -820,7 +820,7 @@ the data, and evaluate the performance of the algorithm by several multiclass cl
 <span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">LabeledPoint</span><span class="o">&gt;</span> <span class="n">test</span> <span class="o">=</span> <span class="n">splits</span><span class="o">[</span><span class="mi">1</span><span class="o">];</span>
 
 <span class="c1">// Run training algorithm to build the model.</span>
-<span class="kd">final</span> <span class="n">LogisticRegressionModel</span> <span class="n">model</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">LogisticRegressionWithLBFGS</span><span class="o">()</span>
+<span class="kd">final</span> <span class="n">LogisticRegressionModel</span> <span class="n">model</span> <span class="o">=</span> <span class="k">new</span> <span class="n">LogisticRegressionWithLBFGS</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setNumClasses</span><span class="o">(</span><span class="mi">3</span><span class="o">)</span>
   <span class="o">.</span><span class="na">run</span><span class="o">(</span><span class="n">training</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span>
 
@@ -835,7 +835,7 @@ the data, and evaluate the performance of the algorithm by several multiclass cl
 <span class="o">);</span>
 
 <span class="c1">// Get evaluation metrics.</span>
-<span class="n">MulticlassMetrics</span> <span class="n">metrics</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">MulticlassMetrics</span><span class="o">(</span><span class="n">predictionAndLabels</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span>
+<span class="n">MulticlassMetrics</span> <span class="n">metrics</span> <span class="o">=</span> <span class="k">new</span> <span class="n">MulticlassMetrics</span><span class="o">(</span><span class="n">predictionAndLabels</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span>
 
 <span class="c1">// Confusion matrix</span>
 <span class="n">Matrix</span> <span class="n">confusion</span> <span class="o">=</span> <span class="n">metrics</span><span class="o">.</span><span class="na">confusionMatrix</span><span class="o">();</span>
@@ -872,48 +872,48 @@ the data, and evaluate the performance of the algorithm by several multiclass cl
 <div data-lang="python">
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.evaluation.MulticlassMetrics"><code>MulticlassMetrics</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.mllib.classification</span> <span class="kn">import</span> <span class="n">LogisticRegressionWithLBFGS</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.classification</span> <span class="kn">import</span> <span class="n">LogisticRegressionWithLBFGS</span>
 <span class="kn">from</span> <span class="nn">pyspark.mllib.util</span> <span class="kn">import</span> <span class="n">MLUtils</span>
 <span class="kn">from</span> <span class="nn">pyspark.mllib.evaluation</span> <span class="kn">import</span> <span class="n">MulticlassMetrics</span>
 
-<span class="c"># Load training data in LIBSVM format</span>
-<span class="n">data</span> <span class="o">=</span> <span class="n">MLUtils</span><span class="o">.</span><span class="n">loadLibSVMFile</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;data/mllib/sample_multiclass_classification_data.txt&quot;</span><span class="p">)</span>
+<span class="c1"># Load training data in LIBSVM format</span>
+<span class="n">data</span> <span class="o">=</span> <span class="n">MLUtils</span><span class="o">.</span><span class="n">loadLibSVMFile</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;data/mllib/sample_multiclass_classification_data.txt&quot;</span><span class="p">)</span>
 
-<span class="c"># Split data into training (60%) and test (40%)</span>
+<span class="c1"># Split data into training (60%) and test (40%)</span>
 <span class="n">training</span><span class="p">,</span> <span class="n">test</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">randomSplit</span><span class="p">([</span><span class="mf">0.6</span><span class="p">,</span> <span class="mf">0.4</span><span class="p">],</span> <span class="n">seed</span><span class="o">=</span><span class="mi">11</span><span class="p">)</span>
 <span class="n">training</span><span class="o">.</span><span class="n">cache</span><span class="p">()</span>
 
-<span class="c"># Run training algorithm to build the model</span>
+<span class="c1"># Run training algorithm to build the model</span>
 <span class="n">model</span> <span class="o">=</span> <span class="n">LogisticRegressionWithLBFGS</span><span class="o">.</span><span class="n">train</span><span class="p">(</span><span class="n">training</span><span class="p">,</span> <span class="n">numClasses</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
 
-<span class="c"># Compute raw scores on the test set</span>
+<span class="c1"># Compute raw scores on the test set</span>
 <span class="n">predictionAndLabels</span> <span class="o">=</span> <span class="n">test</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">lp</span><span class="p">:</span> <span class="p">(</span><span class="nb">float</span><span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">lp</span><span class="o">.</span><span class="n">features</span><span class="p">)),</span> <span class="n">lp</span><span class="o">.</span><span class="n">label</span><span class="p">))</span>
 
-<span class="c"># Instantiate metrics object</span>
+<span class="c1"># Instantiate metrics object</span>
 <span class="n">metrics</span> <span class="o">=</span> <span class="n">MulticlassMetrics</span><span class="p">(</span><span class="n">predictionAndLabels</span><span class="p">)</span>
 
-<span class="c"># Overall statistics</span>
+<span class="c1"># Overall statistics</span>
 <span class="n">precision</span> <span class="o">=</span> <span class="n">metrics</span><span class="o">.</span><span class="n">precision</span><span class="p">()</span>
 <span class="n">recall</span> <span class="o">=</span> <span class="n">metrics</span><span class="o">.</span><span class="n">recall</span><span class="p">()</span>
 <span class="n">f1Score</span> <span class="o">=</span> <span class="n">metrics</span><span class="o">.</span><span class="n">fMeasure</span><span class="p">()</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Summary Stats&quot;</span><span class="p">)</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Precision = </span><span class="si">%s</span><span class="s">&quot;</span> <span class="o">%</span> <span class="n">precision</span><span class="p">)</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Recall = </span><span class="si">%s</span><span class="s">&quot;</span> <span class="o">%</span> <span class="n">recall</span><span class="p">)</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;F1 Score = </span><span class="si">%s</span><span class="s">&quot;</span> <span class="o">%</span> <span class="n">f1Score</span><span class="p">)</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Summary Stats&quot;</span><span class="p">)</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Precision = </span><span class="si">%s</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="n">precision</span><span class="p">)</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Recall = </span><span class="si">%s</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="n">recall</span><span class="p">)</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;F1 Score = </span><span class="si">%s</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="n">f1Score</span><span class="p">)</span>
 
-<span class="c"># Statistics by class</span>
+<span class="c1"># Statistics by class</span>
 <span class="n">labels</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">lp</span><span class="p">:</span> <span class="n">lp</span><span class="o">.</span><span class="n">label</span><span class="p">)</span><span class="o">.</span><span class="n">distinct</span><span class="p">()</span><span class="o">.</span><span class="n">collect</span><span class="p">()</span>
 <span class="k">for</span> <span class="n">label</span> <span class="ow">in</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">labels</span><span class="p">):</span>
-    <span class="k">print</span><span class="p">(</span><span class="s">&quot;Class </span><span class="si">%s</span><span class="s"> precision = </span><span class="si">%s</span><span class="s">&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="n">label</span><span class="p">,</span> <span class="n">metrics</span><span class="o">.</span><span class="n">precision</span><span class="p">(</span><span class="n">label</span><span class="p">)))</span>
-    <span class="k">print</span><span class="p">(</span><span class="s">&quot;Class </span><span class="si">%s</span><span class="s"> recall = </span><span class="si">%s</span><span class="s">&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="n">label</span><span class="p">,</span> <span class="n">metrics</span><span class="o">.</span><span class="n">recall</span><span class="p">(</span><span class="n">label</span><span class="p">)))</span>
-    <span class="k">print</span><span class="p">(</span><span class="s">&quot;Class </span><span class="si">%s</span><span class="s"> F1 Measure = </span><span class="si">%s</span><span class="s">&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="n">label</span><span class="p">,</span> <span class="n">metrics</span><span class="o">.</span><span class="n">fMeasure</span><span class="p">(</span><span class="n">label</span><span class="p">,</span> <span class="n">beta</span><span class="o">=</span><span class="mf">1.0</span><span class="p">)))</span>
-
-<span class="c"># Weighted stats</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Weighted recall = </span><span class="si">%s</span><span class="s">&quot;</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">weightedRecall</span><span class="p">)</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Weighted precision = </span><span class="si">%s</span><span class="s">&quot;</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">weightedPrecision</span><span class="p">)</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Weighted F(1) Score = </span><span class="si">%s</span><span class="s">&quot;</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">weightedFMeasure</span><span class="p">())</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Weighted F(0.5) Score = </span><span class="si">%s</span><span class="s">&quot;</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">weightedFMeasure</span><span class="p">(</span><span class="n">beta</span><span class="o">=</span><span class="mf">0.5</span><span class="p">))</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Weighted false positive rate = </span><span class="si">%s</span><span class="s">&quot;</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">weightedFalsePositiveRate</span><span class="p">)</span>
+    <span class="k">print</span><span class="p">(</span><span class="s2">&quot;Class </span><span class="si">%s</span><span class="s2"> precision = </span><span class="si">%s</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="n">label</span><span class="p">,</span> <span class="n">metrics</span><span class="o">.</span><span class="n">precision</span><span class="p">(</span><span class="n">label</span><span class="p">)))</span>
+    <span class="k">print</span><span class="p">(</span><span class="s2">&quot;Class </span><span class="si">%s</span><span class="s2"> recall = </span><span class="si">%s</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="n">label</span><span class="p">,</span> <span class="n">metrics</span><span class="o">.</span><span class="n">recall</span><span class="p">(</span><span class="n">label</span><span class="p">)))</span>
+    <span class="k">print</span><span class="p">(</span><span class="s2">&quot;Class </span><span class="si">%s</span><span class="s2"> F1 Measure = </span><span class="si">%s</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="n">label</span><span class="p">,</span> <span class="n">metrics</span><span class="o">.</span><span class="n">fMeasure</span><span class="p">(</span><span class="n">label</span><span class="p">,</span> <span class="n">beta</span><span class="o">=</span><span class="mf">1.0</span><span class="p">)))</span>
+
+<span class="c1"># Weighted stats</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Weighted recall = </span><span class="si">%s</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">weightedRecall</span><span class="p">)</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Weighted precision = </span><span class="si">%s</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">weightedPrecision</span><span class="p">)</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Weighted F(1) Score = </span><span class="si">%s</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">weightedFMeasure</span><span class="p">())</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Weighted F(0.5) Score = </span><span class="si">%s</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">weightedFMeasure</span><span class="p">(</span><span class="n">beta</span><span class="o">=</span><span class="mf">0.5</span><span class="p">))</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Weighted false positive rate = </span><span class="si">%s</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">weightedFalsePositiveRate</span><span class="p">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/mllib/multi_class_metrics_example.py" in the Spark repo.</small></div>
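
The confusion matrix reported by MulticlassMetrics can be reproduced with a simple tally. A minimal sketch with invented (prediction, label) pairs, using the same convention as MulticlassMetrics (actual classes in rows, predicted classes in columns):

    val pairs = Seq((0, 0), (0, 1), (1, 1), (2, 2), (2, 1))  // invented (prediction, label) pairs
    val numClasses = 3
    val confusion = Array.ofDim[Int](numClasses, numClasses)
    pairs.foreach { case (pred, label) => confusion(label)(pred) += 1 }  // row = actual, column = predicted
    confusion.foreach(row => println(row.mkString(" ")))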
 
@@ -938,7 +938,7 @@ set and it exists in the true label set, for a specific data point.</p>
 
 <script type="math/tex; mode=display">D = \left\{d_0, d_1, ..., d_{N-1}\right\}</script>
 
-<p>Define $L_0, L_1, &#8230;, L_{N-1}$ to be a family of label sets and $P_0, P_1, &#8230;, P_{N-1}$
+<p>Define $L_0, L_1, &#8230;, L_{N-1}$ to be a family of label sets and $P_0, P_1, &#8230;, P_{N-1}$
 to be a family of prediction sets where $L_i$ and $P_i$ are the label set and prediction set, respectively, that
 correspond to document $d_i$.</p>
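
To make these definitions concrete, here is a minimal Scala sketch with two invented documents, computing document-averaged precision, recall, and Hamming loss directly from the set definitions above (the label and prediction sets are assumptions, not data from the guide):

    val labelSets      = Seq(Set(0.0, 1.0), Set(0.0, 2.0))       // L_0, L_1 (invented)
    val predictionSets = Seq(Set(0.0, 2.0), Set(0.0, 1.0, 2.0))  // P_0, P_1 (invented)
    val n = labelSets.size.toDouble

    // Precision: mean over documents of |P_i ∩ L_i| / |P_i|
    val precision = labelSets.zip(predictionSets)
      .map { case (l, p) => (l & p).size.toDouble / p.size }.sum / n
    // Recall: mean over documents of |P_i ∩ L_i| / |L_i|
    val recall = labelSets.zip(predictionSets)
      .map { case (l, p) => (l & p).size.toDouble / l.size }.sum / n
    // Hamming loss: symmetric differences, normalized by documents × distinct labels
    val numLabels = (labelSets ++ predictionSets).reduce(_ | _).size.toDouble
    val hammingLoss = labelSets.zip(predictionSets)
      .map { case (l, p) => ((l | p) -- (l & p)).size.toDouble }.sum / (n * numLabels)

    println(s"precision = $precision, recall = $recall, Hamming loss = $hammingLoss")
    // precision ≈ 0.5833, recall = 0.75, Hamming loss = 0.5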
 
@@ -1058,7 +1058,7 @@ use the fake prediction and label data for multilabel classification that is sho
 <div data-lang="scala">
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.evaluation.MultilabelMetrics"><code>MultilabelMetrics</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.evaluation.MultilabelMetrics</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.evaluation.MultilabelMetrics</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.rdd.RDD</span>
 
 <span class="k">val</span> <span class="n">scoreAndLabels</span><span class="k">:</span> <span class="kt">RDD</span><span class="o">[(</span><span class="kt">Array</span><span class="o">[</span><span class="kt">Double</span><span class="o">]</span>, <span class="kt">Array</span><span class="o">[</span><span class="kt">Double</span><span class="o">])]</span> <span class="k">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="o">(</span>
@@ -1074,27 +1074,27 @@ use the fake prediction and label data for multilabel classification that is sho
 <span class="k">val</span> <span class="n">metrics</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">MultilabelMetrics</span><span class="o">(</span><span class="n">scoreAndLabels</span><span class="o">)</span>
 
 <span class="c1">// Summary stats</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Recall = ${metrics.recall}&quot;</span><span class="o">)</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Precision = ${metrics.precision}&quot;</span><span class="o">)</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;F1 measure = ${metrics.f1Measure}&quot;</span><span class="o">)</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Accuracy = ${metrics.accuracy}&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;Recall = </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">recall</span><span class="si">}</span><span class="s">&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;Precision = </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">precision</span><span class="si">}</span><span class="s">&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;F1 measure = </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">f1Measure</span><span class="si">}</span><span class="s">&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;Accuracy = </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">accuracy</span><span class="si">}</span><span class="s">&quot;</span><span class="o">)</span>
 
 <span class="c1">// Individual label stats</span>
 <span class="n">metrics</span><span class="o">.</span><span class="n">labels</span><span class="o">.</span><span class="n">foreach</span><span class="o">(</span><span class="n">label</span> <span class="k">=&gt;</span>
-  <span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Class $label precision = ${metrics.precision(label)}&quot;</span><span class="o">))</span>
-<span class="n">metrics</span><span class="o">.</span><span class="n">labels</span><span class="o">.</span><span class="n">foreach</span><span class="o">(</span><span class="n">label</span> <span class="k">=&gt;</span> <span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Class $label recall = ${metrics.recall(label)}&quot;</span><span class="o">))</span>
-<span class="n">metrics</span><span class="o">.</span><span class="n">labels</span><span class="o">.</span><span class="n">foreach</span><span class="o">(</span><span class="n">label</span> <span class="k">=&gt;</span> <span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Class $label F1-score = ${metrics.f1Measure(label)}&quot;</span><span class="o">))</span>
+  <span class="n">println</span><span class="o">(</span><span class="s">s&quot;Class </span><span class="si">$label</span><span class="s"> precision = </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">precision</span><span class="o">(</span><span class="n">label</span><span class="o">)</span><span class="si">}</span><span class="s">&quot;</span><span class="o">))</span>
+<span class="n">metrics</span><span class="o">.</span><span class="n">labels</span><span class="o">.</span><span class="n">foreach</span><span class="o">(</span><span class="n">label</span> <span class="k">=&gt;</span> <span class="n">println</span><span class="o">(</span><span class="s">s&quot;Class </span><span class="si">$label</span><span class="s"> recall = </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">recall</span><span class="o">(</span><span class="n">label</span><span class="o">)</span><span class="si">}</span><span class="s">&quot;</span><span class="o">))</span>
+<span class="n">metrics</span><span class="o">.</span><span class="n">labels</span><span class="o">.</span><span class="n">foreach</span><span class="o">(</span><span class="n">label</span> <span class="k">=&gt;</span> <span class="n">println</span><span class="o">(</span><span class="s">s&quot;Class </span><span class="si">$label</span><span class="s"> F1-score = </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">f1Measure</span><span class="o">(</span><span class="n">label</span><span class="o">)</span><span class="si">}</span><span class="s">&quot;</span><span class="o">))</span>
 
 <span class="c1">// Micro stats</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Micro recall = ${metrics.microRecall}&quot;</span><span class="o">)</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Micro precision = ${metrics.microPrecision}&quot;</span><span class="o">)</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Micro F1 measure = ${metrics.microF1Measure}&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;Micro recall = </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">microRecall</span><span class="si">}</span><span class="s">&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;Micro precision = </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">microPrecision</span><span class="si">}</span><span class="s">&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;Micro F1 measure = </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">microF1Measure</span><span class="si">}</span><span class="s">&quot;</span><span class="o">)</span>
 
 <span class="c1">// Hamming loss</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Hamming loss = ${metrics.hammingLoss}&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;Hamming loss = </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">hammingLoss</span><span class="si">}</span><span class="s">&quot;</span><span class="o">)</span>
 
 <span class="c1">// Subset accuracy</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Subset accuracy = ${metrics.subsetAccuracy}&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;Subset accuracy = </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">subsetAccuracy</span><span class="si">}</span><span class="s">&quot;</span><span class="o">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/scala/org/apache/spark/examples/mllib/MultiLabelMetricsExample.scala" in the Spark repo.</small></div>
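
The micro-averaged statistics printed above differ from the document-averaged ones by pooling counts across all documents before dividing. A minimal sketch, reusing the same kind of invented label and prediction sets:

    val labelSets      = Seq(Set(0.0, 1.0), Set(0.0, 2.0))       // invented true label sets
    val predictionSets = Seq(Set(0.0, 2.0), Set(0.0, 1.0, 2.0))  // invented prediction sets

    val truePositives  = labelSets.zip(predictionSets).map { case (l, p) => (l & p).size }.sum.toDouble
    val predictedTotal = predictionSets.map(_.size).sum.toDouble  // TP + FP pooled over documents
    val actualTotal    = labelSets.map(_.size).sum.toDouble       // TP + FN pooled over documents

    println(s"Micro precision = ${truePositives / predictedTotal}")  // 3 / 5 = 0.6
    println(s"Micro recall    = ${truePositives / actualTotal}")     // 3 / 4 = 0.75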
 
@@ -1103,7 +1103,7 @@ use the fake prediction and label data for multilabel classification that is sho
 <div data-lang="java">
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/evaluation/MultilabelMetrics.html"><code>MultilabelMetrics</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">java.util.List</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span>
@@ -1124,7 +1124,7 @@ use the fake prediction and label data for multilabel classification that is sho
 <span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">Tuple2</span><span class="o">&lt;</span><span class="kt">double</span><span class="o">[],</span> <span class="kt">double</span><span class="o">[]&gt;&gt;</span> <span class="n">scoreAndLabels</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="na">parallelize</span><span class="o">(</span><span class="n">data</span><span class="o">);</span>
 
 <span class="c1">// Instantiate metrics object</span>
-<span class="n">MultilabelMetrics</span> <span class="n">metrics</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">MultilabelMetrics</span><span class="o">(</span><span class="n">scoreAndLabels</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span>
+<span class="n">MultilabelMetrics</span> <span class="n">metrics</span> <span class="o">=</span> <span class="k">new</span> <span class="n">MultilabelMetrics</span><span class="o">(</span><span class="n">scoreAndLabels</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span>
 
 <span class="c1">// Summary stats</span>
 <span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">format</span><span class="o">(</span><span class="s">&quot;Recall = %f\n&quot;</span><span class="o">,</span> <span class="n">metrics</span><span class="o">.</span><span class="na">recall</span><span class="o">());</span>
@@ -1160,7 +1160,7 @@ use the fake prediction and label data for multilabel classification that is sho
 <div data-lang="python">
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.evaluation.MultilabelMetrics"><code>MultilabelMetrics</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.mllib.evaluation</span> <span class="kn">import</span> <span class="n">MultilabelMetrics</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.evaluation</span> <span class="kn">import</span> <span class="n">MultilabelMetrics</span>
 
 <span class="n">scoreAndLabels</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="p">([</span>
     <span class="p">([</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">],</span> <span class="p">[</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">2.0</span><span class="p">]),</span>
@@ -1171,32 +1171,32 @@ use the fake prediction and label data for multilabel classification that is sho
     <span class="p">([</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">,</span> <span class="mf">2.0</span><span class="p">],</span> <span class="p">[</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">]),</span>
     <span class="p">([</span><span class="mf">1.0</span><span class="p">],</span> <span class="p">[</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">2.0</span><span class="p">])])</span>
 
-<span class="c"># Instantiate metrics object</span>
+<span class="c1"># Instantiate metrics object</span>
 <span class="n">metrics</span> <span class="o">=</span> <span class="n">MultilabelMetrics</span><span class="p">(</span><span class="n">scoreAndLabels</span><span class="p">)</span>
 
-<span class="c"># Summary stats</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Recall = </span><span class="si">%s</span><span class="s">&quot;</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">recall</span><span class="p">())</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Precision = </span><span class="si">%s</span><span class="s">&quot;</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">precision</span><span class="p">())</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;F1 measure = </span><span class="si">%s</span><span class="s">&quot;</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">f1Measure</span><span class="p">())</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Accuracy = </span><span class="si">%s</span><span class="s">&quot;</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">accuracy</span><span class="p">)</span>
+<span class="c1"># Summary stats</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Recall = </span><span class="si">%s</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">recall</span><span class="p">())</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Precision = </span><span class="si">%s</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">precision</span><span class="p">())</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;F1 measure = </span><span class="si">%s</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">f1Measure</span><span class="p">())</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Accuracy = </span><span class="si">%s</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">accuracy</span><span class="p">)</span>
 
-<span class="c"># Individual label stats</span>
+<span class="c1"># Individual label stats</span>
 <span class="n">labels</span> <span class="o">=</span> <span class="n">scoreAndLabels</span><span class="o">.</span><span class="n">flatMap</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span><span class="o">.</span><span class="n">distinct</span><span class="p">()</span><span class="o">.</span><span class="n">collect</span><span class="p">()</span>
 <span class="k">for</span> <span class="n">label</span> <span class="ow">in</span> <span class="n">labels</span><span class="p">:</span>
-    <span class="k">print</span><span class="p">(</span><span class="s">&quot;Class </span><span class="si">%s</span><span class="s"> precision = </span><span class="si">%s</span><span class="s">&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="n">label</span><span class="p">,</span> <span class="n">metrics</span><span class="o">.</span><span class="n">precision</span><span class="p">(</span><span class="n">label</span><span class="p">)))</span>
-    <span class="k">print</span><span class="p">(</span><span class="s">&quot;Class </span><span class="si">%s</span><span class="s"> recall = </span><span class="si">%s</span><span class="s">&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="n">label</span><span class="p">,</span> <span class="n">metrics</span><span class="o">.</span><span class="n">recall</span><span class="p">(</span><span class="n">label</span><span class="p">)))</span>
-    <span class="k">print</span><span class="p">(</span><span class="s">&quot;Class </span><span class="si">%s</span><span class="s"> F1 Measure = </span><span class="si">%s</span><span class="s">&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="n">label</span><span class="p">,</span> <span class="n">metrics</span><span class="o">.</span><span class="n">f1Measure</span><span class="p">(</span><span class="n">label</span><span class="p">)))</span>
+    <span class="k">print</span><span class="p">(</span><span class="s2">&quot;Class </span><span class="si">%s</span><span class="s2"> precision = </span><span class="si">%s</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="n">label</span><span class="p">,</span> <span class="n">metrics</span><span class="o">.</span><span class="n">precision</span><span class="p">(</span><span class="n">label</span><span class="p">)))</span>
+    <span class="k">print</span><span class="p">(</span><span class="s2">&quot;Class </span><span class="si">%s</span><span class="s2"> recall = </span><span class="si">%s</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="n">label</span><span class="p">,</span> <span class="n">metrics</span><span class="o">.</span><span class="n">recall</span><span class="p">(</span><span class="n">label</span><span class="p">)))</span>
+    <span class="k">print</span><span class="p">(</span><span class="s2">&quot;Class </span><span class="si">%s</span><span class="s2"> F1 Measure = </span><span class="si">%s</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="n">label</span><span class="p">,</span> <span class="n">metrics</span><span class="o">.</span><span class="n">f1Measure</span><span class="p">(</span><span class="n">label</span><span class="p">)))</span>
 
-<span class="c"># Micro stats</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Micro precision = </span><span class="si">%s</span><span class="s">&quot;</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">microPrecision</span><span class="p">)</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Micro recall = </span><span class="si">%s</span><span class="s">&quot;</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">microRecall</span><span class="p">)</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Micro F1 measure = </span><span class="si">%s</span><span class="s">&quot;</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">microF1Measure</span><span class="p">)</span>
+<span class="c1"># Micro stats</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Micro precision = </span><span class="si">%s</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">microPrecision</span><span class="p">)</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Micro recall = </span><span class="si">%s</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">microRecall</span><span class="p">)</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Micro F1 measure = </span><span class="si">%s</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">microF1Measure</span><span class="p">)</span>
 
-<span class="c"># Hamming loss</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Hamming loss = </span><span class="si">%s</span><span class="s">&quot;</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">hammingLoss</span><span class="p">)</span>
+<span class="c1"># Hamming loss</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Hamming loss = </span><span class="si">%s</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">hammingLoss</span><span class="p">)</span>
 
-<span class="c"># Subset accuracy</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Subset accuracy = </span><span class="si">%s</span><span class="s">&quot;</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">subsetAccuracy</span><span class="p">)</span>
+<span class="c1"># Subset accuracy</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Subset accuracy = </span><span class="si">%s</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="n">metrics</span><span class="o">.</span><span class="n">subsetAccuracy</span><span class="p">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/mllib/multi_label_metrics_example.py" in the Spark repo.</small></div>
 
@@ -1317,7 +1317,7 @@ expanded world of non-positive weights are &#8220;the same as never having inter
 <div data-lang="scala">
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.evaluation.RegressionMetrics"><code>RegressionMetrics</code> Scala docs</a> and <a href="api/scala/index.html#org.apache.spark.mllib.evaluation.RankingMetrics"><code>RankingMetrics</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.evaluation.</span><span class="o">{</span><span class="nc">RankingMetrics</span><span class="o">,</span> <span class="nc">RegressionMetrics</span><span class="o">}</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.evaluation.</span><span class="o">{</span><span class="nc">RankingMetrics</span><span class="o">,</span> <span class="nc">RegressionMetrics</span><span class="o">}</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.recommendation.</span><span class="o">{</span><span class="nc">ALS</span><span class="o">,</span> <span class="nc">Rating</span><span class="o">}</span>
 
 <span class="c1">// Read in the ratings data</span>
@@ -1334,7 +1334,7 @@ expanded world of non-positive weights are &#8220;the same as never having inter
 <span class="k">val</span> <span class="n">numRatings</span> <span class="k">=</span> <span class="n">ratings</span><span class="o">.</span><span class="n">count</span><span class="o">()</span>
 <span class="k">val</span> <span class="n">numUsers</span> <span class="k">=</span> <span class="n">ratings</span><span class="o">.</span><span class="n">map</span><span class="o">(</span><span class="k">_</span><span class="o">.</span><span class="n">user</span><span class="o">).</span><span class="n">distinct</span><span class="o">().</span><span class="n">count</span><span class="o">()</span>
 <span class="k">val</span> <span class="n">numMovies</span> <span class="k">=</span> <span class="n">ratings</span><span class="o">.</span><span class="n">map</span><span class="o">(</span><span class="k">_</span><span class="o">.</span><span class="n">product</span><span class="o">).</span><span class="n">distinct</span><span class="o">().</span><span class="n">count</span><span class="o">()</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Got $numRatings ratings from $numUsers users on $numMovies movies.&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;Got </span><span class="si">$numRatings</span><span class="s"> ratings from </span><span class="si">$numUsers</span><span class="s"> users on </span><span class="si">$numMovies</span><span class="s"> movies.&quot;</span><span class="o">)</span>
 
 <span class="c1">// Build the model</span>
 <span class="k">val</span> <span class="n">numIterations</span> <span class="k">=</span> <span class="mi">10</span>
@@ -1366,15 +1366,15 @@ expanded world of non-positive weights are &#8220;the same as never having inter
 
 <span class="c1">// Precision at K</span>
 <span class="nc">Array</span><span class="o">(</span><span class="mi">1</span><span class="o">,</span> <span class="mi">3</span><span class="o">,</span> <span class="mi">5</span><span class="o">).</span><span class="n">foreach</span> <span class="o">{</span> <span class="n">k</span> <span class="k">=&gt;</span>
-  <span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Precision at $k = ${metrics.precisionAt(k)}&quot;</span><span class="o">)</span>
+  <span class="n">println</span><span class="o">(</span><span class="s">s&quot;Precision at </span><span class="si">$k</span><span class="s"> = </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">precisionAt</span><span class="o">(</span><span class="n">k</span><span class="o">)</span><span class="si">}</span><span class="s">&quot;</span><span class="o">)</span>
 <span class="o">}</span>
 
 <span class="c1">// Mean average precision</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Mean average precision = ${metrics.meanAveragePrecision}&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;Mean average precision = </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">meanAveragePrecision</span><span class="si">}</span><span class="s">&quot;</span><span class="o">)</span>
 
 <span class="c1">// Normalized discounted cumulative gain</span>
 <span class="nc">Array</span><span class="o">(</span><span class="mi">1</span><span class="o">,</span> <span class="mi">3</span><span class="o">,</span> <span class="mi">5</span><span class="o">).</span><span class="n">foreach</span> <span class="o">{</span> <span class="n">k</span> <span class="k">=&gt;</span>
-  <span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;NDCG at $k = ${metrics.ndcgAt(k)}&quot;</span><span class="o">)</span>
+  <span class="n">println</span><span class="o">(</span><span class="s">s&quot;NDCG at </span><span class="si">$k</span><span class="s"> = </span><span class="si">${</span><span class="n">metrics</span><span class="o">.</span><span class="n">ndcgAt</span><span class="o">(</span><span class="n">k</span><span class="o">)</span><span class="si">}</span><span class="s">&quot;</span><span class="o">)</span>
 <span class="o">}</span>
 
 <span class="c1">// Get predictions for each data point</span>
@@ -1388,10 +1388,10 @@ expanded world of non-positive weights are &#8220;the same as never having inter
 
 <span class="c1">// Get the RMSE using regression metrics</span>
 <span class="k">val</span> <span class="n">regressionMetrics</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">RegressionMetrics</span><span class="o">(</span><span class="n">predictionsAndLabels</span><span class="o">)</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;RMSE = ${regressionMetrics.rootMeanSquaredError}&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;RMSE = </span><span class="si">${</span><span class="n">regressionMetrics</span><span class="o">.</span><span class="n">rootMeanSquaredError</span><span class="si">}</span><span class="s">&quot;</span><span class="o">)</span>
 
 <span class="c1">// R-squared</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;R-squared = ${regressionMetrics.r2}&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;R-squared = </span><span class="si">${</span><span class="n">regressionMetrics</span><span class="o">.</span><span class="n">r2</span><span class="si">}</span><span class="s">&quot;</span><span class="o">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/scala/org/apache/spark/examples/mllib/RankingMetricsExample.scala" in the Spark repo.</small></div>
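
For intuition about the ranking numbers above, here is a minimal sketch of precision at k for a single invented ranking, under the common definition (relevant items among the top k, divided by k); RankingMetrics.precisionAt averages a per-user value of this form over all users:

    val predicted = Array(3, 1, 7, 5, 9)  // one invented ranked recommendation list
    val relevant  = Set(1, 5, 6)          // invented ground-truth relevant items

    def precisionAt(k: Int): Double =
      predicted.take(k).count(relevant.contains).toDouble / k

    Seq(1, 3, 5).foreach(k => println(s"Precision at $k = ${precisionAt(k)}"))
    // Precision at 1 = 0.0, at 3 ≈ 0.3333, at 5 = 0.4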
 
@@ -1400,7 +1400,7 @@ expanded world of non-positive weights are &#8220;the same as never having inter
 <div data-lang="java">
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/evaluation/RegressionMetrics.html"><code>RegressionMetrics</code> Java docs</a> and <a href="api/java/org/apache/spark/mllib/evaluation/RankingMetrics.html"><code>RankingMetrics</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.*</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.*</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span>
 
@@ -1419,7 +1419,7 @@ expanded world of non-positive weights are &#8220;the same as never having inter
     <span class="nd">@Override</span>
     <span class="kd">public</span> <span class="n">Rating</span> <span class="nf">call</span><span class="o">(</span><span class="n">String</span> <span class="n">line</span><span class="o">)</span> <span class="o">{</span>
       <span class="n">String</span><span class="o">[]</span> <span class="n">parts</span> <span class="o">=</span> <span class="n">line</span><span class="o">.</span><span class="na">split</span><span class="o">(</span><span class="s">&quot;::&quot;</span><span class="o">);</span>
-        <span class="k">return</span> <span class="k">new</span> <span class="nf">Rating</span><span class="o">(</span><span class="n">Integer</span><span class="o">.</span><span class="na">parseInt</span><span class="o">(</span><span class="n">parts</span><span class="o">[</span><span class="mi">0</span><span class="o">]),</span> <span class="n">Integer</span><span class="o">.</span><span class="na">parseInt</span><span class="o">(</span><span class="n">parts</span><span class="o">[</span><span class="mi">1</span><span class="o">]),</span> <span class="n">Double</span>
+        <span class="k">return</span> <span class="k">new</span> <span class="n">Rating</span><span class="o">(</span><span class="n">Integer</span><span class="o">.</span><span class="na">parseInt</span><span class="o">(</span><span class="n">parts</span><span class="o">[</span><span class="mi">0</span><span class="o">]),</span> <span class="n">Integer</span><span class="o">.</span><span class="na">parseInt</span><span class="o">(</span><span class="n">parts</span><span class="o">[</span><span class="mi">1</span><span class="o">]),</span> <span class="n">Double</span>
           <span class="o">.</span><span class="na">parseDouble</span><span class="o">(</span><span class="n">parts</span><span class="o">[</span><span class="mi">2</span><span class="o">])</span> <span class="o">-</span> <span class="mf">2.5</span><span class="o">);</span>
     <span class="o">}</span>
   <span class="o">}</span>
@@ -1438,7 +1438,7 @@ expanded world of non-positive weights are &#8220;the same as never having inter
       <span class="n">Rating</span><span class="o">[]</span> <span class="n">scaledRatings</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Rating</span><span class="o">[</span><span class="n">t</span><span class="o">.</span><span class="na">_2</span><span class="o">().</span><span class="na">length</span><span class="o">];</span>
       <span class="k">for</span> <span class="o">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="o">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">scaledRatings</span><span class="o">.</span><span class="na">length</span><span class="o">;</span> <span class="n">i</span><span class="o">++)</span> <span class="o">{</span>
         <span class="kt">double</span> <span class="n">newRating</span> <span class="o">=</span> <span class="n">Math</span><span class="o">.</span><span class="na">max</span><span class="o">(</span><span class="n">Math</span><span class="o">.</span><span class="na">min</span><span class="o">(</span><span class="n">t</span><span class="o">.</span><span class="na">_2</span><span class="o">()[</span><span class="n">i</span><span class="o">].</span><span class="na">rating</span><span class="o">(),</span> <span class="mf">1.0</span><span class="o">),</span> <span class="mf">0.0</span><span class="o">);</span>
-        <span class="n">scaledRatings</span><span class="o">[</span><span class="n">i</span><span class="o">]</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">Rating</span><span class="o">(</span><span class="n">t</span><span class="o">.</span><span class="na">_2</span><span class="o">()[</span><span class="n">i</span><span class="o">].</span><span class="na">user</span><span class="o">(),</span> <span class="n">t</span><span class="o">.</span><span class="na">_2</span><span class="o">()[</span><span class="n">i</span><span class="o">].</span><span class="na">product</span><span class="o">(),</span> <span class="n">newRating</span><span class="o">);</span>
+        <span class="n">scaledRatings</span><span class="o">[</span><span class="n">i</span><span class="o">]</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Rating</span><span class="o">(</span><span class="n">t</span><span class="o">.</span><span class="na">_2</span><span class="o">()[</span><span class="n">i</span><span class="o">].</span><span class="na">user</span><span class="o">(),</span> <span class="n">t</span><span class="o">.</span><span class="na">_2</span><span class="o">()[</span><span class="n">i</span><span class="o">].</span><span class="na">product</span><span class="o">(),</span> <span class="n">newRating</span><span class="o">);</span>
       <span class="o">}</span>
       <span class="k">return</span> <span class="k">new</span> <span class="n">Tuple2</span><span class="o">&lt;&gt;(</span><span class="n">t</span><span class="o">.</span><span class="na">_1</span><span class="o">(),</span> <span class="n">scaledRatings</span><span class="o">);</span>
     <span class="o">}</span>
@@ -1457,7 +1457,7 @@ expanded world of non-positive weights are &#8220;the same as never having inter
       <span class="o">}</span> <span class="k">else</span> <span class="o">{</span>
         <span class="n">binaryRating</span> <span class="o">=</span> <span class="mf">0.0</span><span class="o">;</span>
       <span class="o">}</span>
-      <span class="k">return</span> <span class="k">new</span> <span class="nf">Rating</span><span class="o">(</span><span class="n">r</span><span class="o">.</span><span class="na">user</span><span class="o">(),</span> <span class="n">r</span><span class="o">.</span><span class="na">product</span><span class="o">(),</span> <span class="n">binaryRating</span><span class="o">);</span>
+      <span class="k">return</span> <span class="k">new</span> <span class="n">Rating</span><span class="o">(</span><span class="n">r</span><span class="o">.</span><span class="na">user</span><span class="o">(),</span> <span class="n">r</span><span class="o">.</span><span class="na">product</span><span class="o">(),</span> <span class="n">binaryRating</span><span class="o">);</span>
     <span class="o">}</span>
   <span class="o">}</span>
 <span class="o">);</span>
@@ -1548,7 +1548,7 @@ expanded world of non-positive weights are &#8220;the same as never having inter
   <span class="o">)).</span><span class="na">join</span><span class="o">(</span><span class="n">predictions</span><span class="o">).</span><span class="na">values</span><span class="o">();</span>
 
 <span class="c1">// Create regression metrics object</span>
-<span class="n">RegressionMetrics</span> <span class="n">regressionMetrics</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">RegressionMetrics</span><span class="o">(</span><span class="n">ratesAndPreds</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span>
+<span class="n">RegressionMetrics</span> <span class="n">regressionMetrics</span> <span class="o">=</span> <span class="k">new</span> <span class="n">RegressionMetrics</span><span class="o">(</span><span class="n">ratesAndPreds</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span>
 
 <span class="c1">// Root mean squared error</span>
 <span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">format</span><span class="o">(</span><span class="s">&quot;RMSE = %f\n&quot;</span><span class="o">,</span> <span class="n">regressionMetrics</span><span class="o">.</span><span class="na">rootMeanSquaredError</span><span class="o">());</span>
@@ -1563,35 +1563,35 @@ expanded world of non-positive weights are &#8220;the same as never having inter
 <div data-lang="python">
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.evaluation.RegressionMetrics"><code>RegressionMetrics</code> Python docs</a> and <a href="api/python/pyspark.mllib.html#pyspark.mllib.evaluation.RankingMetrics"><code>RankingMetrics</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.mllib.recommendation</span> <span class="kn">import</span> <span class="n">ALS</span><span class="p">,</span> <span class="n">Rating</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.recommendation</span> <span class="kn">import</span> <span class="n">ALS</span><span class="p">,</span> <span class="n">Rating</span>
 <span class="kn">from</span> <span class="nn">pyspark.mllib.evaluation</span> <span class="kn">import</span> <span class="n">RegressionMetrics</span><span class="p">,</span> <span class="n">RankingMetrics</span>
 
-<span class="c"># Read in the ratings data</span>
-<span class="n">lines</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">&quot;data/mllib/sample_movielens_data.txt&quot;</span><span class="p">)</span>
+<span class="c1"># Read in the ratings data</span>
+<span class="n">lines</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s2">&quot;data/mllib/sample_movielens_data.txt&quot;</span><span class="p">)</span>
 
 <span class="k">def</span> <span class="nf">parseLine</span><span class="p">(</span><span class="n">line</span><span class="p">):</span>
-    <span class="n">fields</span> <span class="o">=</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&quot;::&quot;</span><span class="p">)</span>
+    <span class="n">fields</span> <span class="o">=</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">&quot;::&quot;</span><span class="p">)</span>
     <span class="k">return</span> <span class="n">Rating</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="n">fields</span><span class="p">[</span><span class="mi">0</span><span class="p">]),</span> <span class="nb">int</span><span class="p">(</span><span class="n">fields</span><span class="p">[</span><span class="mi">1</span><span class="p">]),</span> <span class="nb">float</span><span class="p">(</span><span class="n">fields</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span> <span class="o">-</span> <span class="mf">2.5</span><span class="p">)</span>
 <span class="n">ratings</span> <span class="o">=</span> <span class="n">lines</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">r</span><span class="p">:</span> <span class="n">parseLine</span><span class="p">(</span><span class="n">r</span><span class="p">))</span>
 
-<span class="c"># Train a model on to predict user-product ratings</span>
+<span class="c1"># Train a model on to predict user-product ratings</span>
 <span class="n">model</span> <span class="o">=</span> <span class="n">ALS</span><span class="o">.</span><span class="n">train</span><span class="p">(</span><span class="n">ratings</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mf">0.01</span><span class="p">)</span>
 
-<span class="c"># Get predicted ratings on all existing user-product pairs</span>
+<span class="c1"># Get predicted ratings on all existing user-product pairs</span>
 <span class="n">testData</span> <span class="o">=</span> <span class="n">ratings</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">p</span><span class="p">:</span> <span class="p">(</span><span class="n">p</span><span class="o">.</span><span class="n">user</span><span class="p">,</span> <span class="n">p</span><span class="o">.</span><span class="n">product</span><span class="p">))</span>
 <span class="n">predictions</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">predictAll</span><span class="p">(</span><span class="n">testData</span><span class="p">)</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">r</span><span class="p">:</span> <span class="p">((</span><span class="n">r</span><span class="o">.</span><span class="n">user</span><span class="p">,</span> <span class="n">r</span><span class="o">.</span><span class="n">product</span><span class="p">),</span> <span class="n">r</span><span class="o">.</span><span class="n">rating</span><span class="p">))</span>
 
 <span class="n">ratingsTuple</span>

<TRUNCATED>



[07/25] spark-website git commit: Update 2.1.0 docs to include https://github.com/apache/spark/pull/16294

Posted by yh...@apache.org.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/sparkr.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/sparkr.html b/site/docs/2.1.0/sparkr.html
index 0a1a347..e861a01 100644
--- a/site/docs/2.1.0/sparkr.html
+++ b/site/docs/2.1.0/sparkr.html
@@ -127,53 +127,53 @@
                     
 
                     <ul id="markdown-toc">
-  <li><a href="#overview" id="markdown-toc-overview">Overview</a></li>
-  <li><a href="#sparkdataframe" id="markdown-toc-sparkdataframe">SparkDataFrame</a>    <ul>
-      <li><a href="#starting-up-sparksession" id="markdown-toc-starting-up-sparksession">Starting Up: SparkSession</a></li>
-      <li><a href="#starting-up-from-rstudio" id="markdown-toc-starting-up-from-rstudio">Starting Up from RStudio</a></li>
-      <li><a href="#creating-sparkdataframes" id="markdown-toc-creating-sparkdataframes">Creating SparkDataFrames</a>        <ul>
-          <li><a href="#from-local-data-frames" id="markdown-toc-from-local-data-frames">From local data frames</a></li>
-          <li><a href="#from-data-sources" id="markdown-toc-from-data-sources">From Data Sources</a></li>
-          <li><a href="#from-hive-tables" id="markdown-toc-from-hive-tables">From Hive tables</a></li>
+  <li><a href="#overview">Overview</a></li>
+  <li><a href="#sparkdataframe">SparkDataFrame</a>    <ul>
+      <li><a href="#starting-up-sparksession">Starting Up: SparkSession</a></li>
+      <li><a href="#starting-up-from-rstudio">Starting Up from RStudio</a></li>
+      <li><a href="#creating-sparkdataframes">Creating SparkDataFrames</a>        <ul>
+          <li><a href="#from-local-data-frames">From local data frames</a></li>
+          <li><a href="#from-data-sources">From Data Sources</a></li>
+          <li><a href="#from-hive-tables">From Hive tables</a></li>
         </ul>
       </li>
-      <li><a href="#sparkdataframe-operations" id="markdown-toc-sparkdataframe-operations">SparkDataFrame Operations</a>        <ul>
-          <li><a href="#selecting-rows-columns" id="markdown-toc-selecting-rows-columns">Selecting rows, columns</a></li>
-          <li><a href="#grouping-aggregation" id="markdown-toc-grouping-aggregation">Grouping, Aggregation</a></li>
-          <li><a href="#operating-on-columns" id="markdown-toc-operating-on-columns">Operating on Columns</a></li>
-          <li><a href="#applying-user-defined-function" id="markdown-toc-applying-user-defined-function">Applying User-Defined Function</a>            <ul>
-              <li><a href="#run-a-given-function-on-a-large-dataset-using-dapply-or-dapplycollect" id="markdown-toc-run-a-given-function-on-a-large-dataset-using-dapply-or-dapplycollect">Run a given function on a large dataset using <code>dapply</code> or <code>dapplyCollect</code></a>                <ul>
-                  <li><a href="#dapply" id="markdown-toc-dapply">dapply</a></li>
-                  <li><a href="#dapplycollect" id="markdown-toc-dapplycollect">dapplyCollect</a></li>
+      <li><a href="#sparkdataframe-operations">SparkDataFrame Operations</a>        <ul>
+          <li><a href="#selecting-rows-columns">Selecting rows, columns</a></li>
+          <li><a href="#grouping-aggregation">Grouping, Aggregation</a></li>
+          <li><a href="#operating-on-columns">Operating on Columns</a></li>
+          <li><a href="#applying-user-defined-function">Applying User-Defined Function</a>            <ul>
+              <li><a href="#run-a-given-function-on-a-large-dataset-using-dapply-or-dapplycollect">Run a given function on a large dataset using <code>dapply</code> or <code>dapplyCollect</code></a>                <ul>
+                  <li><a href="#dapply">dapply</a></li>
+                  <li><a href="#dapplycollect">dapplyCollect</a></li>
                 </ul>
               </li>
-              <li><a href="#run-a-given-function-on-a-large-dataset-grouping-by-input-columns-and-using-gapply-or-gapplycollect" id="markdown-toc-run-a-given-function-on-a-large-dataset-grouping-by-input-columns-and-using-gapply-or-gapplycollect">Run a given function on a large dataset grouping by input column(s) and using <code>gapply</code> or <code>gapplyCollect</code></a>                <ul>
-                  <li><a href="#gapply" id="markdown-toc-gapply">gapply</a></li>
-                  <li><a href="#gapplycollect" id="markdown-toc-gapplycollect">gapplyCollect</a></li>
+              <li><a href="#run-a-given-function-on-a-large-dataset-grouping-by-input-columns-and-using-gapply-or-gapplycollect">Run a given function on a large dataset grouping by input column(s) and using <code>gapply</code> or <code>gapplyCollect</code></a>                <ul>
+                  <li><a href="#gapply">gapply</a></li>
+                  <li><a href="#gapplycollect">gapplyCollect</a></li>
                 </ul>
               </li>
-              <li><a href="#data-type-mapping-between-r-and-spark" id="markdown-toc-data-type-mapping-between-r-and-spark">Data type mapping between R and Spark</a></li>
-              <li><a href="#run-local-r-functions-distributed-using-sparklapply" id="markdown-toc-run-local-r-functions-distributed-using-sparklapply">Run local R functions distributed using <code>spark.lapply</code></a>                <ul>
-                  <li><a href="#sparklapply" id="markdown-toc-sparklapply">spark.lapply</a></li>
+              <li><a href="#data-type-mapping-between-r-and-spark">Data type mapping between R and Spark</a></li>
+              <li><a href="#run-local-r-functions-distributed-using-sparklapply">Run local R functions distributed using <code>spark.lapply</code></a>                <ul>
+                  <li><a href="#sparklapply">spark.lapply</a></li>
                 </ul>
               </li>
             </ul>
           </li>
         </ul>
       </li>
-      <li><a href="#running-sql-queries-from-sparkr" id="markdown-toc-running-sql-queries-from-sparkr">Running SQL Queries from SparkR</a></li>
+      <li><a href="#running-sql-queries-from-sparkr">Running SQL Queries from SparkR</a></li>
     </ul>
   </li>
-  <li><a href="#machine-learning" id="markdown-toc-machine-learning">Machine Learning</a>    <ul>
-      <li><a href="#algorithms" id="markdown-toc-algorithms">Algorithms</a></li>
-      <li><a href="#model-persistence" id="markdown-toc-model-persistence">Model persistence</a></li>
+  <li><a href="#machine-learning">Machine Learning</a>    <ul>
+      <li><a href="#algorithms">Algorithms</a></li>
+      <li><a href="#model-persistence">Model persistence</a></li>
     </ul>
   </li>
-  <li><a href="#r-function-name-conflicts" id="markdown-toc-r-function-name-conflicts">R Function Name Conflicts</a></li>
-  <li><a href="#migration-guide" id="markdown-toc-migration-guide">Migration Guide</a>    <ul>
-      <li><a href="#upgrading-from-sparkr-15x-to-16x" id="markdown-toc-upgrading-from-sparkr-15x-to-16x">Upgrading From SparkR 1.5.x to 1.6.x</a></li>
-      <li><a href="#upgrading-from-sparkr-16x-to-20" id="markdown-toc-upgrading-from-sparkr-16x-to-20">Upgrading From SparkR 1.6.x to 2.0</a></li>
-      <li><a href="#upgrading-to-sparkr-210" id="markdown-toc-upgrading-to-sparkr-210">Upgrading to SparkR 2.1.0</a></li>
+  <li><a href="#r-function-name-conflicts">R Function Name Conflicts</a></li>
+  <li><a href="#migration-guide">Migration Guide</a>    <ul>
+      <li><a href="#upgrading-from-sparkr-15x-to-16x">Upgrading From SparkR 1.5.x to 1.6.x</a></li>
+      <li><a href="#upgrading-from-sparkr-16x-to-20">Upgrading From SparkR 1.6.x to 2.0</a></li>
+      <li><a href="#upgrading-to-sparkr-210">Upgrading to SparkR 2.1.0</a></li>
     </ul>
   </li>
 </ul>
@@ -202,7 +202,7 @@ You can create a <code>SparkSession</code> using <code>sparkR.session</code> and
 
   <div data-lang="r">
 
-    <div class="highlight"><pre><code class="language-r" data-lang="r">sparkR.session<span class="p">()</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span>sparkR.session<span class="p">()</span></code></pre></figure>
 
   </div>
 
@@ -223,11 +223,11 @@ them, pass them as you would other configuration properties in the <code>sparkCo
 
   <div data-lang="r">
 
-    <div class="highlight"><pre><code class="language-r" data-lang="r"><span class="kr">if</span> <span class="p">(</span><span class="kp">nchar</span><span class="p">(</span><span class="kp">Sys.getenv</span><span class="p">(</span><span class="s">&quot;SPARK_HOME&quot;</span><span class="p">))</span> <span class="o">&lt;</span> <span class="m">1</span><span class="p">)</span> <span class="p">{</span>
+    <figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="kr">if</span> <span class="p">(</span><span class="kp">nchar</span><span class="p">(</span><span class="kp">Sys.getenv</span><span class="p">(</span><span class="s">&quot;SPARK_HOME&quot;</span><span class="p">))</span> <span class="o">&lt;</span> <span class="m">1</span><span class="p">)</span> <span class="p">{</span>
   <span class="kp">Sys.setenv</span><span class="p">(</span>SPARK_HOME <span class="o">=</span> <span class="s">&quot;/home/spark&quot;</span><span class="p">)</span>
 <span class="p">}</span>
 <span class="kn">library</span><span class="p">(</span>SparkR<span class="p">,</span> lib.loc <span class="o">=</span> <span class="kt">c</span><span class="p">(</span><span class="kp">file.path</span><span class="p">(</span><span class="kp">Sys.getenv</span><span class="p">(</span><span class="s">&quot;SPARK_HOME&quot;</span><span class="p">),</span> <span class="s">&quot;R&quot;</span><span class="p">,</span> <span class="s">&quot;lib&quot;</span><span class="p">)))</span>
-sparkR.session<span class="p">(</span>master <span class="o">=</span> <span class="s">&quot;local[*]&quot;</span><span class="p">,</span> sparkConfig <span class="o">=</span> <span class="kt">list</span><span class="p">(</span>spark.driver.memory <span class="o">=</span> <span class="s">&quot;2g&quot;</span><span class="p">))</span></code></pre></div>
+sparkR.session<span class="p">(</span>master <span class="o">=</span> <span class="s">&quot;local[*]&quot;</span><span class="p">,</span> sparkConfig <span class="o">=</span> <span class="kt">list</span><span class="p">(</span>spark.driver.memory <span class="o">=</span> <span class="s">&quot;2g&quot;</span><span class="p">))</span></code></pre></figure>
 
   </div>
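
Beyond spark.driver.memory, any property that must be set before the driver JVM starts can be passed the same way through sparkConfig. A minimal sketch of the full RStudio start-up pattern (the appName value is illustrative, not taken from the diff):

if (nchar(Sys.getenv("SPARK_HOME")) < 1) {
  Sys.setenv(SPARK_HOME = "/home/spark")   # point at the local Spark install
}
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
# sparkConfig takes a named list of Spark properties; driver-side options
# such as spark.driver.memory only take effect when set here in client mode.
sparkR.session(master = "local[*]",
               appName = "SparkRFromRStudio",
               sparkConfig = list(spark.driver.memory = "2g"))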
 
@@ -282,14 +282,14 @@ sparkR.session<span class="p">(</span>master <span class="o">=</span> <span clas
 
 <div data-lang="r">
 
-  <div class="highlight"><pre><code class="language-r" data-lang="r">df <span class="o">&lt;-</span> as.DataFrame<span class="p">(</span>faithful<span class="p">)</span>
+  <figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span>df <span class="o">&lt;-</span> as.DataFrame<span class="p">(</span>faithful<span class="p">)</span>
 
 <span class="c1"># Displays the first part of the SparkDataFrame</span>
 <span class="kp">head</span><span class="p">(</span>df<span class="p">)</span>
 <span class="c1">##  eruptions waiting</span>
 <span class="c1">##1     3.600      79</span>
 <span class="c1">##2     1.800      54</span>
-<span class="c1">##3     3.333      74</span></code></pre></div>
+<span class="c1">##3     3.333      74</span></code></pre></figure>
 
 </div>
 
@@ -303,7 +303,7 @@ specifying <code>--packages</code> with <code>spark-submit</code> or <code>spark
 
 <div data-lang="r">
 
-  <div class="highlight"><pre><code class="language-r" data-lang="r">sparkR.session<span class="p">(</span>sparkPackages <span class="o">=</span> <span class="s">&quot;com.databricks:spark-avro_2.11:3.0.0&quot;</span><span class="p">)</span></code></pre></div>
+  <figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span>sparkR.session<span class="p">(</span>sparkPackages <span class="o">=</span> <span class="s">&quot;com.databricks:spark-avro_2.11:3.0.0&quot;</span><span class="p">)</span></code></pre></figure>
 
 </div>
 
@@ -311,7 +311,7 @@ specifying <code>--packages</code> with <code>spark-submit</code> or <code>spark
 
 <div data-lang="r">
 
-  <div class="highlight"><pre><code class="language-r" data-lang="r">people <span class="o">&lt;-</span> read.df<span class="p">(</span><span class="s">&quot;./examples/src/main/resources/people.json&quot;</span><span class="p">,</span> <span class="s">&quot;json&quot;</span><span class="p">)</span>
+  <figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span>people <span class="o">&lt;-</span> read.df<span class="p">(</span><span class="s">&quot;./examples/src/main/resources/people.json&quot;</span><span class="p">,</span> <span class="s">&quot;json&quot;</span><span class="p">)</span>
 <span class="kp">head</span><span class="p">(</span>people<span class="p">)</span>
 <span class="c1">##  age    name</span>
 <span class="c1">##1  NA Michael</span>
@@ -325,7 +325,7 @@ printSchema<span class="p">(</span>people<span class="p">)</span>
 <span class="c1">#  |-- name: string (nullable = true)</span>
 
 <span class="c1"># Similarly, multiple files can be read with read.json</span>
-people <span class="o">&lt;-</span> read.json<span class="p">(</span><span class="kt">c</span><span class="p">(</span><span class="s">&quot;./examples/src/main/resources/people.json&quot;</span><span class="p">,</span> <span class="s">&quot;./examples/src/main/resources/people2.json&quot;</span><span class="p">))</span></code></pre></div>
+people <span class="o">&lt;-</span> read.json<span class="p">(</span><span class="kt">c</span><span class="p">(</span><span class="s">&quot;./examples/src/main/resources/people.json&quot;</span><span class="p">,</span> <span class="s">&quot;./examples/src/main/resources/people2.json&quot;</span><span class="p">))</span></code></pre></figure>
 
 </div>
 
@@ -333,7 +333,7 @@ people <span class="o">&lt;-</span> read.json<span class="p">(</span><span class
 
 <div data-lang="r">
 
-  <div class="highlight"><pre><code class="language-r" data-lang="r">df <span class="o">&lt;-</span> read.df<span class="p">(</span>csvPath<span class="p">,</span> <span class="s">&quot;csv&quot;</span><span class="p">,</span> header <span class="o">=</span> <span class="s">&quot;true&quot;</span><span class="p">,</span> inferSchema <span class="o">=</span> <span class="s">&quot;true&quot;</span><span class="p">,</span> na.strings <span class="o">=</span> <span class="s">&quot;NA&quot;</span><span class="p">)</span></code></pre></div>
+  <figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span>df <span class="o">&lt;-</span> read.df<span class="p">(</span>csvPath<span class="p">,</span> <span class="s">&quot;csv&quot;</span><span class="p">,</span> header <span class="o">=</span> <span class="s">&quot;true&quot;</span><span class="p">,</span> inferSchema <span class="o">=</span> <span class="s">&quot;true&quot;</span><span class="p">,</span> na.strings <span class="o">=</span> <span class="s">&quot;NA&quot;</span><span class="p">)</span></code></pre></figure>
 
 </div>
 
@@ -342,7 +342,7 @@ to a Parquet file using <code>write.df</code>.</p>
 
 <div data-lang="r">
 
-  <div class="highlight"><pre><code class="language-r" data-lang="r">write.df<span class="p">(</span>people<span class="p">,</span> path <span class="o">=</span> <span class="s">&quot;people.parquet&quot;</span><span class="p">,</span> <span class="kn">source</span> <span class="o">=</span> <span class="s">&quot;parquet&quot;</span><span class="p">,</span> mode <span class="o">=</span> <span class="s">&quot;overwrite&quot;</span><span class="p">)</span></code></pre></div>
+  <figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span>write.df<span class="p">(</span>people<span class="p">,</span> path <span class="o">=</span> <span class="s">&quot;people.parquet&quot;</span><span class="p">,</span> <span class="kn">source</span> <span class="o">=</span> <span class="s">&quot;parquet&quot;</span><span class="p">,</span> mode <span class="o">=</span> <span class="s">&quot;overwrite&quot;</span><span class="p">)</span></code></pre></figure>
 
 </div>
 
@@ -352,7 +352,7 @@ to a Parquet file using <code>write.df</code>.</p>
 
 <div data-lang="r">
 
-  <div class="highlight"><pre><code class="language-r" data-lang="r">sparkR.session<span class="p">()</span>
+  <figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span>sparkR.session<span class="p">()</span>
 
 sql<span class="p">(</span><span class="s">&quot;CREATE TABLE IF NOT EXISTS src (key INT, value STRING)&quot;</span><span class="p">)</span>
 sql<span class="p">(</span><span class="s">&quot;LOAD DATA LOCAL INPATH &#39;examples/src/main/resources/kv1.txt&#39; INTO TABLE src&quot;</span><span class="p">)</span>
@@ -365,7 +365,7 @@ results <span class="o">&lt;-</span> sql<span class="p">(</span><span class="s">
 <span class="c1">##  key   value</span>
 <span class="c1">## 1 238 val_238</span>
 <span class="c1">## 2  86  val_86</span>
-<span class="c1">## 3 311 val_311</span></code></pre></div>
+<span class="c1">## 3 311 val_311</span></code></pre></figure>
 
 </div>
 
@@ -378,7 +378,7 @@ Here we include some basic examples and a complete list can be found in the <a h
 
 <div data-lang="r">
 
-  <div class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Create the SparkDataFrame</span>
+  <figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="c1"># Create the SparkDataFrame</span>
 df <span class="o">&lt;-</span> as.DataFrame<span class="p">(</span>faithful<span class="p">)</span>
 
 <span class="c1"># Get basic information about the SparkDataFrame</span>
@@ -400,7 +400,7 @@ df
 <span class="c1">##  eruptions waiting</span>
 <span class="c1">##1     1.750      47</span>
 <span class="c1">##2     1.750      47</span>
-<span class="c1">##3     1.867      48</span></code></pre></div>
+<span class="c1">##3     1.867      48</span></code></pre></figure>
 
 </div>
 
@@ -410,7 +410,7 @@ df
 
 <div data-lang="r">
 
-  <div class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># We use the `n` operator to count the number of times each waiting time appears</span>
+  <figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="c1"># We use the `n` operator to count the number of times each waiting time appears</span>
 <span class="kp">head</span><span class="p">(</span>summarize<span class="p">(</span>groupBy<span class="p">(</span>df<span class="p">,</span> df<span class="o">$</span>waiting<span class="p">),</span> count <span class="o">=</span> n<span class="p">(</span>df<span class="o">$</span>waiting<span class="p">)))</span>
 <span class="c1">##  waiting count</span>
 <span class="c1">##1      70     4</span>
@@ -423,7 +423,7 @@ waiting_counts <span class="o">&lt;-</span> summarize<span class="p">(</span>gro
 <span class="c1">##   waiting count</span>
 <span class="c1">##1      78    15</span>
 <span class="c1">##2      83    14</span>
-<span class="c1">##3      81    13</span></code></pre></div>
+<span class="c1">##3      81    13</span></code></pre></figure>
 
 </div>
 
@@ -433,14 +433,14 @@ waiting_counts <span class="o">&lt;-</span> summarize<span class="p">(</span>gro
 
 <div data-lang="r">
 
-  <div class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Convert waiting time from hours to seconds.</span>
+  <figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="c1"># Convert waiting time from hours to seconds.</span>
 <span class="c1"># Note that we can assign this to a new column in the same SparkDataFrame</span>
 df<span class="o">$</span>waiting_secs <span class="o">&lt;-</span> df<span class="o">$</span>waiting <span class="o">*</span> <span class="m">60</span>
 <span class="kp">head</span><span class="p">(</span>df<span class="p">)</span>
 <span class="c1">##  eruptions waiting waiting_secs</span>
 <span class="c1">##1     3.600      79         4740</span>
 <span class="c1">##2     1.800      54         3240</span>
-<span class="c1">##3     3.333      74         4440</span></code></pre></div>
+<span class="c1">##3     3.333      74         4440</span></code></pre></figure>
 
 </div>
 
@@ -455,7 +455,7 @@ and should have only one parameter, to which a <code>data.frame</code> correspon
 
 <div data-lang="r">
 
-  <div class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Convert waiting time from hours to seconds.</span>
+  <figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="c1"># Convert waiting time from hours to seconds.</span>
 <span class="c1"># Note that we can apply UDF to DataFrame.</span>
 schema <span class="o">&lt;-</span> structType<span class="p">(</span>structField<span class="p">(</span><span class="s">&quot;eruptions&quot;</span><span class="p">,</span> <span class="s">&quot;double&quot;</span><span class="p">),</span> structField<span class="p">(</span><span class="s">&quot;waiting&quot;</span><span class="p">,</span> <span class="s">&quot;double&quot;</span><span class="p">),</span>
                      structField<span class="p">(</span><span class="s">&quot;waiting_secs&quot;</span><span class="p">,</span> <span class="s">&quot;double&quot;</span><span class="p">))</span>
@@ -467,7 +467,7 @@ df1 <span class="o">&lt;-</span> dapply<span class="p">(</span>df<span class="p"
 <span class="c1">##3     3.333      74         4440</span>
 <span class="c1">##4     2.283      62         3720</span>
 <span class="c1">##5     4.533      85         5100</span>
-<span class="c1">##6     2.883      55         3300</span></code></pre></div>
+<span class="c1">##6     2.883      55         3300</span></code></pre></figure>
 
 </div>
 
@@ -477,7 +477,7 @@ should be a <code>data.frame</code>. But, Schema is not required to be passed. N
 
 <div data-lang="r">
 
-  <div class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Convert waiting time from hours to seconds.</span>
+  <figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="c1"># Convert waiting time from hours to seconds.</span>
 <span class="c1"># Note that we can apply UDF to DataFrame and return a R&#39;s data.frame</span>
 ldf <span class="o">&lt;-</span> dapplyCollect<span class="p">(</span>
          df<span class="p">,</span>
@@ -488,7 +488,7 @@ ldf <span class="o">&lt;-</span> dapplyCollect<span class="p">(</span>
 <span class="c1">##  eruptions waiting waiting_secs</span>
 <span class="c1">##1     3.600      79         4740</span>
 <span class="c1">##2     1.800      54         3240</span>
-<span class="c1">##3     3.333      74         4440</span></code></pre></div>
+<span class="c1">##3     3.333      74         4440</span></code></pre></figure>
 
 </div>
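
The hunks above only touch the highlighted fragments of the dapply and dapplyCollect examples, so the full call shape is easy to lose. A sketch, assuming the same faithful-based df and the schema defined in the surrounding text:

# dapply: the function receives one data.frame per partition and must
# return rows matching the declared output schema.
schema <- structType(structField("eruptions", "double"),
                     structField("waiting", "double"),
                     structField("waiting_secs", "double"))
df1 <- dapply(df, function(x) { x <- cbind(x, x$waiting * 60) }, schema)
head(collect(df1))

# dapplyCollect: same function shape, but the result comes back as a
# local R data.frame, so no schema is declared.
ldf <- dapplyCollect(df, function(x) { x <- cbind(x, "waiting_secs" = x$waiting * 60) })
head(ldf, 3)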
 
@@ -502,7 +502,7 @@ The output of function should be a <code>data.frame</code>. Schema specifies the
 
 <div data-lang="r">
 
-  <div class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Determine six waiting times with the largest eruption time in minutes.</span>
+  <figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="c1"># Determine six waiting times with the largest eruption time in minutes.</span>
 schema <span class="o">&lt;-</span> structType<span class="p">(</span>structField<span class="p">(</span><span class="s">&quot;waiting&quot;</span><span class="p">,</span> <span class="s">&quot;double&quot;</span><span class="p">),</span> structField<span class="p">(</span><span class="s">&quot;max_eruption&quot;</span><span class="p">,</span> <span class="s">&quot;double&quot;</span><span class="p">))</span>
 result <span class="o">&lt;-</span> gapply<span class="p">(</span>
     df<span class="p">,</span>
@@ -519,7 +519,7 @@ result <span class="o">&lt;-</span> gapply<span class="p">(</span>
 <span class="c1">##3      71       5.033</span>
 <span class="c1">##4      87       5.000</span>
 <span class="c1">##5      63       4.933</span>
-<span class="c1">##6      89       4.900</span></code></pre></div>
+<span class="c1">##6      89       4.900</span></code></pre></figure>
 
 </div>
 
@@ -528,7 +528,7 @@ result <span class="o">&lt;-</span> gapply<span class="p">(</span>
 
 <div data-lang="r">
 
-  <div class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Determine six waiting times with the largest eruption time in minutes.</span>
+  <figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="c1"># Determine six waiting times with the largest eruption time in minutes.</span>
 result <span class="o">&lt;-</span> gapplyCollect<span class="p">(</span>
     df<span class="p">,</span>
     <span class="s">&quot;waiting&quot;</span><span class="p">,</span>
@@ -545,7 +545,7 @@ result <span class="o">&lt;-</span> gapplyCollect<span class="p">(</span>
 <span class="c1">##3      71       5.033</span>
 <span class="c1">##4      87       5.000</span>
 <span class="c1">##5      63       4.933</span>
-<span class="c1">##6      89       4.900</span></code></pre></div>
+<span class="c1">##6      89       4.900</span></code></pre></figure>
 
 </div>
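
Likewise, a sketch of the complete gapply pattern pieced from what these hunks show: the function receives the grouping key plus a data.frame for that group, and gapply (unlike gapplyCollect) needs an output schema.

# Determine the waiting times with the largest eruption time per group.
schema <- structType(structField("waiting", "double"),
                     structField("max_eruption", "double"))
result <- gapply(
    df,
    "waiting",                               # group by input column(s)
    function(key, x) {
        y <- data.frame(key, max(x$eruptions))
    },
    schema)
head(collect(arrange(result, "max_eruption", decreasing = TRUE)))

# gapplyCollect drops the schema argument and collects the result as a
# local data.frame; name the output columns inside the function instead.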
 
@@ -628,7 +628,7 @@ should fit in a single machine. If that is not the case they can do something li
 
 <div data-lang="r">
 
-  <div class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Perform distributed training of multiple models with spark.lapply. Here, we pass</span>
+  <figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="c1"># Perform distributed training of multiple models with spark.lapply. Here, we pass</span>
 <span class="c1"># a read-only list of arguments which specifies family the generalized linear model should be.</span>
 families <span class="o">&lt;-</span> <span class="kt">c</span><span class="p">(</span><span class="s">&quot;gaussian&quot;</span><span class="p">,</span> <span class="s">&quot;poisson&quot;</span><span class="p">)</span>
 train <span class="o">&lt;-</span> <span class="kr">function</span><span class="p">(</span>family<span class="p">)</span> <span class="p">{</span>
@@ -639,7 +639,7 @@ train <span class="o">&lt;-</span> <span class="kr">function</span><span class="
 model.summaries <span class="o">&lt;-</span> spark.lapply<span class="p">(</span>families<span class="p">,</span> train<span class="p">)</span>
 
 <span class="c1"># Print the summary of each model</span>
-<span class="kp">print</span><span class="p">(</span>model.summaries<span class="p">)</span></code></pre></div>
+<span class="kp">print</span><span class="p">(</span>model.summaries<span class="p">)</span></code></pre></figure>
 
 </div>
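
The body of train is elided by the hunk above; the full pattern, as a sketch that matches the surrounding comments (an ordinary glm fit over the local iris data set, one model per family):

families <- c("gaussian", "poisson")
train <- function(family) {
  # Runs on a worker as plain R code; the data set must fit on one machine.
  model <- glm(Sepal.Length ~ Sepal.Width + Species, iris, family = family)
  summary(model)
}
# spark.lapply distributes the list elements and applies train to each.
model.summaries <- spark.lapply(families, train)
print(model.summaries)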
 
@@ -649,7 +649,7 @@ The <code>sql</code> function enables applications to run SQL queries programmat
 
 <div data-lang="r">
 
-  <div class="highlight"><pre><code class="language-r" data-lang="r"><span class="c1"># Load a JSON file</span>
+  <figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span><span class="c1"># Load a JSON file</span>
 people <span class="o">&lt;-</span> read.df<span class="p">(</span><span class="s">&quot;./examples/src/main/resources/people.json&quot;</span><span class="p">,</span> <span class="s">&quot;json&quot;</span><span class="p">)</span>
 
 <span class="c1"># Register this SparkDataFrame as a temporary view.</span>
@@ -659,7 +659,7 @@ createOrReplaceTempView<span class="p">(</span>people<span class="p">,</span> <s
 teenagers <span class="o">&lt;-</span> sql<span class="p">(</span><span class="s">&quot;SELECT name FROM people WHERE age &gt;= 13 AND age &lt;= 19&quot;</span><span class="p">)</span>
 <span class="kp">head</span><span class="p">(</span>teenagers<span class="p">)</span>
 <span class="c1">##    name</span>
-<span class="c1">##1 Justin</span></code></pre></div>
+<span class="c1">##1 Justin</span></code></pre></figure>
 
 </div>
 
@@ -691,28 +691,27 @@ SparkR supports a subset of the available R formula operators for model fitting,
 
 <h2 id="model-persistence">Model persistence</h2>
 
-<p>The following example shows how to save/load an MLlib model in SparkR.</p>
-<div class="highlight"><pre>irisDF <span class="o">&lt;-</span> <span class="kp">suppressWarnings</span><span class="p">(</span>createDataFrame<span class="p">(</span>iris<span class="p">))</span>
+<p>The following example shows how to save/load an MLlib model in SparkR.
+&lt;div class="highlight"&gt;&lt;pre&gt;<span></span>irisDF <span class="o">&lt;-</span> <span class="kp">suppressWarnings</span><span class="p">(</span>createDataFrame<span class="p">(</span>iris<span class="p">))</span>
 <span class="c1"># Fit a generalized linear model of family &quot;gaussian&quot; with spark.glm</span>
 gaussianDF <span class="o">&lt;-</span> irisDF
 gaussianTestDF <span class="o">&lt;-</span> irisDF
-gaussianGLM <span class="o">&lt;-</span> spark.glm<span class="p">(</span>gaussianDF<span class="p">,</span> Sepal_Length <span class="o">~</span> Sepal_Width <span class="o">+</span> Species<span class="p">,</span> family <span class="o">=</span> <span class="s">&quot;gaussian&quot;</span><span class="p">)</span>
+gaussianGLM <span class="o">&lt;-</span> spark.glm<span class="p">(</span>gaussianDF<span class="p">,</span> Sepal_Length <span class="o">~</span> Sepal_Width <span class="o">+</span> Species<span class="p">,</span> family <span class="o">=</span> <span class="s">&quot;gaussian&quot;</span><span class="p">)</span></p>
 
-<span class="c1"># Save and then load a fitted MLlib model</span>
+<p><span class="c1"># Save and then load a fitted MLlib model</span>
 modelPath <span class="o">&lt;-</span> <span class="kp">tempfile</span><span class="p">(</span>pattern <span class="o">=</span> <span class="s">&quot;ml&quot;</span><span class="p">,</span> fileext <span class="o">=</span> <span class="s">&quot;.tmp&quot;</span><span class="p">)</span>
 write.ml<span class="p">(</span>gaussianGLM<span class="p">,</span> modelPath<span class="p">)</span>
-gaussianGLM2 <span class="o">&lt;-</span> read.ml<span class="p">(</span>modelPath<span class="p">)</span>
+gaussianGLM2 <span class="o">&lt;-</span> read.ml<span class="p">(</span>modelPath<span class="p">)</span></p>
 
-<span class="c1"># Check model summary</span>
-<span class="kp">summary</span><span class="p">(</span>gaussianGLM2<span class="p">)</span>
+<p><span class="c1"># Check model summary</span>
+<span class="kp">summary</span><span class="p">(</span>gaussianGLM2<span class="p">)</span></p>
 
-<span class="c1"># Check model prediction</span>
+<p><span class="c1"># Check model prediction</span>
 gaussianPredictions <span class="o">&lt;-</span> predict<span class="p">(</span>gaussianGLM2<span class="p">,</span> gaussianTestDF<span class="p">)</span>
-showDF<span class="p">(</span>gaussianPredictions<span class="p">)</span>
+showDF<span class="p">(</span>gaussianPredictions<span class="p">)</span></p>
 
-<span class="kp">unlink</span><span class="p">(</span>modelPath<span class="p">)</span>
-</pre></div>
-<div><small>Find full example code at "examples/src/main/r/ml/ml.R" in the Spark repo.</small></div>
+<p><span class="kp">unlink</span><span class="p">(</span>modelPath<span class="p">)</span>
+&lt;/pre&gt;&lt;/div&gt;&lt;div&gt;<small>Find full example code at &#8220;examples/src/main/r/ml/ml.R&#8221; in the Spark repo.</small>&lt;/div&gt;</p>
 
 <h1 id="r-function-name-conflicts">R Function Name Conflicts</h1>
 
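Note that on the + side of this hunk the div/pre wrapper is emitted as escaped text and the example is split across <p> tags, so the rendered page will show it mangled. Pieced back together from the hunk, the model-persistence example reads:

irisDF <- suppressWarnings(createDataFrame(iris))
# Fit a generalized linear model of family "gaussian" with spark.glm
gaussianDF <- irisDF
gaussianTestDF <- irisDF
gaussianGLM <- spark.glm(gaussianDF, Sepal_Length ~ Sepal_Width + Species, family = "gaussian")

# Save and then load a fitted MLlib model
modelPath <- tempfile(pattern = "ml", fileext = ".tmp")
write.ml(gaussianGLM, modelPath)
gaussianGLM2 <- read.ml(modelPath)

# Check model summary
summary(gaussianGLM2)

# Check model prediction
gaussianPredictions <- predict(gaussianGLM2, gaussianTestDF)
showDF(gaussianPredictions)

unlink(modelPath)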




[10/25] spark-website git commit: Update 2.1.0 docs to include https://github.com/apache/spark/pull/16294

Posted by yh...@apache.org.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/mllib-pmml-model-export.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/mllib-pmml-model-export.html b/site/docs/2.1.0/mllib-pmml-model-export.html
index 30815e0..3f2fd91 100644
--- a/site/docs/2.1.0/mllib-pmml-model-export.html
+++ b/site/docs/2.1.0/mllib-pmml-model-export.html
@@ -307,8 +307,8 @@
                     
 
                     <ul id="markdown-toc">
-  <li><a href="#sparkmllib-supported-models" id="markdown-toc-sparkmllib-supported-models"><code>spark.mllib</code> supported models</a></li>
-  <li><a href="#examples" id="markdown-toc-examples">Examples</a></li>
+  <li><a href="#sparkmllib-supported-models"><code>spark.mllib</code> supported models</a></li>
+  <li><a href="#examples">Examples</a></li>
 </ul>
 
 <h2 id="sparkmllib-supported-models"><code>spark.mllib</code> supported models</h2>
@@ -353,32 +353,31 @@
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.clustering.KMeans"><code>KMeans</code> Scala docs</a> and <a href="api/scala/index.html#org.apache.spark.mllib.linalg.Vectors$"><code>Vectors</code> Scala docs</a> for details on the API.</p>
 
-    <p>Here is a complete example of building a KMeansModel and printing it out in PMML format:</p>
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.clustering.KMeans</span>
-<span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vectors</span>
+    <p>Here is a complete example of building a KMeansModel and printing it out in PMML format:
+&lt;div class="highlight"&gt;&lt;pre&gt;<span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.clustering.KMeans</span>
+<span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vectors</span></p>
 
-<span class="c1">// Load and parse the data</span>
+    <p><span class="c1">// Load and parse the data</span>
 <span class="k">val</span> <span class="n">data</span> <span class="k">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="o">(</span><span class="s">&quot;data/mllib/kmeans_data.txt&quot;</span><span class="o">)</span>
-<span class="k">val</span> <span class="n">parsedData</span> <span class="k">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="o">(</span><span class="n">s</span> <span class="k">=&gt;</span> <span class="nc">Vectors</span><span class="o">.</span><span class="n">dense</span><span class="o">(</span><span class="n">s</span><span class="o">.</span><span class="n">split</span><span class="o">(</span><span class="sc">&#39; &#39;</span><span class="o">).</span><span class="n">map</span><span class="o">(</span><span class="k">_</span><span class="o">.</span><span class="n">toDouble</span><span class="o">))).</span><span class="n">cache</span><span class="o">()</span>
+<span class="k">val</span> <span class="n">parsedData</span> <span class="k">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="o">(</span><span class="n">s</span> <span class="k">=&gt;</span> <span class="nc">Vectors</span><span class="o">.</span><span class="n">dense</span><span class="o">(</span><span class="n">s</span><span class="o">.</span><span class="n">split</span><span class="o">(</span><span class="sc">&#39; &#39;</span><span class="o">).</span><span class="n">map</span><span class="o">(</span><span class="k">_</span><span class="o">.</span><span class="n">toDouble</span><span class="o">))).</span><span class="n">cache</span><span class="o">()</span></p>
 
-<span class="c1">// Cluster the data into two classes using KMeans</span>
+    <p><span class="c1">// Cluster the data into two classes using KMeans</span>
 <span class="k">val</span> <span class="n">numClusters</span> <span class="k">=</span> <span class="mi">2</span>
 <span class="k">val</span> <span class="n">numIterations</span> <span class="k">=</span> <span class="mi">20</span>
-<span class="k">val</span> <span class="n">clusters</span> <span class="k">=</span> <span class="nc">KMeans</span><span class="o">.</span><span class="n">train</span><span class="o">(</span><span class="n">parsedData</span><span class="o">,</span> <span class="n">numClusters</span><span class="o">,</span> <span class="n">numIterations</span><span class="o">)</span>
+<span class="k">val</span> <span class="n">clusters</span> <span class="k">=</span> <span class="nc">KMeans</span><span class="o">.</span><span class="n">train</span><span class="o">(</span><span class="n">parsedData</span><span class="o">,</span> <span class="n">numClusters</span><span class="o">,</span> <span class="n">numIterations</span><span class="o">)</span></p>
 
-<span class="c1">// Export to PMML to a String in PMML format</span>
-<span class="n">println</span><span class="o">(</span><span class="s">&quot;PMML Model:\n&quot;</span> <span class="o">+</span> <span class="n">clusters</span><span class="o">.</span><span class="n">toPMML</span><span class="o">)</span>
+    <p><span class="c1">// Export to PMML to a String in PMML format</span>
+<span class="n">println</span><span class="o">(</span><span class="s">&quot;PMML Model:\n&quot;</span> <span class="o">+</span> <span class="n">clusters</span><span class="o">.</span><span class="n">toPMML</span><span class="o">)</span></p>
 
-<span class="c1">// Export the model to a local file in PMML format</span>
-<span class="n">clusters</span><span class="o">.</span><span class="n">toPMML</span><span class="o">(</span><span class="s">&quot;/tmp/kmeans.xml&quot;</span><span class="o">)</span>
+    <p><span class="c1">// Export the model to a local file in PMML format</span>
+<span class="n">clusters</span><span class="o">.</span><span class="n">toPMML</span><span class="o">(</span><span class="s">&quot;/tmp/kmeans.xml&quot;</span><span class="o">)</span></p>
 
-<span class="c1">// Export the model to a directory on a distributed file system in PMML format</span>
-<span class="n">clusters</span><span class="o">.</span><span class="n">toPMML</span><span class="o">(</span><span class="n">sc</span><span class="o">,</span> <span class="s">&quot;/tmp/kmeans&quot;</span><span class="o">)</span>
+    <p><span class="c1">// Export the model to a directory on a distributed file system in PMML format</span>
+<span class="n">clusters</span><span class="o">.</span><span class="n">toPMML</span><span class="o">(</span><span class="n">sc</span><span class="o">,</span> <span class="s">&quot;/tmp/kmeans&quot;</span><span class="o">)</span></p>
 
-<span class="c1">// Export the model to the OutputStream in PMML format</span>
+    <p><span class="c1">// Export the model to the OutputStream in PMML format</span>
 <span class="n">clusters</span><span class="o">.</span><span class="n">toPMML</span><span class="o">(</span><span class="nc">System</span><span class="o">.</span><span class="n">out</span><span class="o">)</span>
-</pre></div>
-    <div><small>Find full example code at "examples/src/main/scala/org/apache/spark/examples/mllib/PMMLModelExportExample.scala" in the Spark repo.</small></div>
+&lt;/pre&gt;&lt;/div&gt;&lt;div&gt;<small>Find full example code at &#8220;examples/src/main/scala/org/apache/spark/examples/mllib/PMMLModelExportExample.scala&#8221; in the Spark repo.</small>&lt;/div&gt;</p>
 
     <p>For unsupported models, either you will not find a <code>.toPMML</code> method or an <code>IllegalArgumentException</code> will be thrown.</p>
 

http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/mllib-statistics.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/mllib-statistics.html b/site/docs/2.1.0/mllib-statistics.html
index 4485ecf..f04924c 100644
--- a/site/docs/2.1.0/mllib-statistics.html
+++ b/site/docs/2.1.0/mllib-statistics.html
@@ -358,15 +358,15 @@
                     
 
                     <ul id="markdown-toc">
-  <li><a href="#summary-statistics" id="markdown-toc-summary-statistics">Summary statistics</a></li>
-  <li><a href="#correlations" id="markdown-toc-correlations">Correlations</a></li>
-  <li><a href="#stratified-sampling" id="markdown-toc-stratified-sampling">Stratified sampling</a></li>
-  <li><a href="#hypothesis-testing" id="markdown-toc-hypothesis-testing">Hypothesis testing</a>    <ul>
-      <li><a href="#streaming-significance-testing" id="markdown-toc-streaming-significance-testing">Streaming Significance Testing</a></li>
+  <li><a href="#summary-statistics">Summary statistics</a></li>
+  <li><a href="#correlations">Correlations</a></li>
+  <li><a href="#stratified-sampling">Stratified sampling</a></li>
+  <li><a href="#hypothesis-testing">Hypothesis testing</a>    <ul>
+      <li><a href="#streaming-significance-testing">Streaming Significance Testing</a></li>
     </ul>
   </li>
-  <li><a href="#random-data-generation" id="markdown-toc-random-data-generation">Random data generation</a></li>
-  <li><a href="#kernel-density-estimation" id="markdown-toc-kernel-density-estimation">Kernel density estimation</a></li>
+  <li><a href="#random-data-generation">Random data generation</a></li>
+  <li><a href="#kernel-density-estimation">Kernel density estimation</a></li>
 </ul>
 
 <p><code>\[
@@ -401,7 +401,7 @@ total count.</p>
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.stat.MultivariateStatisticalSummary"><code>MultivariateStatisticalSummary</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vectors</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vectors</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.stat.</span><span class="o">{</span><span class="nc">MultivariateStatisticalSummary</span><span class="o">,</span> <span class="nc">Statistics</span><span class="o">}</span>
 
 <span class="k">val</span> <span class="n">observations</span> <span class="k">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="o">(</span>
@@ -430,7 +430,7 @@ total count.</p>
 
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/stat/MultivariateStatisticalSummary.html"><code>MultivariateStatisticalSummary</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vector</span><span class="o">;</span>
@@ -463,19 +463,19 @@ total count.</p>
 
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.stat.MultivariateStatisticalSummary"><code>MultivariateStatisticalSummary</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">numpy</span> <span class="kn">as</span> <span class="nn">np</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">numpy</span> <span class="kn">as</span> <span class="nn">np</span>
 
 <span class="kn">from</span> <span class="nn">pyspark.mllib.stat</span> <span class="kn">import</span> <span class="n">Statistics</span>
 
 <span class="n">mat</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="p">(</span>
     <span class="p">[</span><span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">10.0</span><span class="p">,</span> <span class="mf">100.0</span><span class="p">]),</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mf">2.0</span><span class="p">,</span> <span class="mf">20.0</span><span class="p">,</span> <span class="mf">200.0</span><span class="p">]),</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mf">3.0</span><span class="p">,</span> <span class="mf">30.0</span><span class="p">,</span> <span class="mf">300.0</span><span class="p">])]</span>
-<span class="p">)</span>  <span class="c"># an RDD of Vectors</span>
+<span class="p">)</span>  <span class="c1"># an RDD of Vectors</span>
 
-<span class="c"># Compute column summary statistics.</span>
+<span class="c1"># Compute column summary statistics.</span>
 <span class="n">summary</span> <span class="o">=</span> <span class="n">Statistics</span><span class="o">.</span><span class="n">colStats</span><span class="p">(</span><span class="n">mat</span><span class="p">)</span>
-<span class="k">print</span><span class="p">(</span><span class="n">summary</span><span class="o">.</span><span class="n">mean</span><span class="p">())</span>  <span class="c"># a dense vector containing the mean value for each column</span>
-<span class="k">print</span><span class="p">(</span><span class="n">summary</span><span class="o">.</span><span class="n">variance</span><span class="p">())</span>  <span class="c"># column-wise variance</span>
-<span class="k">print</span><span class="p">(</span><span class="n">summary</span><span class="o">.</span><span class="n">numNonzeros</span><span class="p">())</span>  <span class="c"># number of nonzeros in each column</span>
+<span class="k">print</span><span class="p">(</span><span class="n">summary</span><span class="o">.</span><span class="n">mean</span><span class="p">())</span>  <span class="c1"># a dense vector containing the mean value for each column</span>
+<span class="k">print</span><span class="p">(</span><span class="n">summary</span><span class="o">.</span><span class="n">variance</span><span class="p">())</span>  <span class="c1"># column-wise variance</span>
+<span class="k">print</span><span class="p">(</span><span class="n">summary</span><span class="o">.</span><span class="n">numNonzeros</span><span class="p">())</span>  <span class="c1"># number of nonzeros in each column</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/mllib/summary_statistics_example.py" in the Spark repo.</small></div>
   </div>
@@ -496,7 +496,7 @@ an <code>RDD[Vector]</code>, the output will be a <code>Double</code> or the cor
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.stat.Statistics$"><code>Statistics</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg._</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg._</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.stat.Statistics</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.rdd.RDD</span>
 
@@ -507,7 +507,7 @@ an <code>RDD[Vector]</code>, the output will be a <code>Double</code> or the cor
 <span class="c1">// compute the correlation using Pearson&#39;s method. Enter &quot;spearman&quot; for Spearman&#39;s method. If a</span>
 <span class="c1">// method is not specified, Pearson&#39;s method will be used by default.</span>
 <span class="k">val</span> <span class="n">correlation</span><span class="k">:</span> <span class="kt">Double</span> <span class="o">=</span> <span class="nc">Statistics</span><span class="o">.</span><span class="n">corr</span><span class="o">(</span><span class="n">seriesX</span><span class="o">,</span> <span class="n">seriesY</span><span class="o">,</span> <span class="s">&quot;pearson&quot;</span><span class="o">)</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Correlation is: $correlation&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;Correlation is: </span><span class="si">$correlation</span><span class="s">&quot;</span><span class="o">)</span>
 
 <span class="k">val</span> <span class="n">data</span><span class="k">:</span> <span class="kt">RDD</span><span class="o">[</span><span class="kt">Vector</span><span class="o">]</span> <span class="k">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="o">(</span>
   <span class="nc">Seq</span><span class="o">(</span>
@@ -531,7 +531,7 @@ a <code>JavaRDD&lt;Vector&gt;</code>, the output will be a <code>Double</code> o
 
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/stat/Statistics.html"><code>Statistics</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaDoubleRDD</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
@@ -577,23 +577,23 @@ an <code>RDD[Vector]</code>, the output will be a <code>Double</code> or the cor
 
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.stat.Statistics"><code>Statistics</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.mllib.stat</span> <span class="kn">import</span> <span class="n">Statistics</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.stat</span> <span class="kn">import</span> <span class="n">Statistics</span>
 
-<span class="n">seriesX</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="p">([</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">2.0</span><span class="p">,</span> <span class="mf">3.0</span><span class="p">,</span> <span class="mf">3.0</span><span class="p">,</span> <span class="mf">5.0</span><span class="p">])</span>  <span class="c"># a series</span>
-<span class="c"># seriesY must have the same number of partitions and cardinality as seriesX</span>
+<span class="n">seriesX</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="p">([</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">2.0</span><span class="p">,</span> <span class="mf">3.0</span><span class="p">,</span> <span class="mf">3.0</span><span class="p">,</span> <span class="mf">5.0</span><span class="p">])</span>  <span class="c1"># a series</span>
+<span class="c1"># seriesY must have the same number of partitions and cardinality as seriesX</span>
 <span class="n">seriesY</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="p">([</span><span class="mf">11.0</span><span class="p">,</span> <span class="mf">22.0</span><span class="p">,</span> <span class="mf">33.0</span><span class="p">,</span> <span class="mf">33.0</span><span class="p">,</span> <span class="mf">555.0</span><span class="p">])</span>
 
-<span class="c"># Compute the correlation using Pearson&#39;s method. Enter &quot;spearman&quot; for Spearman&#39;s method.</span>
-<span class="c"># If a method is not specified, Pearson&#39;s method will be used by default.</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Correlation is: &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">Statistics</span><span class="o">.</span><span class="n">corr</span><span class="p">(</span><span class="n">seriesX</span><span class="p">,</span> <span class="n">seriesY</span><span class="p">,</span> <span class="n">method</span><span class="o">=</span><span class="s">&quot;pearson&quot;</span><span class="p">)))</span>
+<span class="c1"># Compute the correlation using Pearson&#39;s method. Enter &quot;spearman&quot; for Spearman&#39;s method.</span>
+<span class="c1"># If a method is not specified, Pearson&#39;s method will be used by default.</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Correlation is: &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">Statistics</span><span class="o">.</span><span class="n">corr</span><span class="p">(</span><span class="n">seriesX</span><span class="p">,</span> <span class="n">seriesY</span><span class="p">,</span> <span class="n">method</span><span class="o">=</span><span class="s2">&quot;pearson&quot;</span><span class="p">)))</span>
 
 <span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="p">(</span>
     <span class="p">[</span><span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">10.0</span><span class="p">,</span> <span class="mf">100.0</span><span class="p">]),</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mf">2.0</span><span class="p">,</span> <span class="mf">20.0</span><span class="p">,</span> <span class="mf">200.0</span><span class="p">]),</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mf">5.0</span><span class="p">,</span> <span class="mf">33.0</span><span class="p">,</span> <span class="mf">366.0</span><span class="p">])]</span>
-<span class="p">)</span>  <span class="c"># an RDD of Vectors</span>
+<span class="p">)</span>  <span class="c1"># an RDD of Vectors</span>
 
-<span class="c"># calculate the correlation matrix using Pearson&#39;s method. Use &quot;spearman&quot; for Spearman&#39;s method.</span>
-<span class="c"># If a method is not specified, Pearson&#39;s method will be used by default.</span>
-<span class="k">print</span><span class="p">(</span><span class="n">Statistics</span><span class="o">.</span><span class="n">corr</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">method</span><span class="o">=</span><span class="s">&quot;pearson&quot;</span><span class="p">))</span>
+<span class="c1"># calculate the correlation matrix using Pearson&#39;s method. Use &quot;spearman&quot; for Spearman&#39;s method.</span>
+<span class="c1"># If a method is not specified, Pearson&#39;s method will be used by default.</span>
+<span class="k">print</span><span class="p">(</span><span class="n">Statistics</span><span class="o">.</span><span class="n">corr</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">method</span><span class="o">=</span><span class="s2">&quot;pearson&quot;</span><span class="p">))</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/mllib/correlations_example.py" in the Spark repo.</small></div>
   </div>
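
The examples above default to Pearson's method; when the relationship is monotonic but not linear, or the data contain outliers, Spearman's rank correlation is often the more robust choice. A minimal Scala sketch, assuming an existing SparkContext `sc` as in the examples above:

    import org.apache.spark.mllib.stat.Statistics
    import org.apache.spark.rdd.RDD

    // the same two series as above; "spearman" ranks the values before correlating
    val seriesX: RDD[Double] = sc.parallelize(Seq(1.0, 2.0, 3.0, 3.0, 5.0))
    val seriesY: RDD[Double] = sc.parallelize(Seq(11.0, 22.0, 33.0, 33.0, 555.0))
    val rho: Double = Statistics.corr(seriesX, seriesY, "spearman")
    println(s"Spearman correlation is: $rho")
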
@@ -621,9 +621,9 @@ fraction for key $k$, $n_k$ is the number of key-value pairs for key $k$, and $K
 keys. Sampling without replacement requires one additional pass over the RDD to guarantee sample
 size, whereas sampling with replacement requires two additional passes.</p>
 
-    <div class="highlight"><pre><span class="c1">// an RDD[(K, V)] of any key value pairs</span>
+    <div class="highlight"><pre><span></span><span class="c1">// an RDD[(K, V)] of any key value pairs</span>
 <span class="k">val</span> <span class="n">data</span> <span class="k">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="o">(</span>
-  <span class="nc">Seq</span><span class="o">((</span><span class="mi">1</span><span class="o">,</span> <span class="-Symbol">&#39;a</span><span class="err">&#39;</span><span class="o">),</span> <span class="o">(</span><span class="mi">1</span><span class="o">,</span> <span class="-Symbol">&#39;b</span><span class="err">&#39;</span><span class="o">),</span> <span class="o">(</span><span class="mi">2</span><span class="o">,</span> <span class="-Symbol">&#39;c</span><span class="err">&#39;</span><span class="o">),</span> <span class="o">(</span><span class="mi">2</span><span class="o">,</span> <span class="-Symbol">&#39;d</span><span class="err">&#39;</span><span class="o">),</span> <span class="o">(</span><span class="mi">2</span><span class="o">,</span> <span class="-Symbol">&#39;e</span><span class="err">&#39;</span><span class="o">),</span> <span class="o">(</span><span class="mi">3</span><span class="o">,</span> <span class="-Symbol">&#39;f</span><span class="err">&#39;</span><sp
 an class="o">)))</span>
+  <span class="nc">Seq</span><span class="o">((</span><span class="mi">1</span><span class="o">,</span> <span class="sc">&#39;a&#39;</span><span class="o">),</span> <span class="o">(</span><span class="mi">1</span><span class="o">,</span> <span class="sc">&#39;b&#39;</span><span class="o">),</span> <span class="o">(</span><span class="mi">2</span><span class="o">,</span> <span class="sc">&#39;c&#39;</span><span class="o">),</span> <span class="o">(</span><span class="mi">2</span><span class="o">,</span> <span class="sc">&#39;d&#39;</span><span class="o">),</span> <span class="o">(</span><span class="mi">2</span><span class="o">,</span> <span class="sc">&#39;e&#39;</span><span class="o">),</span> <span class="o">(</span><span class="mi">3</span><span class="o">,</span> <span class="sc">&#39;f&#39;</span><span class="o">)))</span>
 
 <span class="c1">// specify the exact fraction desired from each key</span>
 <span class="k">val</span> <span class="n">fractions</span> <span class="k">=</span> <span class="nc">Map</span><span class="o">(</span><span class="mi">1</span> <span class="o">-&gt;</span> <span class="mf">0.1</span><span class="o">,</span> <span class="mi">2</span> <span class="o">-&gt;</span> <span class="mf">0.6</span><span class="o">,</span> <span class="mi">3</span> <span class="o">-&gt;</span> <span class="mf">0.3</span><span class="o">)</span>
@@ -643,7 +643,7 @@ fraction for key $k$, $n_k$ is the number of key-value pairs for key $k$, and $K
 keys. Sampling without replacement requires one additional pass over the RDD to guarantee sample
 size, whereas sampling with replacement requires two additional passes.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.*</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.*</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span>
 
@@ -678,10 +678,10 @@ set of keys.</p>
 
     <p><em>Note:</em> <code>sampleByKeyExact()</code> is currently not supported in Python.</p>
 
-    <div class="highlight"><pre><span class="c"># an RDD of any key value pairs</span>
-<span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="p">([(</span><span class="mi">1</span><span class="p">,</span> <span class="s">&#39;a&#39;</span><span class="p">),</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s">&#39;b&#39;</span><span class="p">),</span> <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s">&#39;c&#39;</span><span class="p">),</span> <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s">&#39;d&#39;</span><span class="p">),</span> <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s">&#39;e&#39;</span><span class="p">),</span> <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="s">&#39;f&#39;</span><span class="p">)])</span>
+    <div class="highlight"><pre><span></span><span class="c1"># an RDD of any key value pairs</span>
+<span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="p">([(</span><span class="mi">1</span><span class="p">,</span> <span class="s1">&#39;a&#39;</span><span class="p">),</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s1">&#39;b&#39;</span><span class="p">),</span> <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s1">&#39;c&#39;</span><span class="p">),</span> <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s1">&#39;d&#39;</span><span class="p">),</span> <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s1">&#39;e&#39;</span><span class="p">),</span> <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="s1">&#39;f&#39;</span><span class="p">)])</span>
 
-<span class="c"># specify the exact fraction desired from each key as a dictionary</span>
+<span class="c1"># specify the exact fraction desired from each key as a dictionary</span>
 <span class="n">fractions</span> <span class="o">=</span> <span class="p">{</span><span class="mi">1</span><span class="p">:</span> <span class="mf">0.1</span><span class="p">,</span> <span class="mi">2</span><span class="p">:</span> <span class="mf">0.6</span><span class="p">,</span> <span class="mi">3</span><span class="p">:</span> <span class="mf">0.3</span><span class="p">}</span>
 
 <span class="n">approxSample</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">sampleByKey</span><span class="p">(</span><span class="bp">False</span><span class="p">,</span> <span class="n">fractions</span><span class="p">)</span>
@@ -708,7 +708,7 @@ independence tests.</p>
 run Pearson&#8217;s chi-squared tests. The following example demonstrates how to run and interpret
 hypothesis tests.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg._</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg._</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.regression.LabeledPoint</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.stat.Statistics</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.stat.test.ChiSqTestResult</span>
@@ -722,7 +722,7 @@ hypothesis tests.</p>
 <span class="k">val</span> <span class="n">goodnessOfFitTestResult</span> <span class="k">=</span> <span class="nc">Statistics</span><span class="o">.</span><span class="n">chiSqTest</span><span class="o">(</span><span class="n">vec</span><span class="o">)</span>
 <span class="c1">// summary of the test including the p-value, degrees of freedom, test statistic, the method</span>
 <span class="c1">// used, and the null hypothesis.</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;$goodnessOfFitTestResult\n&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;</span><span class="si">$goodnessOfFitTestResult</span><span class="s">\n&quot;</span><span class="o">)</span>
 
 <span class="c1">// a contingency matrix. Create a dense matrix ((1.0, 2.0), (3.0, 4.0), (5.0, 6.0))</span>
 <span class="k">val</span> <span class="n">mat</span><span class="k">:</span> <span class="kt">Matrix</span> <span class="o">=</span> <span class="nc">Matrices</span><span class="o">.</span><span class="n">dense</span><span class="o">(</span><span class="mi">3</span><span class="o">,</span> <span class="mi">2</span><span class="o">,</span> <span class="nc">Array</span><span class="o">(</span><span class="mf">1.0</span><span class="o">,</span> <span class="mf">3.0</span><span class="o">,</span> <span class="mf">5.0</span><span class="o">,</span> <span class="mf">2.0</span><span class="o">,</span> <span class="mf">4.0</span><span class="o">,</span> <span class="mf">6.0</span><span class="o">))</span>
@@ -730,7 +730,7 @@ hypothesis tests.</p>
 <span class="c1">// conduct Pearson&#39;s independence test on the input contingency matrix</span>
 <span class="k">val</span> <span class="n">independenceTestResult</span> <span class="k">=</span> <span class="nc">Statistics</span><span class="o">.</span><span class="n">chiSqTest</span><span class="o">(</span><span class="n">mat</span><span class="o">)</span>
 <span class="c1">// summary of the test including the p-value, degrees of freedom</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;$independenceTestResult\n&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;</span><span class="si">$independenceTestResult</span><span class="s">\n&quot;</span><span class="o">)</span>
 
 <span class="k">val</span> <span class="n">obs</span><span class="k">:</span> <span class="kt">RDD</span><span class="o">[</span><span class="kt">LabeledPoint</span><span class="o">]</span> <span class="k">=</span>
   <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="o">(</span>
@@ -761,7 +761,7 @@ hypothesis tests.</p>
 
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/stat/test/ChiSqTestResult.html"><code>ChiSqTestResult</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.linalg.Matrices</span><span class="o">;</span>
@@ -793,9 +793,9 @@ hypothesis tests.</p>
 <span class="c1">// an RDD of labeled points</span>
 <span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">LabeledPoint</span><span class="o">&gt;</span> <span class="n">obs</span> <span class="o">=</span> <span class="n">jsc</span><span class="o">.</span><span class="na">parallelize</span><span class="o">(</span>
   <span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span>
-    <span class="k">new</span> <span class="nf">LabeledPoint</span><span class="o">(</span><span class="mf">1.0</span><span class="o">,</span> <span class="n">Vectors</span><span class="o">.</span><span class="na">dense</span><span class="o">(</span><span class="mf">1.0</span><span class="o">,</span> <span class="mf">0.0</span><span class="o">,</span> <span class="mf">3.0</span><span class="o">)),</span>
-    <span class="k">new</span> <span class="nf">LabeledPoint</span><span class="o">(</span><span class="mf">1.0</span><span class="o">,</span> <span class="n">Vectors</span><span class="o">.</span><span class="na">dense</span><span class="o">(</span><span class="mf">1.0</span><span class="o">,</span> <span class="mf">2.0</span><span class="o">,</span> <span class="mf">0.0</span><span class="o">)),</span>
-    <span class="k">new</span> <span class="nf">LabeledPoint</span><span class="o">(-</span><span class="mf">1.0</span><span class="o">,</span> <span class="n">Vectors</span><span class="o">.</span><span class="na">dense</span><span class="o">(-</span><span class="mf">1.0</span><span class="o">,</span> <span class="mf">0.0</span><span class="o">,</span> <span class="o">-</span><span class="mf">0.5</span><span class="o">))</span>
+    <span class="k">new</span> <span class="n">LabeledPoint</span><span class="o">(</span><span class="mf">1.0</span><span class="o">,</span> <span class="n">Vectors</span><span class="o">.</span><span class="na">dense</span><span class="o">(</span><span class="mf">1.0</span><span class="o">,</span> <span class="mf">0.0</span><span class="o">,</span> <span class="mf">3.0</span><span class="o">)),</span>
+    <span class="k">new</span> <span class="n">LabeledPoint</span><span class="o">(</span><span class="mf">1.0</span><span class="o">,</span> <span class="n">Vectors</span><span class="o">.</span><span class="na">dense</span><span class="o">(</span><span class="mf">1.0</span><span class="o">,</span> <span class="mf">2.0</span><span class="o">,</span> <span class="mf">0.0</span><span class="o">)),</span>
+    <span class="k">new</span> <span class="n">LabeledPoint</span><span class="o">(-</span><span class="mf">1.0</span><span class="o">,</span> <span class="n">Vectors</span><span class="o">.</span><span class="na">dense</span><span class="o">(-</span><span class="mf">1.0</span><span class="o">,</span> <span class="mf">0.0</span><span class="o">,</span> <span class="o">-</span><span class="mf">0.5</span><span class="o">))</span>
   <span class="o">)</span>
 <span class="o">);</span>
 
@@ -820,42 +820,42 @@ hypothesis tests.</p>
 
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.stat.Statistics"><code>Statistics</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.mllib.linalg</span> <span class="kn">import</span> <span class="n">Matrices</span><span class="p">,</span> <span class="n">Vectors</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.linalg</span> <span class="kn">import</span> <span class="n">Matrices</span><span class="p">,</span> <span class="n">Vectors</span>
 <span class="kn">from</span> <span class="nn">pyspark.mllib.regression</span> <span class="kn">import</span> <span class="n">LabeledPoint</span>
 <span class="kn">from</span> <span class="nn">pyspark.mllib.stat</span> <span class="kn">import</span> <span class="n">Statistics</span>
 
-<span class="n">vec</span> <span class="o">=</span> <span class="n">Vectors</span><span class="o">.</span><span class="n">dense</span><span class="p">(</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.15</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">,</span> <span class="mf">0.25</span><span class="p">)</span>  <span class="c"># a vector composed of the frequencies of events</span>
+<span class="n">vec</span> <span class="o">=</span> <span class="n">Vectors</span><span class="o">.</span><span class="n">dense</span><span class="p">(</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.15</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">,</span> <span class="mf">0.25</span><span class="p">)</span>  <span class="c1"># a vector composed of the frequencies of events</span>
 
-<span class="c"># compute the goodness of fit. If a second vector to test against</span>
-<span class="c"># is not supplied as a parameter, the test runs against a uniform distribution.</span>
+<span class="c1"># compute the goodness of fit. If a second vector to test against</span>
+<span class="c1"># is not supplied as a parameter, the test runs against a uniform distribution.</span>
 <span class="n">goodnessOfFitTestResult</span> <span class="o">=</span> <span class="n">Statistics</span><span class="o">.</span><span class="n">chiSqTest</span><span class="p">(</span><span class="n">vec</span><span class="p">)</span>
 
-<span class="c"># summary of the test including the p-value, degrees of freedom,</span>
-<span class="c"># test statistic, the method used, and the null hypothesis.</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;</span><span class="si">%s</span><span class="se">\n</span><span class="s">&quot;</span> <span class="o">%</span> <span class="n">goodnessOfFitTestResult</span><span class="p">)</span>
+<span class="c1"># summary of the test including the p-value, degrees of freedom,</span>
+<span class="c1"># test statistic, the method used, and the null hypothesis.</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;</span><span class="si">%s</span><span class="se">\n</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="n">goodnessOfFitTestResult</span><span class="p">)</span>
 
-<span class="n">mat</span> <span class="o">=</span> <span class="n">Matrices</span><span class="o">.</span><span class="n">dense</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="p">[</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">3.0</span><span class="p">,</span> <span class="mf">5.0</span><span class="p">,</span> <span class="mf">2.0</span><span class="p">,</span> <span class="mf">4.0</span><span class="p">,</span> <span class="mf">6.0</span><span class="p">])</span>  <span class="c"># a contingency matrix</span>
+<span class="n">mat</span> <span class="o">=</span> <span class="n">Matrices</span><span class="o">.</span><span class="n">dense</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="p">[</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">3.0</span><span class="p">,</span> <span class="mf">5.0</span><span class="p">,</span> <span class="mf">2.0</span><span class="p">,</span> <span class="mf">4.0</span><span class="p">,</span> <span class="mf">6.0</span><span class="p">])</span>  <span class="c1"># a contingency matrix</span>
 
-<span class="c"># conduct Pearson&#39;s independence test on the input contingency matrix</span>
+<span class="c1"># conduct Pearson&#39;s independence test on the input contingency matrix</span>
 <span class="n">independenceTestResult</span> <span class="o">=</span> <span class="n">Statistics</span><span class="o">.</span><span class="n">chiSqTest</span><span class="p">(</span><span class="n">mat</span><span class="p">)</span>
 
-<span class="c"># summary of the test including the p-value, degrees of freedom,</span>
-<span class="c"># test statistic, the method used, and the null hypothesis.</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;</span><span class="si">%s</span><span class="se">\n</span><span class="s">&quot;</span> <span class="o">%</span> <span class="n">independenceTestResult</span><span class="p">)</span>
+<span class="c1"># summary of the test including the p-value, degrees of freedom,</span>
+<span class="c1"># test statistic, the method used, and the null hypothesis.</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;</span><span class="si">%s</span><span class="se">\n</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="n">independenceTestResult</span><span class="p">)</span>
 
 <span class="n">obs</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="p">(</span>
     <span class="p">[</span><span class="n">LabeledPoint</span><span class="p">(</span><span class="mf">1.0</span><span class="p">,</span> <span class="p">[</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">,</span> <span class="mf">3.0</span><span class="p">]),</span>
      <span class="n">LabeledPoint</span><span class="p">(</span><span class="mf">1.0</span><span class="p">,</span> <span class="p">[</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">2.0</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">]),</span>
      <span class="n">LabeledPoint</span><span class="p">(</span><span class="mf">1.0</span><span class="p">,</span> <span class="p">[</span><span class="o">-</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.5</span><span class="p">])]</span>
-<span class="p">)</span>  <span class="c"># LabeledPoint(feature, label)</span>
+<span class="p">)</span>  <span class="c1"># LabeledPoint(feature, label)</span>
 
-<span class="c"># The contingency table is constructed from an RDD of LabeledPoint and used to conduct</span>
-<span class="c"># the independence test. Returns an array containing the ChiSquaredTestResult for every feature</span>
-<span class="c"># against the label.</span>
+<span class="c1"># The contingency table is constructed from an RDD of LabeledPoint and used to conduct</span>
+<span class="c1"># the independence test. Returns an array containing the ChiSquaredTestResult for every feature</span>
+<span class="c1"># against the label.</span>
 <span class="n">featureTestResults</span> <span class="o">=</span> <span class="n">Statistics</span><span class="o">.</span><span class="n">chiSqTest</span><span class="p">(</span><span class="n">obs</span><span class="p">)</span>
 
 <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">result</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">featureTestResults</span><span class="p">):</span>
-    <span class="k">print</span><span class="p">(</span><span class="s">&quot;Column </span><span class="si">%d</span><span class="s">:</span><span class="se">\n</span><span class="si">%s</span><span class="s">&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="n">result</span><span class="p">))</span>
+    <span class="k">print</span><span class="p">(</span><span class="s2">&quot;Column </span><span class="si">%d</span><span class="s2">:</span><span class="se">\n</span><span class="si">%s</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="n">result</span><span class="p">))</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/mllib/hypothesis_testing_example.py" in the Spark repo.</small></div>
   </div>
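
Whichever language is used, interpretation comes down to the fields exposed on the returned test result. A hedged Scala sketch, assuming the `goodnessOfFitTestResult` computed in the Scala example above:

    // pValue, statistic, degreesOfFreedom and nullHypothesis are all available
    val p = goodnessOfFitTestResult.pValue
    if (p < 0.05) {
      println(s"reject the null hypothesis at the 5% level (p = $p)")
    } else {
      println(s"insufficient evidence to reject the null hypothesis (p = $p)")
    }
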
@@ -879,7 +879,7 @@ and interpret the hypothesis tests.</p>
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.stat.Statistics$"><code>Statistics</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.stat.Statistics</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.stat.Statistics</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.rdd.RDD</span>
 
 <span class="k">val</span> <span class="n">data</span><span class="k">:</span> <span class="kt">RDD</span><span class="o">[</span><span class="kt">Double</span><span class="o">]</span> <span class="k">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="o">(</span><span class="nc">Seq</span><span class="o">(</span><span class="mf">0.1</span><span class="o">,</span> <span class="mf">0.15</span><span class="o">,</span> <span class="mf">0.2</span><span class="o">,</span> <span class="mf">0.3</span><span class="o">,</span> <span class="mf">0.25</span><span class="o">))</span>  <span class="c1">// an RDD of sample data</span>
@@ -906,7 +906,7 @@ and interpret the hypothesis tests.</p>
 
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/stat/Statistics.html"><code>Statistics</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaDoubleRDD</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.stat.Statistics</span><span class="o">;</span>
@@ -929,16 +929,16 @@ and interpret the hypothesis tests.</p>
 
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.stat.Statistics"><code>Statistics</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.mllib.stat</span> <span class="kn">import</span> <span class="n">Statistics</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.stat</span> <span class="kn">import</span> <span class="n">Statistics</span>
 
 <span class="n">parallelData</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="p">([</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.15</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">,</span> <span class="mf">0.25</span><span class="p">])</span>
 
-<span class="c"># run a KS test for the sample versus a standard normal distribution</span>
-<span class="n">testResult</span> <span class="o">=</span> <span class="n">Statistics</span><span class="o">.</span><span class="n">kolmogorovSmirnovTest</span><span class="p">(</span><span class="n">parallelData</span><span class="p">,</span> <span class="s">&quot;norm&quot;</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
-<span class="c"># summary of the test including the p-value, test statistic, and null hypothesis</span>
-<span class="c"># if our p-value indicates significance, we can reject the null hypothesis</span>
-<span class="c"># Note that the Scala functionality of calling Statistics.kolmogorovSmirnovTest with</span>
-<span class="c"># a lambda to calculate the CDF is not made available in the Python API</span>
+<span class="c1"># run a KS test for the sample versus a standard normal distribution</span>
+<span class="n">testResult</span> <span class="o">=</span> <span class="n">Statistics</span><span class="o">.</span><span class="n">kolmogorovSmirnovTest</span><span class="p">(</span><span class="n">parallelData</span><span class="p">,</span> <span class="s2">&quot;norm&quot;</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
+<span class="c1"># summary of the test including the p-value, test statistic, and null hypothesis</span>
+<span class="c1"># if our p-value indicates significance, we can reject the null hypothesis</span>
+<span class="c1"># Note that the Scala functionality of calling Statistics.kolmogorovSmirnovTest with</span>
+<span class="c1"># a lambda to calculate the CDF is not made available in the Python API</span>
 <span class="k">print</span><span class="p">(</span><span class="n">testResult</span><span class="p">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/mllib/hypothesis_testing_kolmogorov_smirnov_test_example.py" in the Spark repo.</small></div>
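
The note in the Python example refers to a Scala-only overload of kolmogorovSmirnovTest that accepts an arbitrary CDF instead of a named distribution. A minimal sketch, assuming `sc`, and using the Apache Commons Math normal CDF purely as an illustration:

    import org.apache.commons.math3.distribution.NormalDistribution
    import org.apache.spark.mllib.stat.Statistics

    val sample = sc.parallelize(Seq(0.1, 0.15, 0.2, 0.3, 0.25))
    // any Double => Double CDF can be supplied; here, the standard normal CDF
    val stdNormal = new NormalDistribution(0.0, 1.0)
    val testResult =
      Statistics.kolmogorovSmirnovTest(sample, (x: Double) => stdNormal.cumulativeProbability(x))
    println(testResult)
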
@@ -967,7 +967,7 @@ all prior batches.</li>
     <p><a href="api/scala/index.html#org.apache.spark.mllib.stat.test.StreamingTest"><code>StreamingTest</code></a>
 provides streaming hypothesis testing.</p>
 
-    <div class="highlight"><pre><span class="k">val</span> <span class="n">data</span> <span class="k">=</span> <span class="n">ssc</span><span class="o">.</span><span class="n">textFileStream</span><span class="o">(</span><span class="n">dataDir</span><span class="o">).</span><span class="n">map</span><span class="o">(</span><span class="n">line</span> <span class="k">=&gt;</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="o">(</span><span class="s">&quot;,&quot;</span><span class="o">)</span> <span class="k">match</span> <span class="o">{</span>
+    <div class="highlight"><pre><span></span><span class="k">val</span> <span class="n">data</span> <span class="k">=</span> <span class="n">ssc</span><span class="o">.</span><span class="n">textFileStream</span><span class="o">(</span><span class="n">dataDir</span><span class="o">).</span><span class="n">map</span><span class="o">(</span><span class="n">line</span> <span class="k">=&gt;</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="o">(</span><span class="s">&quot;,&quot;</span><span class="o">)</span> <span class="k">match</span> <span class="o">{</span>
   <span class="k">case</span> <span class="nc">Array</span><span class="o">(</span><span class="n">label</span><span class="o">,</span> <span class="n">value</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="nc">BinarySample</span><span class="o">(</span><span class="n">label</span><span class="o">.</span><span class="n">toBoolean</span><span class="o">,</span> <span class="n">value</span><span class="o">.</span><span class="n">toDouble</span><span class="o">)</span>
 <span class="o">})</span>
 
@@ -986,7 +986,7 @@ provides streaming hypothesis testing.</p>
     <p><a href="api/java/index.html#org.apache.spark.mllib.stat.test.StreamingTest"><code>StreamingTest</code></a>
 provides streaming hypothesis testing.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">org.apache.spark.mllib.stat.test.BinarySample</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.mllib.stat.test.BinarySample</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.stat.test.StreamingTest</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.stat.test.StreamingTestResult</span><span class="o">;</span>
 
@@ -997,11 +997,11 @@ provides streaming hypothesis testing.</p>
       <span class="n">String</span><span class="o">[]</span> <span class="n">ts</span> <span class="o">=</span> <span class="n">line</span><span class="o">.</span><span class="na">split</span><span class="o">(</span><span class="s">&quot;,&quot;</span><span class="o">);</span>
       <span class="kt">boolean</span> <span class="n">label</span> <span class="o">=</span> <span class="n">Boolean</span><span class="o">.</span><span class="na">parseBoolean</span><span class="o">(</span><span class="n">ts</span><span class="o">[</span><span class="mi">0</span><span class="o">]);</span>
       <span class="kt">double</span> <span class="n">value</span> <span class="o">=</span> <span class="n">Double</span><span class="o">.</span><span class="na">parseDouble</span><span class="o">(</span><span class="n">ts</span><span class="o">[</span><span class="mi">1</span><span class="o">]);</span>
-      <span class="k">return</span> <span class="k">new</span> <span class="nf">BinarySample</span><span class="o">(</span><span class="n">label</span><span class="o">,</span> <span class="n">value</span><span class="o">);</span>
+      <span class="k">return</span> <span class="k">new</span> <span class="n">BinarySample</span><span class="o">(</span><span class="n">label</span><span class="o">,</span> <span class="n">value</span><span class="o">);</span>
     <span class="o">}</span>
   <span class="o">});</span>
 
-<span class="n">StreamingTest</span> <span class="n">streamingTest</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">StreamingTest</span><span class="o">()</span>
+<span class="n">StreamingTest</span> <span class="n">streamingTest</span> <span class="o">=</span> <span class="k">new</span> <span class="n">StreamingTest</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setPeacePeriod</span><span class="o">(</span><span class="mi">0</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setWindowSize</span><span class="o">(</span><span class="mi">0</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setTestMethod</span><span class="o">(</span><span class="s">&quot;welch&quot;</span><span class="o">);</span>
@@ -1028,7 +1028,7 @@ distribution <code>N(0, 1)</code>, and then map it to <code>N(1, 4)</code>.</p>
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.random.RandomRDDs$"><code>RandomRDDs</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">import</span> <span class="nn">org.apache.spark.SparkContext</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">import</span> <span class="nn">org.apache.spark.SparkContext</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.random.RandomRDDs._</span>
 
 <span class="k">val</span> <span class="n">sc</span><span class="k">:</span> <span class="kt">SparkContext</span> <span class="o">=</span> <span class="o">...</span>
@@ -1037,7 +1037,7 @@ distribution <code>N(0, 1)</code>, and then map it to <code>N(1, 4)</code>.</p>
 <span class="c1">// standard normal distribution `N(0, 1)`, evenly distributed in 10 partitions.</span>
 <span class="k">val</span> <span class="n">u</span> <span class="k">=</span> <span class="n">normalRDD</span><span class="o">(</span><span class="n">sc</span><span class="o">,</span> <span class="mi">1000000L</span><span class="o">,</span> <span class="mi">10</span><span class="o">)</span>
 <span class="c1">// Apply a transform to get a random double RDD following `N(1, 4)`.</span>
-<span class="k">val</span> <span class="n">v</span> <span class="k">=</span> <span class="n">u</span><span class="o">.</span><span class="n">map</span><span class="o">(</span><span class="n">x</span> <span class="k">=&gt;</span> <span class="mf">1.0</span> <span class="o">+</span> <span class="mf">2.0</span> <span class="o">*</span> <span class="n">x</span><span class="o">)</span></code></pre></div>
+<span class="k">val</span> <span class="n">v</span> <span class="k">=</span> <span class="n">u</span><span class="o">.</span><span class="n">map</span><span class="o">(</span><span class="n">x</span> <span class="k">=&gt;</span> <span class="mf">1.0</span> <span class="o">+</span> <span class="mf">2.0</span> <span class="o">*</span> <span class="n">x</span><span class="o">)</span></code></pre></figure>
 
   </div>
 
@@ -1049,9 +1049,9 @@ distribution <code>N(0, 1)</code>, and then map it to <code>N(1, 4)</code>.</p>
 
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/random/RandomRDDs"><code>RandomRDDs</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kn">import</span> <span class="nn">org.apache.spark.SparkContext</span><span class="o">;</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.SparkContext</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.JavaDoubleRDD</span><span class="o">;</span>
-<span class="kn">import</span> <span class="nn">static</span> <span class="n">org</span><span class="o">.</span><span class="na">apache</span><span class="o">.</span><span class="na">spark</span><span class="o">.</span><span class="na">mllib</span><span class="o">.</span><span class="na">random</span><span class="o">.</span><span class="na">RandomRDDs</span><span class="o">.*;</span>
+<span class="kn">import static</span> <span class="nn">org.apache.spark.mllib.random.RandomRDDs.*</span><span class="o">;</span>
 
 <span class="n">JavaSparkContext</span> <span class="n">jsc</span> <span class="o">=</span> <span class="o">...</span>
 
@@ -1064,7 +1064,7 @@ distribution <code>N(0, 1)</code>, and then map it to <code>N(1, 4)</code>.</p>
     <span class="kd">public</span> <span class="n">Double</span> <span class="nf">call</span><span class="o">(</span><span class="n">Double</span> <span class="n">x</span><span class="o">)</span> <span class="o">{</span>
       <span class="k">return</span> <span class="mf">1.0</span> <span class="o">+</span> <span class="mf">2.0</span> <span class="o">*</span> <span class="n">x</span><span class="o">;</span>
     <span class="o">}</span>
-  <span class="o">});</span></code></pre></div>
+  <span class="o">});</span></code></pre></figure>
 
   </div>
 
@@ -1076,15 +1076,15 @@ distribution <code>N(0, 1)</code>, and then map it to <code>N(1, 4)</code>.</p>
 
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.random.RandomRDDs"><code>RandomRDDs</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">pyspark.mllib.random</span> <span class="kn">import</span> <span class="n">RandomRDDs</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.random</span> <span class="kn">import</span> <span class="n">RandomRDDs</span>
 
-<span class="n">sc</span> <span class="o">=</span> <span class="o">...</span> <span class="c"># SparkContext</span>
+<span class="n">sc</span> <span class="o">=</span> <span class="o">...</span> <span class="c1"># SparkContext</span>
 
-<span class="c"># Generate a random double RDD that contains 1 million i.i.d. values drawn from the</span>
-<span class="c"># standard normal distribution `N(0, 1)`, evenly distributed in 10 partitions.</span>
+<span class="c1"># Generate a random double RDD that contains 1 million i.i.d. values drawn from the</span>
+<span class="c1"># standard normal distribution `N(0, 1)`, evenly distributed in 10 partitions.</span>
 <span class="n">u</span> <span class="o">=</span> <span class="n">RandomRDDs</span><span class="o">.</span><span class="n">normalRDD</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="il">1000000L</span><span class="p">,</span> <span class="mi">10</span><span class="p">)</span>
-<span class="c"># Apply a transform to get a random double RDD following `N(1, 4)`.</span>
-<span class="n">v</span> <span class="o">=</span> <span class="n">u</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="mf">1.0</span> <span class="o">+</span> <span class="mf">2.0</span> <span class="o">*</span> <span class="n">x</span><span class="p">)</span></code></pre></div>
+<span class="c1"># Apply a transform to get a random double RDD following `N(1, 4)`.</span>
+<span class="n">v</span> <span class="o">=</span> <span class="n">u</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="mf">1.0</span> <span class="o">+</span> <span class="mf">2.0</span> <span class="o">*</span> <span class="n">x</span><span class="p">)</span></code></pre></figure>
 
   </div>
 </div>
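
normalRDD is only one of the factory methods; uniform, Poisson, and vector-valued variants follow the same pattern. A short Scala sketch, assuming `sc`:

    import org.apache.spark.mllib.random.RandomRDDs._

    // 1000 draws from Uniform(0, 1) and from Poisson(mean = 2.0), in 4 partitions
    val uni = uniformRDD(sc, 1000L, 4)
    val poi = poissonRDD(sc, 2.0, 1000L, 4)
    // 100 vectors of 3 i.i.d. standard normal components each
    val vecs = normalVectorRDD(sc, 100L, 3)
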
@@ -1107,7 +1107,7 @@ to do so.</p>
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.stat.KernelDensity"><code>KernelDensity</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.stat.KernelDensity</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.stat.KernelDensity</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.rdd.RDD</span>
 
 <span class="c1">// an RDD of sample data</span>
@@ -1132,7 +1132,7 @@ to do so.</p>
 
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/stat/KernelDensity.html"><code>KernelDensity</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.stat.KernelDensity</span><span class="o">;</span>
@@ -1143,7 +1143,7 @@ to do so.</p>
 
 <span class="c1">// Construct the density estimator with the sample data</span>
 <span class="c1">// and a standard deviation for the Gaussian kernels</span>
-<span class="n">KernelDensity</span> <span class="n">kd</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">KernelDensity</span><span class="o">().</span><span class="na">setSample</span><span class="o">(</span><span class="n">data</span><span class="o">).</span><span class="na">setBandwidth</span><span class="o">(</span><span class="mf">3.0</span><span class="o">);</span>
+<span class="n">KernelDensity</span> <span class="n">kd</span> <span class="o">=</span> <span class="k">new</span> <span class="n">KernelDensity</span><span class="o">().</span><span class="na">setSample</span><span class="o">(</span><span class="n">data</span><span class="o">).</span><span class="na">setBandwidth</span><span class="o">(</span><span class="mf">3.0</span><span class="o">);</span>
 
 <span class="c1">// Find density estimates for the given values</span>
 <span class="kt">double</span><span class="o">[]</span> <span class="n">densities</span> <span class="o">=</span> <span class="n">kd</span><span class="o">.</span><span class="na">estimate</span><span class="o">(</span><span class="k">new</span> <span class="kt">double</span><span class="o">[]{-</span><span class="mf">1.0</span><span class="o">,</span> <span class="mf">2.0</span><span class="o">,</span> <span class="mf">5.0</span><span class="o">});</span>
@@ -1160,18 +1160,18 @@ to do so.</p>
 
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.stat.KernelDensity"><code>KernelDensity</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.mllib.stat</span> <span class="kn">import</span> <span class="n">KernelDensity</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.stat</span> <span class="kn">import</span> <span class="n">KernelDensity</span>
 
-<span class="c"># an RDD of sample data</span>
+<span class="c1"># an RDD of sample data</span>
 <span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="p">([</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">,</span> <span class="mf">2.0</span><span class="p">,</span> <span class="mf">3.0</span><span class="p">,</span> <span class="mf">4.0</span><span class="p">,</span> <span class="mf">5.0</span><span class="p">,</span> <span class="mf">5.0</span><span class="p">,</span> <span class="mf">6.0</span><span class="p">,</span> <span class="mf">7.0</span><span class="p">,</span> <span class="mf">8.0</span><span class="p">,</span> <span class="mf">9.0</span><span class="p">,</span> <span class="mf">9.0</span><span class="p">])</span>
 
-<span class="c"># Construct the density estimator with the sample data and a standard deviation for the Gaussian</span>
-<span class="c"># kernels</span>
+<span class="c1"># Construct the density estimator with the sample data and a standard deviation for the Gaussian</span>
+<span class="c1"># kernels</span>
 <span class="n">kd</span> <span class="o">=</span> <span class="n">KernelDensity</span><span class="p">()</span>
 <span class="n">kd</span><span class="o">.</span><span class="n">setSample</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
 <span class="n">kd</span><span class="o">.</span><span class="n">setBandwidth</span><span class="p">(</span><span class="mf">3.0</span><span class="p">)</span>
 
-<span class="c"># Find density estimates for the given values</span>
+<span class="c1"># Find density estimates for the given values</span>
 <span class="n">densities</span> <span class="o">=</span> <span class="n">kd</span><span class="o">.</span><span class="n">estimate</span><span class="p">([</span><span class="o">-</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">2.0</span><span class="p">,</span> <span class="mf">5.0</span><span class="p">])</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/mllib/kernel_density_estimation_example.py" in the Spark repo.</small></div>
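
The bandwidth is the main tuning knob: it is the standard deviation of the Gaussian placed at each sample point, so larger values produce smoother, flatter estimates. A small Scala sketch of the effect, assuming `sc`:

    import org.apache.spark.mllib.stat.KernelDensity

    val samples = sc.parallelize(Seq(1.0, 1.0, 2.0, 4.0, 9.0))
    // the same query points under a narrow and a wide kernel
    val narrow = new KernelDensity().setSample(samples).setBandwidth(0.5).estimate(Array(2.0, 5.0))
    val wide = new KernelDensity().setSample(samples).setBandwidth(3.0).estimate(Array(2.0, 5.0))
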




[11/25] spark-website git commit: Update 2.1.0 docs to include https://github.com/apache/spark/pull/16294

Posted by yh...@apache.org.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/mllib-linear-methods.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/mllib-linear-methods.html b/site/docs/2.1.0/mllib-linear-methods.html
index 46a1a25..428d778 100644
--- a/site/docs/2.1.0/mllib-linear-methods.html
+++ b/site/docs/2.1.0/mllib-linear-methods.html
@@ -307,23 +307,23 @@
                     
 
                     <ul id="markdown-toc">
-  <li><a href="#mathematical-formulation" id="markdown-toc-mathematical-formulation">Mathematical formulation</a>    <ul>
-      <li><a href="#loss-functions" id="markdown-toc-loss-functions">Loss functions</a></li>
-      <li><a href="#regularizers" id="markdown-toc-regularizers">Regularizers</a></li>
-      <li><a href="#optimization" id="markdown-toc-optimization">Optimization</a></li>
+  <li><a href="#mathematical-formulation">Mathematical formulation</a>    <ul>
+      <li><a href="#loss-functions">Loss functions</a></li>
+      <li><a href="#regularizers">Regularizers</a></li>
+      <li><a href="#optimization">Optimization</a></li>
     </ul>
   </li>
-  <li><a href="#classification" id="markdown-toc-classification">Classification</a>    <ul>
-      <li><a href="#linear-support-vector-machines-svms" id="markdown-toc-linear-support-vector-machines-svms">Linear Support Vector Machines (SVMs)</a></li>
-      <li><a href="#logistic-regression" id="markdown-toc-logistic-regression">Logistic regression</a></li>
+  <li><a href="#classification">Classification</a>    <ul>
+      <li><a href="#linear-support-vector-machines-svms">Linear Support Vector Machines (SVMs)</a></li>
+      <li><a href="#logistic-regression">Logistic regression</a></li>
     </ul>
   </li>
-  <li><a href="#regression" id="markdown-toc-regression">Regression</a>    <ul>
-      <li><a href="#linear-least-squares-lasso-and-ridge-regression" id="markdown-toc-linear-least-squares-lasso-and-ridge-regression">Linear least squares, Lasso, and ridge regression</a></li>
-      <li><a href="#streaming-linear-regression" id="markdown-toc-streaming-linear-regression">Streaming linear regression</a></li>
+  <li><a href="#regression">Regression</a>    <ul>
+      <li><a href="#linear-least-squares-lasso-and-ridge-regression">Linear least squares, Lasso, and ridge regression</a></li>
+      <li><a href="#streaming-linear-regression">Streaming linear regression</a></li>
     </ul>
   </li>
-  <li><a href="#implementation-developer" id="markdown-toc-implementation-developer">Implementation (developer)</a></li>
+  <li><a href="#implementation-developer">Implementation (developer)</a></li>
 </ul>
 
 <p><code>\[
@@ -489,7 +489,7 @@ error.</p>
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.classification.SVMWithSGD"><code>SVMWithSGD</code> Scala docs</a> and <a href="api/scala/index.html#org.apache.spark.mllib.classification.SVMModel"><code>SVMModel</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.classification.</span><span class="o">{</span><span class="nc">SVMModel</span><span class="o">,</span> <span class="nc">SVMWithSGD</span><span class="o">}</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.classification.</span><span class="o">{</span><span class="nc">SVMModel</span><span class="o">,</span> <span class="nc">SVMWithSGD</span><span class="o">}</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.evaluation.BinaryClassificationMetrics</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.util.MLUtils</span>
 
@@ -534,14 +534,14 @@ this way as well. For example, the following code produces an L1 regularized
 variant of SVMs with regularization parameter set to 0.1, and runs the training
 algorithm for 200 iterations.</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">import</span> <span class="nn">org.apache.spark.mllib.optimization.L1Updater</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.optimization.L1Updater</span>
 
 <span class="k">val</span> <span class="n">svmAlg</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SVMWithSGD</span><span class="o">()</span>
 <span class="n">svmAlg</span><span class="o">.</span><span class="n">optimizer</span>
   <span class="o">.</span><span class="n">setNumIterations</span><span class="o">(</span><span class="mi">200</span><span class="o">)</span>
   <span class="o">.</span><span class="n">setRegParam</span><span class="o">(</span><span class="mf">0.1</span><span class="o">)</span>
   <span class="o">.</span><span class="n">setUpdater</span><span class="o">(</span><span class="k">new</span> <span class="n">L1Updater</span><span class="o">)</span>
-<span class="k">val</span> <span class="n">modelL1</span> <span class="k">=</span> <span class="n">svmAlg</span><span class="o">.</span><span class="n">run</span><span class="o">(</span><span class="n">training</span><span class="o">)</span></code></pre></div>
+<span class="k">val</span> <span class="n">modelL1</span> <span class="k">=</span> <span class="n">svmAlg</span><span class="o">.</span><span class="n">run</span><span class="o">(</span><span class="n">training</span><span class="o">)</span></code></pre></figure>
 
   </div>
 
@@ -554,7 +554,7 @@ that is equivalent to the provided example in Scala is given below:</p>
 
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/classification/SVMWithSGD.html"><code>SVMWithSGD</code> Java docs</a> and <a href="api/java/org/apache/spark/mllib/classification/SVMModel.html"><code>SVMModel</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.function.Function</span><span class="o">;</span>
@@ -591,7 +591,7 @@ that is equivalent to the provided example in Scala is given below:</p>
 
 <span class="c1">// Get evaluation metrics.</span>
 <span class="n">BinaryClassificationMetrics</span> <span class="n">metrics</span> <span class="o">=</span>
-  <span class="k">new</span> <span class="nf">BinaryClassificationMetrics</span><span class="o">(</span><span class="n">JavaRDD</span><span class="o">.</span><span class="na">toRDD</span><span class="o">(</span><span class="n">scoreAndLabels</span><span class="o">));</span>
+  <span class="k">new</span> <span class="n">BinaryClassificationMetrics</span><span class="o">(</span><span class="n">JavaRDD</span><span class="o">.</span><span class="na">toRDD</span><span class="o">(</span><span class="n">scoreAndLabels</span><span class="o">));</span>
 <span class="kt">double</span> <span class="n">auROC</span> <span class="o">=</span> <span class="n">metrics</span><span class="o">.</span><span class="na">areaUnderROC</span><span class="o">();</span>
 
 <span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">&quot;Area under ROC = &quot;</span> <span class="o">+</span> <span class="n">auROC</span><span class="o">);</span>
@@ -610,14 +610,14 @@ this way as well. For example, the following code produces an L1 regularized
 variant of SVMs with regularization parameter set to 0.1, and runs the training
 algorithm for 200 iterations.</p>
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kn">import</span> <span class="nn">org.apache.spark.mllib.optimization.L1Updater</span><span class="o">;</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.mllib.optimization.L1Updater</span><span class="o">;</span>
 
-<span class="n">SVMWithSGD</span> <span class="n">svmAlg</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">SVMWithSGD</span><span class="o">();</span>
+<span class="n">SVMWithSGD</span> <span class="n">svmAlg</span> <span class="o">=</span> <span class="k">new</span> <span class="n">SVMWithSGD</span><span class="o">();</span>
 <span class="n">svmAlg</span><span class="o">.</span><span class="na">optimizer</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setNumIterations</span><span class="o">(</span><span class="mi">200</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setRegParam</span><span class="o">(</span><span class="mf">0.1</span><span class="o">)</span>
-  <span class="o">.</span><span class="na">setUpdater</span><span class="o">(</span><span class="k">new</span> <span class="nf">L1Updater</span><span class="o">());</span>
-<span class="kd">final</span> <span class="n">SVMModel</span> <span class="n">modelL1</span> <span class="o">=</span> <span class="n">svmAlg</span><span class="o">.</span><span class="na">run</span><span class="o">(</span><span class="n">training</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span></code></pre></div>
+  <span class="o">.</span><span class="na">setUpdater</span><span class="o">(</span><span class="k">new</span> <span class="n">L1Updater</span><span class="o">());</span>
+<span class="kd">final</span> <span class="n">SVMModel</span> <span class="n">modelL1</span> <span class="o">=</span> <span class="n">svmAlg</span><span class="o">.</span><span class="na">run</span><span class="o">(</span><span class="n">training</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span></code></pre></figure>
 
     <p>In order to run the above application, follow the instructions
 provided in the <a href="quick-start.html#self-contained-applications">Self-Contained
@@ -632,28 +632,28 @@ and make predictions with the resulting model to compute the training error.</p>
 
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.classification.SVMWithSGD"><code>SVMWithSGD</code> Python docs</a> and <a href="api/python/pyspark.mllib.html#pyspark.mllib.classification.SVMModel"><code>SVMModel</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.mllib.classification</span> <span class="kn">import</span> <span class="n">SVMWithSGD</span><span class="p">,</span> <span class="n">SVMModel</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.classification</span> <span class="kn">import</span> <span class="n">SVMWithSGD</span><span class="p">,</span> <span class="n">SVMModel</span>
 <span class="kn">from</span> <span class="nn">pyspark.mllib.regression</span> <span class="kn">import</span> <span class="n">LabeledPoint</span>
 
-<span class="c"># Load and parse the data</span>
+<span class="c1"># Load and parse the data</span>
 <span class="k">def</span> <span class="nf">parsePoint</span><span class="p">(</span><span class="n">line</span><span class="p">):</span>
-    <span class="n">values</span> <span class="o">=</span> <span class="p">[</span><span class="nb">float</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&#39; &#39;</span><span class="p">)]</span>
+    <span class="n">values</span> <span class="o">=</span> <span class="p">[</span><span class="nb">float</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">&#39; &#39;</span><span class="p">)]</span>
     <span class="k">return</span> <span class="n">LabeledPoint</span><span class="p">(</span><span class="n">values</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">values</span><span class="p">[</span><span class="mi">1</span><span class="p">:])</span>
 
-<span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">&quot;data/mllib/sample_svm_data.txt&quot;</span><span class="p">)</span>
+<span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s2">&quot;data/mllib/sample_svm_data.txt&quot;</span><span class="p">)</span>
 <span class="n">parsedData</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="n">parsePoint</span><span class="p">)</span>
 
-<span class="c"># Build the model</span>
+<span class="c1"># Build the model</span>
 <span class="n">model</span> <span class="o">=</span> <span class="n">SVMWithSGD</span><span class="o">.</span><span class="n">train</span><span class="p">(</span><span class="n">parsedData</span><span class="p">,</span> <span class="n">iterations</span><span class="o">=</span><span class="mi">100</span><span class="p">)</span>
 
-<span class="c"># Evaluating the model on training data</span>
+<span class="c1"># Evaluating the model on training data</span>
 <span class="n">labelsAndPreds</span> <span class="o">=</span> <span class="n">parsedData</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">p</span><span class="p">:</span> <span class="p">(</span><span class="n">p</span><span class="o">.</span><span class="n">label</span><span class="p">,</span> <span class="n">model</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">p</span><span class="o">.</span><span class="n">features</span><span class="p">)))</span>
 <span class="n">trainErr</span> <span class="o">=</span> <span class="n">labelsAndPreds</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="n">v</span><span class="p">,</span> <span class="n">p</span><span class="p">):</span> <span class="n">v</span> <span class="o">!=</span> <span class="n">p</span><span class="p">)</span><span class="o">.</span><span class="n">count</span><span class="p">()</span> <span class="o">/</span> <span class="nb">float</span><span class="p">(</span><span class="n">parsedData</span><span class="o">.</span><span class="n">count</span><span class="p">())</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Training Error = &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">trainErr</span><span class="p">))</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Training Error = &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">trainErr</span><span class="p">))</span>
 
-<span class="c"># Save and load model</span>
-<span class="n">model</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;target/tmp/pythonSVMWithSGDModel&quot;</span><span class="p">)</span>
-<span class="n">sameModel</span> <span class="o">=</span> <span class="n">SVMModel</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;target/tmp/pythonSVMWithSGDModel&quot;</span><span class="p">)</span>
+<span class="c1"># Save and load model</span>
+<span class="n">model</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;target/tmp/pythonSVMWithSGDModel&quot;</span><span class="p">)</span>
+<span class="n">sameModel</span> <span class="o">=</span> <span class="n">SVMModel</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;target/tmp/pythonSVMWithSGDModel&quot;</span><span class="p">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/mllib/svm_with_sgd_example.py" in the Spark repo.</small></div>
   </div>
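    <p>For reference, the training error printed by these examples is just the fraction of misclassified training points:</p>

\[ \mathrm{trainErr} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{\hat{y}_i \ne y_i\} \]

    <p>where \(y_i\) is the true label, \(\hat{y}_i\) the model's prediction, and \(n\) the number of training points.</p>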
@@ -713,7 +713,7 @@ Then the model is evaluated against the test dataset and saved to disk.</p>
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS"><code>LogisticRegressionWithLBFGS</code> Scala docs</a> and <a href="api/scala/index.html#org.apache.spark.mllib.classification.LogisticRegressionModel"><code>LogisticRegressionModel</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.classification.</span><span class="o">{</span><span class="nc">LogisticRegressionModel</span><span class="o">,</span> <span class="nc">LogisticRegressionWithLBFGS</span><span class="o">}</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.classification.</span><span class="o">{</span><span class="nc">LogisticRegressionModel</span><span class="o">,</span> <span class="nc">LogisticRegressionWithLBFGS</span><span class="o">}</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.evaluation.MulticlassMetrics</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.regression.LabeledPoint</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.util.MLUtils</span>
@@ -740,7 +740,7 @@ Then the model is evaluated against the test dataset and saved to disk.</p>
 <span class="c1">// Get evaluation metrics.</span>
 <span class="k">val</span> <span class="n">metrics</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">MulticlassMetrics</span><span class="o">(</span><span class="n">predictionAndLabels</span><span class="o">)</span>
 <span class="k">val</span> <span class="n">accuracy</span> <span class="k">=</span> <span class="n">metrics</span><span class="o">.</span><span class="n">accuracy</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Accuracy = $accuracy&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;Accuracy = </span><span class="si">$accuracy</span><span class="s">&quot;</span><span class="o">)</span>
 
 <span class="c1">// Save and load model</span>
 <span class="n">model</span><span class="o">.</span><span class="n">save</span><span class="o">(</span><span class="n">sc</span><span class="o">,</span> <span class="s">&quot;target/tmp/scalaLogisticRegressionWithLBFGSModel&quot;</span><span class="o">)</span>
@@ -760,7 +760,7 @@ Then the model is evaluated against the test dataset and saved to disk.</p>
 
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/classification/LogisticRegressionWithLBFGS.html"><code>LogisticRegressionWithLBFGS</code> Java docs</a> and <a href="api/java/org/apache/spark/mllib/classification/LogisticRegressionModel.html"><code>LogisticRegressionModel</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.function.Function</span><span class="o">;</span>
@@ -779,7 +779,7 @@ Then the model is evaluated against the test dataset and saved to disk.</p>
 <span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">LabeledPoint</span><span class="o">&gt;</span> <span class="n">test</span> <span class="o">=</span> <span class="n">splits</span><span class="o">[</span><span class="mi">1</span><span class="o">];</span>
 
 <span class="c1">// Run training algorithm to build the model.</span>
-<span class="kd">final</span> <span class="n">LogisticRegressionModel</span> <span class="n">model</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">LogisticRegressionWithLBFGS</span><span class="o">()</span>
+<span class="kd">final</span> <span class="n">LogisticRegressionModel</span> <span class="n">model</span> <span class="o">=</span> <span class="k">new</span> <span class="n">LogisticRegressionWithLBFGS</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setNumClasses</span><span class="o">(</span><span class="mi">10</span><span class="o">)</span>
   <span class="o">.</span><span class="na">run</span><span class="o">(</span><span class="n">training</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span>
 
@@ -794,7 +794,7 @@ Then the model is evaluated against the test dataset and saved to disk.</p>
 <span class="o">);</span>
 
 <span class="c1">// Get evaluation metrics.</span>
-<span class="n">MulticlassMetrics</span> <span class="n">metrics</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">MulticlassMetrics</span><span class="o">(</span><span class="n">predictionAndLabels</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span>
+<span class="n">MulticlassMetrics</span> <span class="n">metrics</span> <span class="o">=</span> <span class="k">new</span> <span class="n">MulticlassMetrics</span><span class="o">(</span><span class="n">predictionAndLabels</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span>
 <span class="kt">double</span> <span class="n">accuracy</span> <span class="o">=</span> <span class="n">metrics</span><span class="o">.</span><span class="na">accuracy</span><span class="o">();</span>
 <span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">&quot;Accuracy = &quot;</span> <span class="o">+</span> <span class="n">accuracy</span><span class="o">);</span>
 
@@ -815,29 +815,29 @@ will in the future.</p>
 
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.classification.LogisticRegressionWithLBFGS"><code>LogisticRegressionWithLBFGS</code> Python docs</a> and <a href="api/python/pyspark.mllib.html#pyspark.mllib.classification.LogisticRegressionModel"><code>LogisticRegressionModel</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.mllib.classification</span> <span class="kn">import</span> <span class="n">LogisticRegressionWithLBFGS</span><span class="p">,</span> <span class="n">LogisticRegressionModel</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.classification</span> <span class="kn">import</span> <span class="n">LogisticRegressionWithLBFGS</span><span class="p">,</span> <span class="n">LogisticRegressionModel</span>
 <span class="kn">from</span> <span class="nn">pyspark.mllib.regression</span> <span class="kn">import</span> <span class="n">LabeledPoint</span>
 
-<span class="c"># Load and parse the data</span>
+<span class="c1"># Load and parse the data</span>
 <span class="k">def</span> <span class="nf">parsePoint</span><span class="p">(</span><span class="n">line</span><span class="p">):</span>
-    <span class="n">values</span> <span class="o">=</span> <span class="p">[</span><span class="nb">float</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&#39; &#39;</span><span class="p">)]</span>
+    <span class="n">values</span> <span class="o">=</span> <span class="p">[</span><span class="nb">float</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">&#39; &#39;</span><span class="p">)]</span>
     <span class="k">return</span> <span class="n">LabeledPoint</span><span class="p">(</span><span class="n">values</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">values</span><span class="p">[</span><span class="mi">1</span><span class="p">:])</span>
 
-<span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">&quot;data/mllib/sample_svm_data.txt&quot;</span><span class="p">)</span>
+<span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s2">&quot;data/mllib/sample_svm_data.txt&quot;</span><span class="p">)</span>
 <span class="n">parsedData</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="n">parsePoint</span><span class="p">)</span>
 
-<span class="c"># Build the model</span>
+<span class="c1"># Build the model</span>
 <span class="n">model</span> <span class="o">=</span> <span class="n">LogisticRegressionWithLBFGS</span><span class="o">.</span><span class="n">train</span><span class="p">(</span><span class="n">parsedData</span><span class="p">)</span>
 
-<span class="c"># Evaluating the model on training data</span>
+<span class="c1"># Evaluating the model on training data</span>
 <span class="n">labelsAndPreds</span> <span class="o">=</span> <span class="n">parsedData</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">p</span><span class="p">:</span> <span class="p">(</span><span class="n">p</span><span class="o">.</span><span class="n">label</span><span class="p">,</span> <span class="n">model</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">p</span><span class="o">.</span><span class="n">features</span><span class="p">)))</span>
 <span class="n">trainErr</span> <span class="o">=</span> <span class="n">labelsAndPreds</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="n">v</span><span class="p">,</span> <span class="n">p</span><span class="p">):</span> <span class="n">v</span> <span class="o">!=</span> <span class="n">p</span><span class="p">)</span><span class="o">.</span><span class="n">count</span><span class="p">()</span> <span class="o">/</span> <span class="nb">float</span><span class="p">(</span><span class="n">parsedData</span><span class="o">.</span><span class="n">count</span><span class="p">())</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Training Error = &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">trainErr</span><span class="p">))</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Training Error = &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">trainErr</span><span class="p">))</span>
 
-<span class="c"># Save and load model</span>
-<span class="n">model</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;target/tmp/pythonLogisticRegressionWithLBFGSModel&quot;</span><span class="p">)</span>
+<span class="c1"># Save and load model</span>
+<span class="n">model</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;target/tmp/pythonLogisticRegressionWithLBFGSModel&quot;</span><span class="p">)</span>
 <span class="n">sameModel</span> <span class="o">=</span> <span class="n">LogisticRegressionModel</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span>
-                                         <span class="s">&quot;target/tmp/pythonLogisticRegressionWithLBFGSModel&quot;</span><span class="p">)</span>
+                                         <span class="s2">&quot;target/tmp/pythonLogisticRegressionWithLBFGSModel&quot;</span><span class="p">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/mllib/logistic_regression_with_lbfgs_example.py" in the Spark repo.</small></div>
   </div>
@@ -874,7 +874,7 @@ values. We compute the mean squared error at the end to evaluate
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.regression.LinearRegressionWithSGD"><code>LinearRegressionWithSGD</code> Scala docs</a> and <a href="api/scala/index.html#org.apache.spark.mllib.regression.LinearRegressionModel"><code>LinearRegressionModel</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vectors</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vectors</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.regression.LabeledPoint</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.regression.LinearRegressionModel</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.regression.LinearRegressionWithSGD</span>
@@ -919,7 +919,7 @@ the Scala snippet provided, is presented below:</p>
 
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/regression/LinearRegressionWithSGD.html"><code>LinearRegressionWithSGD</code> Java docs</a> and <a href="api/java/org/apache/spark/mllib/regression/LinearRegressionModel.html"><code>LinearRegressionModel</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaDoubleRDD</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
@@ -941,7 +941,7 @@ the Scala snippet provided, is presented below:</p>
       <span class="k">for</span> <span class="o">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="o">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">features</span><span class="o">.</span><span class="na">length</span> <span class="o">-</span> <span class="mi">1</span><span class="o">;</span> <span class="n">i</span><span class="o">++)</span> <span class="o">{</span>
         <span class="n">v</span><span class="o">[</span><span class="n">i</span><span class="o">]</span> <span class="o">=</span> <span class="n">Double</span><span class="o">.</span><span class="na">parseDouble</span><span class="o">(</span><span class="n">features</span><span class="o">[</span><span class="n">i</span><span class="o">]);</span>
       <span class="o">}</span>
-      <span class="k">return</span> <span class="k">new</span> <span class="nf">LabeledPoint</span><span class="o">(</span><span class="n">Double</span><span class="o">.</span><span class="na">parseDouble</span><span class="o">(</span><span class="n">parts</span><span class="o">[</span><span class="mi">0</span><span class="o">]),</span> <span class="n">Vectors</span><span class="o">.</span><span class="na">dense</span><span class="o">(</span><span class="n">v</span><span class="o">));</span>
+      <span class="k">return</span> <span class="k">new</span> <span class="n">LabeledPoint</span><span class="o">(</span><span class="n">Double</span><span class="o">.</span><span class="na">parseDouble</span><span class="o">(</span><span class="n">parts</span><span class="o">[</span><span class="mi">0</span><span class="o">]),</span> <span class="n">Vectors</span><span class="o">.</span><span class="na">dense</span><span class="o">(</span><span class="n">v</span><span class="o">));</span>
     <span class="o">}</span>
   <span class="o">}</span>
 <span class="o">);</span>
@@ -962,7 +962,7 @@ the Scala snippet provided, is presented below:</p>
     <span class="o">}</span>
   <span class="o">}</span>
 <span class="o">);</span>
-<span class="kt">double</span> <span class="n">MSE</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">JavaDoubleRDD</span><span class="o">(</span><span class="n">valuesAndPreds</span><span class="o">.</span><span class="na">map</span><span class="o">(</span>
+<span class="kt">double</span> <span class="n">MSE</span> <span class="o">=</span> <span class="k">new</span> <span class="n">JavaDoubleRDD</span><span class="o">(</span><span class="n">valuesAndPreds</span><span class="o">.</span><span class="na">map</span><span class="o">(</span>
   <span class="k">new</span> <span class="n">Function</span><span class="o">&lt;</span><span class="n">Tuple2</span><span class="o">&lt;</span><span class="n">Double</span><span class="o">,</span> <span class="n">Double</span><span class="o">&gt;,</span> <span class="n">Object</span><span class="o">&gt;()</span> <span class="o">{</span>
     <span class="kd">public</span> <span class="n">Object</span> <span class="nf">call</span><span class="o">(</span><span class="n">Tuple2</span><span class="o">&lt;</span><span class="n">Double</span><span class="o">,</span> <span class="n">Double</span><span class="o">&gt;</span> <span class="n">pair</span><span class="o">)</span> <span class="o">{</span>
       <span class="k">return</span> <span class="n">Math</span><span class="o">.</span><span class="na">pow</span><span class="o">(</span><span class="n">pair</span><span class="o">.</span><span class="na">_1</span><span class="o">()</span> <span class="o">-</span> <span class="n">pair</span><span class="o">.</span><span class="na">_2</span><span class="o">(),</span> <span class="mf">2.0</span><span class="o">);</span>
@@ -989,29 +989,29 @@ values. We compute the mean squared error at the end to evaluate
 
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.regression.LinearRegressionWithSGD"><code>LinearRegressionWithSGD</code> Python docs</a> and <a href="api/python/pyspark.mllib.html#pyspark.mllib.regression.LinearRegressionModel"><code>LinearRegressionModel</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.mllib.regression</span> <span class="kn">import</span> <span class="n">LabeledPoint</span><span class="p">,</span> <span class="n">LinearRegressionWithSGD</span><span class="p">,</span> <span class="n">LinearRegressionModel</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.regression</span> <span class="kn">import</span> <span class="n">LabeledPoint</span><span class="p">,</span> <span class="n">LinearRegressionWithSGD</span><span class="p">,</span> <span class="n">LinearRegressionModel</span>
 
-<span class="c"># Load and parse the data</span>
+<span class="c1"># Load and parse the data</span>
 <span class="k">def</span> <span class="nf">parsePoint</span><span class="p">(</span><span class="n">line</span><span class="p">):</span>
-    <span class="n">values</span> <span class="o">=</span> <span class="p">[</span><span class="nb">float</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">line</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s">&#39;,&#39;</span><span class="p">,</span> <span class="s">&#39; &#39;</span><span class="p">)</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&#39; &#39;</span><span class="p">)]</span>
+    <span class="n">values</span> <span class="o">=</span> <span class="p">[</span><span class="nb">float</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">line</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span><span class="s1">&#39;,&#39;</span><span class="p">,</span> <span class="s1">&#39; &#39;</span><span class="p">)</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">&#39; &#39;</span><span class="p">)]</span>
     <span class="k">return</span> <span class="n">LabeledPoint</span><span class="p">(</span><span class="n">values</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">values</span><span class="p">[</span><span class="mi">1</span><span class="p">:])</span>
 
-<span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">&quot;data/mllib/ridge-data/lpsa.data&quot;</span><span class="p">)</span>
+<span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s2">&quot;data/mllib/ridge-data/lpsa.data&quot;</span><span class="p">)</span>
 <span class="n">parsedData</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="n">parsePoint</span><span class="p">)</span>
 
-<span class="c"># Build the model</span>
+<span class="c1"># Build the model</span>
 <span class="n">model</span> <span class="o">=</span> <span class="n">LinearRegressionWithSGD</span><span class="o">.</span><span class="n">train</span><span class="p">(</span><span class="n">parsedData</span><span class="p">,</span> <span class="n">iterations</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span> <span class="n">step</span><span class="o">=</span><span class="mf">0.00000001</span><span class="p">)</span>
 
-<span class="c"># Evaluate the model on training data</span>
+<span class="c1"># Evaluate the model on training data</span>
 <span class="n">valuesAndPreds</span> <span class="o">=</span> <span class="n">parsedData</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">p</span><span class="p">:</span> <span class="p">(</span><span class="n">p</span><span class="o">.</span><span class="n">label</span><span class="p">,</span> <span class="n">model</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">p</span><span class="o">.</span><span class="n">features</span><span class="p">)))</span>
 <span class="n">MSE</span> <span class="o">=</span> <span class="n">valuesAndPreds</span> \
     <span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="n">v</span><span class="p">,</span> <span class="n">p</span><span class="p">):</span> <span class="p">(</span><span class="n">v</span> <span class="o">-</span> <span class="n">p</span><span class="p">)</span><span class="o">**</span><span class="mi">2</span><span class="p">)</span> \
     <span class="o">.</span><span class="n">reduce</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">:</span> <span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="p">)</span> <span class="o">/</span> <span class="n">valuesAndPreds</span><span class="o">.</span><span class="n">count</span><span class="p">()</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Mean Squared Error = &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">MSE</span><span class="p">))</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Mean Squared Error = &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">MSE</span><span class="p">))</span>
 
-<span class="c"># Save and load model</span>
-<span class="n">model</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;target/tmp/pythonLinearRegressionWithSGDModel&quot;</span><span class="p">)</span>
-<span class="n">sameModel</span> <span class="o">=</span> <span class="n">LinearRegressionModel</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;target/tmp/pythonLinearRegressionWithSGDModel&quot;</span><span class="p">)</span>
+<span class="c1"># Save and load model</span>
+<span class="n">model</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;target/tmp/pythonLinearRegressionWithSGDModel&quot;</span><span class="p">)</span>
+<span class="n">sameModel</span> <span class="o">=</span> <span class="n">LinearRegressionModel</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;target/tmp/pythonLinearRegressionWithSGDModel&quot;</span><span class="p">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/mllib/linear_regression_with_sgd_example.py" in the Spark repo.</small></div>
   </div>
@@ -1059,25 +1059,24 @@ the model will update. Anytime a text file is placed in <code>args(1)</code> you
 As you feed more data to the training directory, the predictions
 will get better!</p>
 
     <p>Here is a complete example:</p>
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vectors</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vectors</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.regression.LabeledPoint</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD</span>
 
 <span class="k">val</span> <span class="n">trainingData</span> <span class="k">=</span> <span class="n">ssc</span><span class="o">.</span><span class="n">textFileStream</span><span class="o">(</span><span class="n">args</span><span class="o">(</span><span class="mi">0</span><span class="o">)).</span><span class="n">map</span><span class="o">(</span><span class="nc">LabeledPoint</span><span class="o">.</span><span class="n">parse</span><span class="o">).</span><span class="n">cache</span><span class="o">()</span>
 <span class="k">val</span> <span class="n">testData</span> <span class="k">=</span> <span class="n">ssc</span><span class="o">.</span><span class="n">textFileStream</span><span class="o">(</span><span class="n">args</span><span class="o">(</span><span class="mi">1</span><span class="o">)).</span><span class="n">map</span><span class="o">(</span><span class="nc">LabeledPoint</span><span class="o">.</span><span class="n">parse</span><span class="o">)</span>
 
 <span class="k">val</span> <span class="n">numFeatures</span> <span class="k">=</span> <span class="mi">3</span>
 <span class="k">val</span> <span class="n">model</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">StreamingLinearRegressionWithSGD</span><span class="o">()</span>
   <span class="o">.</span><span class="n">setInitialWeights</span><span class="o">(</span><span class="nc">Vectors</span><span class="o">.</span><span class="n">zeros</span><span class="o">(</span><span class="n">numFeatures</span><span class="o">))</span>
 
 <span class="n">model</span><span class="o">.</span><span class="n">trainOn</span><span class="o">(</span><span class="n">trainingData</span><span class="o">)</span>
 <span class="n">model</span><span class="o">.</span><span class="n">predictOnValues</span><span class="o">(</span><span class="n">testData</span><span class="o">.</span><span class="n">map</span><span class="o">(</span><span class="n">lp</span> <span class="k">=&gt;</span> <span class="o">(</span><span class="n">lp</span><span class="o">.</span><span class="n">label</span><span class="o">,</span> <span class="n">lp</span><span class="o">.</span><span class="n">features</span><span class="o">))).</span><span class="n">print</span><span class="o">()</span>
 
 <span class="n">ssc</span><span class="o">.</span><span class="n">start</span><span class="o">()</span>
 <span class="n">ssc</span><span class="o">.</span><span class="n">awaitTermination</span><span class="o">()</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/scala/org/apache/spark/examples/mllib/StreamingLinearRegressionExample.scala" in the Spark repo.</small></div>
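    <p>Note that files dropped into the monitored training and test directories must be parseable by <code>LabeledPoint.parse</code>, i.e. one labeled point per line in the <code>(y,[x1,x2,x3])</code> text form. Hypothetical sample lines for the three-feature model used in these streaming examples:</p>

    (1.0,[2.0,0.0,3.0])
    (0.0,[1.5,2.2,0.7])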
 
   </div>
 
@@ -1101,32 +1100,31 @@ the model will update. Anytime a text file is placed in <code>sys.argv[2]</code>
 As you feed more data to the training directory, the predictions
 will get better!</p>
 
-    <p>Here a complete example:</p>
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">sys</span>
+    <p>Here is a complete example:</p>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">sys</span>
 
 <span class="kn">from</span> <span class="nn">pyspark.mllib.linalg</span> <span class="kn">import</span> <span class="n">Vectors</span>
 <span class="kn">from</span> <span class="nn">pyspark.mllib.regression</span> <span class="kn">import</span> <span class="n">LabeledPoint</span>
 <span class="kn">from</span> <span class="nn">pyspark.mllib.regression</span> <span class="kn">import</span> <span class="n">StreamingLinearRegressionWithSGD</span>
 
 <span class="k">def</span> <span class="nf">parse</span><span class="p">(</span><span class="n">lp</span><span class="p">):</span>
-    <span class="n">label</span> <span class="o">=</span> <span class="nb">float</span><span class="p">(</span><span class="n">lp</span><span class="p">[</span><span class="n">lp</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s">&#39;(&#39;</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="p">:</span> <span class="n">lp</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s">&#39;,&#39;</span><span class="p">)])</span>
-    <span class="n">vec</span> <span class="o">=</span> <span class="n">Vectors</span><span class="o">.</span><span class="n">dense</span><span class="p">(</span><span class="n">lp</span><span class="p">[</span><span class="n">lp</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s">&#39;[&#39;</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="p">:</span> <span class="n">lp</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s">&#39;]&#39;</span><span class="p">)]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&#39;,&#39;</span><span class="p">))</span>
+    <span class="n">label</span> <span class="o">=</span> <span class="nb">float</span><span class="p">(</span><span class="n">lp</span><span class="p">[</span><span class="n">lp</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s1">&#39;(&#39;</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="p">:</span> <span class="n">lp</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s1">&#39;,&#39;</span><span class="p">)])</span>
+    <span class="n">vec</span> <span class="o">=</span> <span class="n">Vectors</span><span class="o">.</span><span class="n">dense</span><span class="p">(</span><span class="n">lp</span><span class="p">[</span><span class="n">lp</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s1">&#39;[&#39;</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="p">:</span> <span class="n">lp</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s1">&#39;]&#39;</span><span class="p">)]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">&#39;,&#39;</span><span class="p">))</span>
     <span class="k">return</span> <span class="n">LabeledPoint</span><span class="p">(</span><span class="n">label</span><span class="p">,</span> <span class="n">vec</span><span class="p">)</span>
 
 <span class="n">trainingData</span> <span class="o">=</span> <span class="n">ssc</span><span class="o">.</span><span class="n">textFileStream</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="n">parse</span><span class="p">)</span><span class="o">.</span><span class="n">cache</span><span class="p">()</span>
 <span class="n">testData</span> <span class="o">=</span> <span class="n">ssc</span><span class="o">.</span><span class="n">textFileStream</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="n">parse</span><span class="p">)</span>
 
 <span class="n">numFeatures</span> <span class="o">=</span> <span class="mi">3</span>
 <span class="n">model</span> <span class="o">=</span> <span class="n">StreamingLinearRegressionWithSGD</span><span class="p">()</span>
 <span class="n">model</span><span class="o">.</span><span class="n">setInitialWeights</span><span class="p">([</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">])</span>
 
 <span class="n">model</span><span class="o">.</span><span class="n">trainOn</span><span class="p">(</span><span class="n">trainingData</span><span class="p">)</span>
 <span class="k">print</span><span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="n">predictOnValues</span><span class="p">(</span><span class="n">testData</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">lp</span><span class="p">:</span> <span class="p">(</span><span class="n">lp</span><span class="o">.</span><span class="n">label</span><span class="p">,</span> <span class="n">lp</span><span class="o">.</span><span class="n">features</span><span class="p">))))</span>
 
 <span class="n">ssc</span><span class="o">.</span><span class="n">start</span><span class="p">()</span>
 <span class="n">ssc</span><span class="o">.</span><span class="n">awaitTermination</span><span class="p">()</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/mllib/streaming_linear_regression_example.py" in the Spark repo.</small></div>
 
   </div>
 

http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/mllib-naive-bayes.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/mllib-naive-bayes.html b/site/docs/2.1.0/mllib-naive-bayes.html
index c21dd83..d843987 100644
--- a/site/docs/2.1.0/mllib-naive-bayes.html
+++ b/site/docs/2.1.0/mllib-naive-bayes.html
@@ -342,7 +342,7 @@ can be used for evaluation and prediction.</p>
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.classification.NaiveBayes"><code>NaiveBayes</code> Scala docs</a> and <a href="api/scala/index.html#org.apache.spark.mllib.classification.NaiveBayesModel"><code>NaiveBayesModel</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.classification.</span><span class="o">{</span><span class="nc">NaiveBayes</span><span class="o">,</span> <span class="nc">NaiveBayesModel</span><span class="o">}</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.classification.</span><span class="o">{</span><span class="nc">NaiveBayes</span><span class="o">,</span> <span class="nc">NaiveBayesModel</span><span class="o">}</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.util.MLUtils</span>
 
 <span class="c1">// Load and parse the data file.</span>
@@ -373,7 +373,7 @@ can be used for evaluation and prediction.</p>
 
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/classification/NaiveBayes.html"><code>NaiveBayes</code> Java docs</a> and <a href="api/java/org/apache/spark/mllib/classification/NaiveBayesModel.html"><code>NaiveBayesModel</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.function.Function</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.function.PairFunction</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaPairRDD</span><span class="o">;</span>
@@ -423,33 +423,33 @@ used for evaluation and prediction.</p>
 
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.classification.NaiveBayes"><code>NaiveBayes</code> Python docs</a> and <a href="api/python/pyspark.mllib.html#pyspark.mllib.classification.NaiveBayesModel"><code>NaiveBayesModel</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.mllib.classification</span> <span class="kn">import</span> <span class="n">NaiveBayes</span><span class="p">,</span> <span class="n">NaiveBayesModel</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.classification</span> <span class="kn">import</span> <span class="n">NaiveBayes</span><span class="p">,</span> <span class="n">NaiveBayesModel</span>
 <span class="kn">from</span> <span class="nn">pyspark.mllib.util</span> <span class="kn">import</span> <span class="n">MLUtils</span>
 
 
 
-<span class="c"># Load and parse the data file.</span>
-<span class="n">data</span> <span class="o">=</span> <span class="n">MLUtils</span><span class="o">.</span><span class="n">loadLibSVMFile</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;data/mllib/sample_libsvm_data.txt&quot;</span><span class="p">)</span>
+<span class="c1"># Load and parse the data file.</span>
+<span class="n">data</span> <span class="o">=</span> <span class="n">MLUtils</span><span class="o">.</span><span class="n">loadLibSVMFile</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;data/mllib/sample_libsvm_data.txt&quot;</span><span class="p">)</span>
 
-<span class="c"># Split data approximately into training (60%) and test (40%)</span>
+<span class="c1"># Split data approximately into training (60%) and test (40%)</span>
 <span class="n">training</span><span class="p">,</span> <span class="n">test</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">randomSplit</span><span class="p">([</span><span class="mf">0.6</span><span class="p">,</span> <span class="mf">0.4</span><span class="p">])</span>
 
-<span class="c"># Train a naive Bayes model.</span>
+<span class="c1"># Train a naive Bayes model.</span>
 <span class="n">model</span> <span class="o">=</span> <span class="n">NaiveBayes</span><span class="o">.</span><span class="n">train</span><span class="p">(</span><span class="n">training</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">)</span>
 
-<span class="c"># Make prediction and test accuracy.</span>
+<span class="c1"># Make prediction and test accuracy.</span>
 <span class="n">predictionAndLabel</span> <span class="o">=</span> <span class="n">test</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">p</span><span class="p">:</span> <span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">p</span><span class="o">.</span><span class="n">features</span><span class="p">),</span> <span class="n">p</span><span class="o">.</span><span class="n">label</span><span class="p">))</span>
 <span class="n">accuracy</span> <span class="o">=</span> <span class="mf">1.0</span> <span class="o">*</span> <span class="n">predictionAndLabel</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">v</span><span class="p">):</span> <span class="n">x</span> <span class="o">==</span> <span class="n">v</span><span class="p">)</span><span class="o">.</span><span class="n">count</span><span class="p">()</span> <span class="o">/</span> <span class="n">test</span><span class="o">.</span><span class="n">count</span><span class="p">()</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&#39;model accuracy {}&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">accuracy</span><span class="p">))</span>
+<span class="k">print</span><span class="p">(</span><span class="s1">&#39;model accuracy {}&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">accuracy</span><span class="p">))</span>
 
-<span class="c"># Save and load model</span>
-<span class="n">output_dir</span> <span class="o">=</span> <span class="s">&#39;target/tmp/myNaiveBayesModel&#39;</span>
+<span class="c1"># Save and load model</span>
+<span class="n">output_dir</span> <span class="o">=</span> <span class="s1">&#39;target/tmp/myNaiveBayesModel&#39;</span>
 <span class="n">shutil</span><span class="o">.</span><span class="n">rmtree</span><span class="p">(</span><span class="n">output_dir</span><span class="p">,</span> <span class="n">ignore_errors</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
 <span class="n">model</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="n">output_dir</span><span class="p">)</span>
 <span class="n">sameModel</span> <span class="o">=</span> <span class="n">NaiveBayesModel</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="n">output_dir</span><span class="p">)</span>
 <span class="n">predictionAndLabel</span> <span class="o">=</span> <span class="n">test</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">p</span><span class="p">:</span> <span class="p">(</span><span class="n">sameModel</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">p</span><span class="o">.</span><span class="n">features</span><span class="p">),</span> <span class="n">p</span><span class="o">.</span><span class="n">label</span><span class="p">))</span>
 <span class="n">accuracy</span> <span class="o">=</span> <span class="mf">1.0</span> <span class="o">*</span> <span class="n">predictionAndLabel</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">v</span><span class="p">):</span> <span class="n">x</span> <span class="o">==</span> <span class="n">v</span><span class="p">)</span><span class="o">.</span><span class="n">count</span><span class="p">()</span> <span class="o">/</span> <span class="n">test</span><span class="o">.</span><span class="n">count</span><span class="p">()</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&#39;sameModel accuracy {}&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">accuracy</span><span class="p">))</span>
+<span class="k">print</span><span class="p">(</span><span class="s1">&#39;sameModel accuracy {}&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">accuracy</span><span class="p">))</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/mllib/naive_bayes_example.py" in the Spark repo.</small></div>
   </div>
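
One caveat worth noting about the Python listing above: `lambda (x, v): x == v` relies on tuple parameter unpacking, which exists only in Python 2. A version of the accuracy computation that also runs on Python 3, as a sketch reusing the `test` and `model` names from the example:

    # Same accuracy computation as above, but indexing the (prediction, label)
    # pair instead of unpacking it, so it runs on both Python 2 and 3.
    predictionAndLabel = test.map(lambda p: (model.predict(p.features), p.label))
    accuracy = 1.0 * predictionAndLabel.filter(lambda pl: pl[0] == pl[1]).count() / test.count()
    print('model accuracy {}'.format(accuracy))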

http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/mllib-optimization.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/mllib-optimization.html b/site/docs/2.1.0/mllib-optimization.html
index 0c32f6e..74dbeba 100644
--- a/site/docs/2.1.0/mllib-optimization.html
+++ b/site/docs/2.1.0/mllib-optimization.html
@@ -331,20 +331,20 @@
                     
 
                     <ul id="markdown-toc">
-  <li><a href="#mathematical-description" id="markdown-toc-mathematical-description">Mathematical description</a>    <ul>
-      <li><a href="#gradient-descent" id="markdown-toc-gradient-descent">Gradient descent</a></li>
-      <li><a href="#stochastic-gradient-descent-sgd" id="markdown-toc-stochastic-gradient-descent-sgd">Stochastic gradient descent (SGD)</a></li>
-      <li><a href="#update-schemes-for-distributed-sgd" id="markdown-toc-update-schemes-for-distributed-sgd">Update schemes for distributed SGD</a></li>
-      <li><a href="#limited-memory-bfgs-l-bfgs" id="markdown-toc-limited-memory-bfgs-l-bfgs">Limited-memory BFGS (L-BFGS)</a></li>
-      <li><a href="#choosing-an-optimization-method" id="markdown-toc-choosing-an-optimization-method">Choosing an Optimization Method</a></li>
+  <li><a href="#mathematical-description">Mathematical description</a>    <ul>
+      <li><a href="#gradient-descent">Gradient descent</a></li>
+      <li><a href="#stochastic-gradient-descent-sgd">Stochastic gradient descent (SGD)</a></li>
+      <li><a href="#update-schemes-for-distributed-sgd">Update schemes for distributed SGD</a></li>
+      <li><a href="#limited-memory-bfgs-l-bfgs">Limited-memory BFGS (L-BFGS)</a></li>
+      <li><a href="#choosing-an-optimization-method">Choosing an Optimization Method</a></li>
     </ul>
   </li>
-  <li><a href="#implementation-in-mllib" id="markdown-toc-implementation-in-mllib">Implementation in MLlib</a>    <ul>
-      <li><a href="#gradient-descent-and-stochastic-gradient-descent" id="markdown-toc-gradient-descent-and-stochastic-gradient-descent">Gradient descent and stochastic gradient descent</a></li>
-      <li><a href="#l-bfgs" id="markdown-toc-l-bfgs">L-BFGS</a></li>
+  <li><a href="#implementation-in-mllib">Implementation in MLlib</a>    <ul>
+      <li><a href="#gradient-descent-and-stochastic-gradient-descent">Gradient descent and stochastic gradient descent</a></li>
+      <li><a href="#l-bfgs">L-BFGS</a></li>
     </ul>
   </li>
-  <li><a href="#developers-notes" id="markdown-toc-developers-notes">Developer&#8217;s notes</a></li>
+  <li><a href="#developers-notes">Developer&#8217;s notes</a></li>
 </ul>
 
 <p><code>\[
@@ -471,7 +471,7 @@ quadratic without evaluating the second partial derivatives of the objective fun
 Hessian matrix. The Hessian matrix is approximated from previous gradient evaluations, so there is no 
 vertical scalability issue (in the number of training features), unlike when computing the Hessian matrix 
 explicitly as in Newton&#8217;s method. As a result, L-BFGS often achieves faster convergence than 
-other first-order optimization.</p>
+other first-order optimization methods.</p>
 
 <h3 id="choosing-an-optimization-method">Choosing an Optimization Method</h3>
 
@@ -497,7 +497,7 @@ sets the following parameters:</p>
 being optimized, i.e., with respect to a single training example, at the
 current parameter value. MLlib includes gradient classes for common loss
 functions, e.g., hinge, logistic, least-squares.  The gradient class takes as
-input a training example, its label, and the current parameter value.</li>
+input a training example, its label, and the current parameter value. </li>
   <li><code>Updater</code> is a class that performs the actual gradient descent step, i.e. 
 updating the weights in each iteration, for a given gradient of the loss part.
 The updater is also responsible for performing the update from the regularization 
@@ -505,7 +505,7 @@ part. MLlib includes updaters for cases without regularization, as well as
 L1 and L2 regularizers.</li>
   <li><code>stepSize</code> is a scalar value denoting the initial step size for gradient
 descent. All updaters in MLlib use a step size at the t-th step equal to
-<code>stepSize $/ \sqrt{t}$</code>.</li>
+<code>stepSize $/ \sqrt{t}$</code>. </li>
   <li><code>numIterations</code> is the number of iterations to run.</li>
   <li><code>regParam</code> is the regularization parameter when using L1 or L2 regularization.</li>
   <li><code>miniBatchFraction</code> is the fraction of the total data that is sampled in 
@@ -521,7 +521,7 @@ each iteration, to compute the gradient direction.
 ML algorithms such as Linear Regression and Logistic Regression, you have to pass the gradient of the objective
 function and the updater into the optimizer yourself instead of using the training APIs like 
 <a href="api/scala/index.html#org.apache.spark.mllib.classification.LogisticRegressionWithSGD">LogisticRegressionWithSGD</a>.
-See the example below. It will be addressed in the next release.</p>
+See the example below. This will be addressed in the next release.</p>
 
 <p>The L1 regularization by using 
 <a href="api/scala/index.html#org.apache.spark.mllib.optimization.L1Updater">L1Updater</a> will not work since the 
@@ -536,10 +536,10 @@ has the following parameters:</p>
 being optimized, i.e., with respect to a single training example, at the
 current parameter value. MLlib includes gradient classes for common loss
 functions, e.g., hinge, logistic, least-squares.  The gradient class takes as
-input a training example, its label, and the current parameter value.</li>
+input a training example, its label, and the current parameter value. </li>
   <li><code>Updater</code> is a class that computes the gradient and loss of the objective function&#8217;s 
 regularization part for L-BFGS. MLlib includes updaters for cases without 
-regularization, as well as L2 regularizer.</li>
+regularization, as well as the L2 regularizer.</li>
   <li><code>numCorrections</code> is the number of corrections used in the L-BFGS update. 10 is 
 recommended.</li>
   <li><code>maxNumIterations</code> is the maximal number of iterations that L-BFGS can be run.</li>
@@ -555,14 +555,14 @@ containing weights for every feature, and the second element is an array contain
 the loss computed for every iteration.</p>
 
 <p>Here is an example to train binary logistic regression with L2 regularization using
-L-BFGS optimizer.</p>
+L-BFGS optimizer. </p>
 
 <div class="codetabs">
 
 <div data-lang="scala">
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.optimization.LBFGS"><code>LBFGS</code> Scala docs</a> and <a href="api/scala/index.html#org.apache.spark.mllib.optimization.SquaredL2Updater"><code>SquaredL2Updater</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.classification.LogisticRegressionModel</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.classification.LogisticRegressionModel</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.evaluation.BinaryClassificationMetrics</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vectors</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.optimization.</span><span class="o">{</span><span class="nc">LBFGS</span><span class="o">,</span> <span class="nc">LogisticGradient</span><span class="o">,</span> <span class="nc">SquaredL2Updater</span><span class="o">}</span>
@@ -623,7 +623,7 @@ L-BFGS optimizer.</p>
 <div data-lang="java">
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/optimization/LBFGS.html"><code>LBFGS</code> Java docs</a> and <a href="api/java/org/apache/spark/mllib/optimization/SquaredL2Updater.html"><code>SquaredL2Updater</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span>
 
@@ -658,15 +658,15 @@ L-BFGS optimizer.</p>
 
 <span class="c1">// Run training algorithm to build the model.</span>
 <span class="kt">int</span> <span class="n">numCorrections</span> <span class="o">=</span> <span class="mi">10</span><span class="o">;</span>
-<span class="kt">double</span> <span class="n">convergenceTol</span> <span class="o">=</span> <span class="mi">1</span><span class="n">e</span><span class="o">-</span><span class="mi">4</span><span class="o">;</span>
+<span class="kt">double</span> <span class="n">convergenceTol</span> <span class="o">=</span> <span class="mf">1e-4</span><span class="o">;</span>
 <span class="kt">int</span> <span class="n">maxNumIterations</span> <span class="o">=</span> <span class="mi">20</span><span class="o">;</span>
 <span class="kt">double</span> <span class="n">regParam</span> <span class="o">=</span> <span class="mf">0.1</span><span class="o">;</span>
 <span class="n">Vector</span> <span class="n">initialWeightsWithIntercept</span> <span class="o">=</span> <span class="n">Vectors</span><span class="o">.</span><span class="na">dense</span><span class="o">(</span><span class="k">new</span> <span class="kt">double</span><span class="o">[</span><span class="n">numFeatures</span> <span class="o">+</span> <span class="mi">1</span><span class="o">]);</span>
 
 <span class="n">Tuple2</span><span class="o">&lt;</span><span class="n">Vector</span><span class="o">,</span> <span class="kt">double</span><span class="o">[]&gt;</span> <span class="n">result</span> <span class="o">=</span> <span class="n">LBFGS</span><span class="o">.</span><span class="na">runLBFGS</span><span class="o">(</span>
   <span class="n">training</span><span class="o">.</span><span class="na">rdd</span><span class="o">(),</span>
-  <span class="k">new</span> <span class="nf">LogisticGradient</span><span class="o">(),</span>
-  <span class="k">new</span> <span class="nf">SquaredL2Updater</span><span class="o">(),</span>
+  <span class="k">new</span> <span class="n">LogisticGradient</span><span class="o">(),</span>
+  <span class="k">new</span> <span class="n">SquaredL2Updater</span><span class="o">(),</span>
   <span class="n">numCorrections</span><span class="o">,</span>
   <span class="n">convergenceTol</span><span class="o">,</span>
   <span class="n">maxNumIterations</span><span class="o">,</span>
@@ -675,7 +675,7 @@ L-BFGS optimizer.</p>
 <span class="n">Vector</span> <span class="n">weightsWithIntercept</span> <span class="o">=</span> <span class="n">result</span><span class="o">.</span><span class="na">_1</span><span class="o">();</span>
 <span class="kt">double</span><span class="o">[]</span> <span class="n">loss</span> <span class="o">=</span> <span class="n">result</span><span class="o">.</span><span class="na">_2</span><span class="o">();</span>
 
-<span class="kd">final</span> <span class="n">LogisticRegressionModel</span> <span class="n">model</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">LogisticRegressionModel</span><span class="o">(</span>
+<span class="kd">final</span> <span class="n">LogisticRegressionModel</span> <span class="n">model</span> <span class="o">=</span> <span class="k">new</span> <span class="n">LogisticRegressionModel</span><span class="o">(</span>
   <span class="n">Vectors</span><span class="o">.</span><span class="na">dense</span><span class="o">(</span><span class="n">Arrays</span><span class="o">.</span><span class="na">copyOf</span><span class="o">(</span><span class="n">weightsWithIntercept</span><span class="o">.</span><span class="na">toArray</span><span class="o">(),</span> <span class="n">weightsWithIntercept</span><span class="o">.</span><span class="na">size</span><span class="o">()</span> <span class="o">-</span> <span class="mi">1</span><span class="o">)),</span>
   <span class="o">(</span><span class="n">weightsWithIntercept</span><span class="o">.</span><span class="na">toArray</span><span class="o">())[</span><span class="n">weightsWithIntercept</span><span class="o">.</span><span class="na">size</span><span class="o">()</span> <span class="o">-</span> <span class="mi">1</span><span class="o">]);</span>
 
@@ -693,7 +693,7 @@ L-BFGS optimizer.</p>
 
 <span class="c1">// Get evaluation metrics.</span>
 <span class="n">BinaryClassificationMetrics</span> <span class="n">metrics</span> <span class="o">=</span>
-  <span class="k">new</span> <span class="nf">BinaryClassificationMetrics</span><span class="o">(</span><span class="n">scoreAndLabels</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span>
+  <span class="k">new</span> <span class="n">BinaryClassificationMetrics</span><span class="o">(</span><span class="n">scoreAndLabels</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span>
 <span class="kt">double</span> <span class="n">auROC</span> <span class="o">=</span> <span class="n">metrics</span><span class="o">.</span><span class="na">areaUnderROC</span><span class="o">();</span>
 
 <span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">&quot;Loss of each step in training process&quot;</span><span class="o">);</span>
@@ -717,7 +717,7 @@ the actual gradient descent step. However, we&#8217;re able to take the gradient
 loss of the objective function&#8217;s regularization part for L-BFGS by ignoring the logic
 specific to gradient descent, such as the adaptive step size. We will refactor
 this into a regularizer that replaces the updater, to separate the logic between 
-regularization and step update later.</p>
+regularization and the step update in a later release.</p>
 
 
                 </div>
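
To make the `stepSize $/ \sqrt{t}$` schedule from the parameter list above concrete, a tiny standalone illustration (plain Python arithmetic, not a Spark API):

    # The decaying step size used by MLlib's SGD updaters: at the t-th step
    # the effective step is stepSize / sqrt(t).
    import math

    stepSize = 1.0
    for t in range(1, 6):
        print("t = %d, effective step = %.3f" % (t, stepSize / math.sqrt(t)))
    # t = 1 -> 1.000, t = 2 -> 0.707, t = 3 -> 0.577, t = 4 -> 0.500, t = 5 -> 0.447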




[16/25] spark-website git commit: Update 2.1.0 docs to include https://github.com/apache/spark/pull/16294

Posted by yh...@apache.org.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/mllib-collaborative-filtering.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/mllib-collaborative-filtering.html b/site/docs/2.1.0/mllib-collaborative-filtering.html
index e453032..b3f9e08 100644
--- a/site/docs/2.1.0/mllib-collaborative-filtering.html
+++ b/site/docs/2.1.0/mllib-collaborative-filtering.html
@@ -322,13 +322,13 @@
                     
 
                     <ul id="markdown-toc">
-  <li><a href="#collaborative-filtering" id="markdown-toc-collaborative-filtering">Collaborative filtering</a>    <ul>
-      <li><a href="#explicit-vs-implicit-feedback" id="markdown-toc-explicit-vs-implicit-feedback">Explicit vs. implicit feedback</a></li>
-      <li><a href="#scaling-of-the-regularization-parameter" id="markdown-toc-scaling-of-the-regularization-parameter">Scaling of the regularization parameter</a></li>
+  <li><a href="#collaborative-filtering">Collaborative filtering</a>    <ul>
+      <li><a href="#explicit-vs-implicit-feedback">Explicit vs. implicit feedback</a></li>
+      <li><a href="#scaling-of-the-regularization-parameter">Scaling of the regularization parameter</a></li>
     </ul>
   </li>
-  <li><a href="#examples" id="markdown-toc-examples">Examples</a></li>
-  <li><a href="#tutorial" id="markdown-toc-tutorial">Tutorial</a></li>
+  <li><a href="#examples">Examples</a></li>
+  <li><a href="#tutorial">Tutorial</a></li>
 </ul>
 
 <h2 id="collaborative-filtering">Collaborative filtering</h2>
@@ -393,7 +393,7 @@ recommendation model by measuring the Mean Squared Error of rating prediction.</
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.recommendation.ALS"><code>ALS</code> Scala docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.recommendation.ALS</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.recommendation.ALS</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.recommendation.MatrixFactorizationModel</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.recommendation.Rating</span>
 
@@ -434,9 +434,9 @@ recommendation model by measuring the Mean Squared Error of rating prediction.</
     <p>If the rating matrix is derived from another source of information (i.e. it is inferred from
 other signals), you can use the <code>trainImplicit</code> method to get better results.</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">alpha</span> <span class="k">=</span> <span class="mf">0.01</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">val</span> <span class="n">alpha</span> <span class="k">=</span> <span class="mf">0.01</span>
 <span class="k">val</span> <span class="n">lambda</span> <span class="k">=</span> <span class="mf">0.01</span>
-<span class="k">val</span> <span class="n">model</span> <span class="k">=</span> <span class="nc">ALS</span><span class="o">.</span><span class="n">trainImplicit</span><span class="o">(</span><span class="n">ratings</span><span class="o">,</span> <span class="n">rank</span><span class="o">,</span> <span class="n">numIterations</span><span class="o">,</span> <span class="n">lambda</span><span class="o">,</span> <span class="n">alpha</span><span class="o">)</span></code></pre></div>
+<span class="k">val</span> <span class="n">model</span> <span class="k">=</span> <span class="nc">ALS</span><span class="o">.</span><span class="n">trainImplicit</span><span class="o">(</span><span class="n">ratings</span><span class="o">,</span> <span class="n">rank</span><span class="o">,</span> <span class="n">numIterations</span><span class="o">,</span> <span class="n">lambda</span><span class="o">,</span> <span class="n">alpha</span><span class="o">)</span></code></pre></figure>
 
   </div>
 
@@ -449,7 +449,7 @@ that is equivalent to the provided example in Scala is given below:</p>
 
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/recommendation/ALS.html"><code>ALS</code> Java docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.*</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.function.Function</span><span class="o">;</span>
@@ -458,8 +458,8 @@ that is equivalent to the provided example in Scala is given below:</p>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.recommendation.Rating</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.SparkConf</span><span class="o">;</span>
 
-<span class="n">SparkConf</span> <span class="n">conf</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">SparkConf</span><span class="o">().</span><span class="na">setAppName</span><span class="o">(</span><span class="s">&quot;Java Collaborative Filtering Example&quot;</span><span class="o">);</span>
-<span class="n">JavaSparkContext</span> <span class="n">jsc</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">JavaSparkContext</span><span class="o">(</span><span class="n">conf</span><span class="o">);</span>
+<span class="n">SparkConf</span> <span class="n">conf</span> <span class="o">=</span> <span class="k">new</span> <span class="n">SparkConf</span><span class="o">().</span><span class="na">setAppName</span><span class="o">(</span><span class="s">&quot;Java Collaborative Filtering Example&quot;</span><span class="o">);</span>
+<span class="n">JavaSparkContext</span> <span class="n">jsc</span> <span class="o">=</span> <span class="k">new</span> <span class="n">JavaSparkContext</span><span class="o">(</span><span class="n">conf</span><span class="o">);</span>
 
 <span class="c1">// Load and parse the data</span>
 <span class="n">String</span> <span class="n">path</span> <span class="o">=</span> <span class="s">&quot;data/mllib/als/test.data&quot;</span><span class="o">;</span>
@@ -468,7 +468,7 @@ that is equivalent to the provided example in Scala is given below:</p>
   <span class="k">new</span> <span class="n">Function</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Rating</span><span class="o">&gt;()</span> <span class="o">{</span>
     <span class="kd">public</span> <span class="n">Rating</span> <span class="nf">call</span><span class="o">(</span><span class="n">String</span> <span class="n">s</span><span class="o">)</span> <span class="o">{</span>
       <span class="n">String</span><span class="o">[]</span> <span class="n">sarray</span> <span class="o">=</span> <span class="n">s</span><span class="o">.</span><span class="na">split</span><span class="o">(</span><span class="s">&quot;,&quot;</span><span class="o">);</span>
-      <span class="k">return</span> <span class="k">new</span> <span class="nf">Rating</span><span class="o">(</span><span class="n">Integer</span><span class="o">.</span><span class="na">parseInt</span><span class="o">(</span><span class="n">sarray</span><span class="o">[</span><span class="mi">0</span><span class="o">]),</span> <span class="n">Integer</span><span class="o">.</span><span class="na">parseInt</span><span class="o">(</span><span class="n">sarray</span><span class="o">[</span><span class="mi">1</span><span class="o">]),</span>
+      <span class="k">return</span> <span class="k">new</span> <span class="n">Rating</span><span class="o">(</span><span class="n">Integer</span><span class="o">.</span><span class="na">parseInt</span><span class="o">(</span><span class="n">sarray</span><span class="o">[</span><span class="mi">0</span><span class="o">]),</span> <span class="n">Integer</span><span class="o">.</span><span class="na">parseInt</span><span class="o">(</span><span class="n">sarray</span><span class="o">[</span><span class="mi">1</span><span class="o">]),</span>
         <span class="n">Double</span><span class="o">.</span><span class="na">parseDouble</span><span class="o">(</span><span class="n">sarray</span><span class="o">[</span><span class="mi">2</span><span class="o">]));</span>
     <span class="o">}</span>
   <span class="o">}</span>
@@ -528,36 +528,36 @@ recommendation by measuring the Mean Squared Error of rating prediction.</p>
 
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.recommendation.ALS"><code>ALS</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.mllib.recommendation</span> <span class="kn">import</span> <span class="n">ALS</span><span class="p">,</span> <span class="n">MatrixFactorizationModel</span><span class="p">,</span> <span class="n">Rating</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.recommendation</span> <span class="kn">import</span> <span class="n">ALS</span><span class="p">,</span> <span class="n">MatrixFactorizationModel</span><span class="p">,</span> <span class="n">Rating</span>
 
-<span class="c"># Load and parse the data</span>
-<span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">&quot;data/mllib/als/test.data&quot;</span><span class="p">)</span>
-<span class="n">ratings</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">l</span><span class="p">:</span> <span class="n">l</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&#39;,&#39;</span><span class="p">))</span>\
+<span class="c1"># Load and parse the data</span>
+<span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s2">&quot;data/mllib/als/test.data&quot;</span><span class="p">)</span>
+<span class="n">ratings</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">l</span><span class="p">:</span> <span class="n">l</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">&#39;,&#39;</span><span class="p">))</span>\
     <span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">l</span><span class="p">:</span> <span class="n">Rating</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="n">l</span><span class="p">[</span><span class="mi">0</span><span class="p">]),</span> <span class="nb">int</span><span class="p">(</span><span class="n">l</span><span class="p">[</span><span class="mi">1</span><span class="p">]),</span> <span class="nb">float</span><span class="p">(</span><span class="n">l</span><span class="p">[</span><span class="mi">2</span><span class="p">])))</span>
 
-<span class="c"># Build the recommendation model using Alternating Least Squares</span>
+<span class="c1"># Build the recommendation model using Alternating Least Squares</span>
 <span class="n">rank</span> <span class="o">=</span> <span class="mi">10</span>
 <span class="n">numIterations</span> <span class="o">=</span> <span class="mi">10</span>
 <span class="n">model</span> <span class="o">=</span> <span class="n">ALS</span><span class="o">.</span><span class="n">train</span><span class="p">(</span><span class="n">ratings</span><span class="p">,</span> <span class="n">rank</span><span class="p">,</span> <span class="n">numIterations</span><span class="p">)</span>
 
-<span class="c"># Evaluate the model on training data</span>
+<span class="c1"># Evaluate the model on training data</span>
 <span class="n">testdata</span> <span class="o">=</span> <span class="n">ratings</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">p</span><span class="p">:</span> <span class="p">(</span><span class="n">p</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">p</span><span class="p">[</span><span class="mi">1</span><span class="p">]))</span>
 <span class="n">predictions</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">predictAll</span><span class="p">(</span><span class="n">testdata</span><span class="p">)</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">r</span><span class="p">:</span> <span class="p">((</span><span class="n">r</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">r</span><span class="p">[</span><span class="mi">1</span><span class="p">]),</span> <span class="n">r</span><span class="p">[</span><span class="mi">2</span><span class="p">]))</span>
 <span class="n">ratesAndPreds</span> <span class="o">=</span> <span class="n">ratings</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">r</span><span class="p">:</span> <span class="p">((</span><span class="n">r</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">r</span><span class="p">[</span><span class="mi">1</span><span class="p">]),</span> <span class="n">r</span><span class="p">[</span><span class="mi">2</span><span class="p">]))</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">predictions</span><span class="p">)</span>
 <span class="n">MSE</span> <span class="o">=</span> <span class="n">ratesAndPreds</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">r</span><span class="p">:</span> <span class="p">(</span><span class="n">r</span><span class="p">[</span><span class="mi">1</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span> <span class="o">-</span> <span class="n">r</span><span class="p">[</span><span class="mi">1</span><span class="p">][</span><span class="mi">1</span><span class="p">])</span><span class="o">**</span><span class="mi">2</span><span class="p">)</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Mean Squared Error = &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">MSE</span><span class="p">))</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Mean Squared Error = &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">MSE</span><span class="p">))</span>
 
-<span class="c"># Save and load model</span>
-<span class="n">model</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;target/tmp/myCollaborativeFilter&quot;</span><span class="p">)</span>
-<span class="n">sameModel</span> <span class="o">=</span> <span class="n">MatrixFactorizationModel</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;target/tmp/myCollaborativeFilter&quot;</span><span class="p">)</span>
+<span class="c1"># Save and load model</span>
+<span class="n">model</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;target/tmp/myCollaborativeFilter&quot;</span><span class="p">)</span>
+<span class="n">sameModel</span> <span class="o">=</span> <span class="n">MatrixFactorizationModel</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;target/tmp/myCollaborativeFilter&quot;</span><span class="p">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/mllib/recommendation_example.py" in the Spark repo.</small></div>
 
     <p>If the rating matrix is derived from another source of information (i.e. it is inferred from other
 signals), you can use the trainImplicit method to get better results.</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="c"># Build the recommendation model using Alternating Least Squares based on implicit ratings</span>
-<span class="n">model</span> <span class="o">=</span> <span class="n">ALS</span><span class="o">.</span><span class="n">trainImplicit</span><span class="p">(</span><span class="n">ratings</span><span class="p">,</span> <span class="n">rank</span><span class="p">,</span> <span class="n">numIterations</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.01</span><span class="p">)</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="c1"># Build the recommendation model using Alternating Least Squares based on implicit ratings</span>
+<span class="n">model</span> <span class="o">=</span> <span class="n">ALS</span><span class="o">.</span><span class="n">trainImplicit</span><span class="p">(</span><span class="n">ratings</span><span class="p">,</span> <span class="n">rank</span><span class="p">,</span> <span class="n">numIterations</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.01</span><span class="p">)</span></code></pre></figure>
 
   </div>
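
As a usage note on `trainImplicit`: the resulting model can be scored with the same `predictAll` pattern as the explicit example above, though its predictions approximate confidence-weighted preferences rather than raw ratings, so MSE against the original ratings is only a rough sanity check. A sketch reusing `ratings`, `rank`, and `numIterations` from the example:

    # Train on implicit ratings and score the same (user, product) pairs;
    # alpha is the confidence parameter shown in the snippet above.
    model = ALS.trainImplicit(ratings, rank, numIterations, alpha=0.01)
    testdata = ratings.map(lambda p: (p[0], p[1]))
    predictions = model.predictAll(testdata).map(lambda r: ((r[0], r[1]), r[2]))
    ratesAndPreds = ratings.map(lambda r: ((r[0], r[1]), r[2])).join(predictions)
    MSE = ratesAndPreds.map(lambda r: (r[1][0] - r[1][1]) ** 2).mean()
    print("Implicit-model MSE (rough check) = " + str(MSE))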
 




[17/25] spark-website git commit: Update 2.1.0 docs to include https://github.com/apache/spark/pull/16294

Posted by yh...@apache.org.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/mllib-clustering.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/mllib-clustering.html b/site/docs/2.1.0/mllib-clustering.html
index 9667606..1b50dab 100644
--- a/site/docs/2.1.0/mllib-clustering.html
+++ b/site/docs/2.1.0/mllib-clustering.html
@@ -366,12 +366,12 @@ models are trained for each cluster).</p>
 <p>The <code>spark.mllib</code> package supports the following models:</p>
 
 <ul id="markdown-toc">
-  <li><a href="#k-means" id="markdown-toc-k-means">K-means</a></li>
-  <li><a href="#gaussian-mixture" id="markdown-toc-gaussian-mixture">Gaussian mixture</a></li>
-  <li><a href="#power-iteration-clustering-pic" id="markdown-toc-power-iteration-clustering-pic">Power iteration clustering (PIC)</a></li>
-  <li><a href="#latent-dirichlet-allocation-lda" id="markdown-toc-latent-dirichlet-allocation-lda">Latent Dirichlet allocation (LDA)</a></li>
-  <li><a href="#bisecting-k-means" id="markdown-toc-bisecting-k-means">Bisecting k-means</a></li>
-  <li><a href="#streaming-k-means" id="markdown-toc-streaming-k-means">Streaming k-means</a></li>
+  <li><a href="#k-means">K-means</a></li>
+  <li><a href="#gaussian-mixture">Gaussian mixture</a></li>
+  <li><a href="#power-iteration-clustering-pic">Power iteration clustering (PIC)</a></li>
+  <li><a href="#latent-dirichlet-allocation-lda">Latent Dirichlet allocation (LDA)</a></li>
+  <li><a href="#bisecting-k-means">Bisecting k-means</a></li>
+  <li><a href="#streaming-k-means">Streaming k-means</a></li>
 </ul>
 
 <h2 id="k-means">K-means</h2>
@@ -408,7 +408,7 @@ optimal <em>k</em> is usually one where there is an &#8220;elbow&#8221; in the W
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.clustering.KMeans"><code>KMeans</code> Scala docs</a> and <a href="api/scala/index.html#org.apache.spark.mllib.clustering.KMeansModel"><code>KMeansModel</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.clustering.</span><span class="o">{</span><span class="nc">KMeans</span><span class="o">,</span> <span class="nc">KMeansModel</span><span class="o">}</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.clustering.</span><span class="o">{</span><span class="nc">KMeans</span><span class="o">,</span> <span class="nc">KMeansModel</span><span class="o">}</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vectors</span>
 
 <span class="c1">// Load and parse the data</span>
@@ -440,7 +440,7 @@ that is equivalent to the provided example in Scala is given below:</p>
 
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/clustering/KMeans.html"><code>KMeans</code> Java docs</a> and <a href="api/java/org/apache/spark/mllib/clustering/KMeansModel.html"><code>KMeansModel</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.function.Function</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.clustering.KMeans</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.clustering.KMeansModel</span><span class="o">;</span>
@@ -470,7 +470,7 @@ that is equivalent to the provided example in Scala is given below:</p>
 <span class="n">KMeansModel</span> <span class="n">clusters</span> <span class="o">=</span> <span class="n">KMeans</span><span class="o">.</span><span class="na">train</span><span class="o">(</span><span class="n">parsedData</span><span class="o">.</span><span class="na">rdd</span><span class="o">(),</span> <span class="n">numClusters</span><span class="o">,</span> <span class="n">numIterations</span><span class="o">);</span>
 
 <span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">&quot;Cluster centers:&quot;</span><span class="o">);</span>
-<span class="k">for</span> <span class="o">(</span><span class="n">Vector</span> <span class="nl">center:</span> <span class="n">clusters</span><span class="o">.</span><span class="na">clusterCenters</span><span class="o">())</span> <span class="o">{</span>
+<span class="k">for</span> <span class="o">(</span><span class="n">Vector</span> <span class="n">center</span><span class="o">:</span> <span class="n">clusters</span><span class="o">.</span><span class="na">clusterCenters</span><span class="o">())</span> <span class="o">{</span>
   <span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">&quot; &quot;</span> <span class="o">+</span> <span class="n">center</span><span class="o">);</span>
 <span class="o">}</span>
 <span class="kt">double</span> <span class="n">cost</span> <span class="o">=</span> <span class="n">clusters</span><span class="o">.</span><span class="na">computeCost</span><span class="o">(</span><span class="n">parsedData</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span>
@@ -498,29 +498,29 @@ fact the optimal <em>k</em> is usually one where there is an &#8220;elbow&#8221;
 
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.clustering.KMeans"><code>KMeans</code> Python docs</a> and <a href="api/python/pyspark.mllib.html#pyspark.mllib.clustering.KMeansModel"><code>KMeansModel</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">numpy</span> <span class="kn">import</span> <span class="n">array</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">numpy</span> <span class="kn">import</span> <span class="n">array</span>
 <span class="kn">from</span> <span class="nn">math</span> <span class="kn">import</span> <span class="n">sqrt</span>
 
 <span class="kn">from</span> <span class="nn">pyspark.mllib.clustering</span> <span class="kn">import</span> <span class="n">KMeans</span><span class="p">,</span> <span class="n">KMeansModel</span>
 
-<span class="c"># Load and parse the data</span>
-<span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">&quot;data/mllib/kmeans_data.txt&quot;</span><span class="p">)</span>
-<span class="n">parsedData</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="n">array</span><span class="p">([</span><span class="nb">float</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&#39; &#39;</span><span class="p">)]))</span>
+<span class="c1"># Load and parse the data</span>
+<span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s2">&quot;data/mllib/kmeans_data.txt&quot;</span><span class="p">)</span>
+<span class="n">parsedData</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="n">array</span><span class="p">([</span><span class="nb">float</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">&#39; &#39;</span><span class="p">)]))</span>
 
-<span class="c"># Build the model (cluster the data)</span>
-<span class="n">clusters</span> <span class="o">=</span> <span class="n">KMeans</span><span class="o">.</span><span class="n">train</span><span class="p">(</span><span class="n">parsedData</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">maxIterations</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">initializationMode</span><span class="o">=</span><span class="s">&quot;random&quot;</span><span class="p">)</span>
+<span class="c1"># Build the model (cluster the data)</span>
+<span class="n">clusters</span> <span class="o">=</span> <span class="n">KMeans</span><span class="o">.</span><span class="n">train</span><span class="p">(</span><span class="n">parsedData</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">maxIterations</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">initializationMode</span><span class="o">=</span><span class="s2">&quot;random&quot;</span><span class="p">)</span>
 
-<span class="c"># Evaluate clustering by computing Within Set Sum of Squared Errors</span>
+<span class="c1"># Evaluate clustering by computing Within Set Sum of Squared Errors</span>
 <span class="k">def</span> <span class="nf">error</span><span class="p">(</span><span class="n">point</span><span class="p">):</span>
     <span class="n">center</span> <span class="o">=</span> <span class="n">clusters</span><span class="o">.</span><span class="n">centers</span><span class="p">[</span><span class="n">clusters</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">point</span><span class="p">)]</span>
     <span class="k">return</span> <span class="n">sqrt</span><span class="p">(</span><span class="nb">sum</span><span class="p">([</span><span class="n">x</span><span class="o">**</span><span class="mi">2</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="p">(</span><span class="n">point</span> <span class="o">-</span> <span class="n">center</span><span class="p">)]))</span>
 
 <span class="n">WSSSE</span> <span class="o">=</span> <span class="n">parsedData</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">point</span><span class="p">:</span> <span class="n">error</span><span class="p">(</span><span class="n">point</span><span class="p">))</span><span class="o">.</span><span class="n">reduce</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">:</span> <span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="p">)</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Within Set Sum of Squared Error = &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">WSSSE</span><span class="p">))</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Within Set Sum of Squared Error = &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">WSSSE</span><span class="p">))</span>
 
-<span class="c"># Save and load model</span>
-<span class="n">clusters</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;target/org/apache/spark/PythonKMeansExample/KMeansModel&quot;</span><span class="p">)</span>
-<span class="n">sameModel</span> <span class="o">=</span> <span class="n">KMeansModel</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;target/org/apache/spark/PythonKMeansExample/KMeansModel&quot;</span><span class="p">)</span>
+<span class="c1"># Save and load model</span>
+<span class="n">clusters</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;target/org/apache/spark/PythonKMeansExample/KMeansModel&quot;</span><span class="p">)</span>
+<span class="n">sameModel</span> <span class="o">=</span> <span class="n">KMeansModel</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;target/org/apache/spark/PythonKMeansExample/KMeansModel&quot;</span><span class="p">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/mllib/k_means_example.py" in the Spark repo.</small></div>
   </div>
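
The page notes that the optimal k is usually where the WSSSE curve shows an "elbow". A short sketch of scanning k to look for it, assuming `parsedData` from the example above (`computeCost` returns the same WSSSE the example computes by hand):

    # Scan several cluster counts and print WSSSE for each, looking for the
    # "elbow"; assumes `parsedData` from the example above.
    for k in range(2, 10):
        model = KMeans.train(parsedData, k, maxIterations=10, initializationMode="random")
        print("k = %d, WSSSE = %f" % (k, model.computeCost(parsedData)))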
@@ -554,7 +554,7 @@ to the algorithm. We then output the parameters of the mixture model.</p>
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.clustering.GaussianMixture"><code>GaussianMixture</code> Scala docs</a> and <a href="api/scala/index.html#org.apache.spark.mllib.clustering.GaussianMixtureModel"><code>GaussianMixtureModel</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.clustering.</span><span class="o">{</span><span class="nc">GaussianMixture</span><span class="o">,</span> <span class="nc">GaussianMixtureModel</span><span class="o">}</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.clustering.</span><span class="o">{</span><span class="nc">GaussianMixture</span><span class="o">,</span> <span class="nc">GaussianMixtureModel</span><span class="o">}</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vectors</span>
 
 <span class="c1">// Load and parse the data</span>
@@ -587,7 +587,7 @@ that is equivalent to the provided example in Scala is given below:</p>
 
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/clustering/GaussianMixture.html"><code>GaussianMixture</code> Java docs</a> and <a href="api/java/org/apache/spark/mllib/clustering/GaussianMixtureModel.html"><code>GaussianMixtureModel</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.function.Function</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.clustering.GaussianMixture</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.clustering.GaussianMixtureModel</span><span class="o">;</span>
@@ -612,7 +612,7 @@ that is equivalent to the provided example in Scala is given below:</p>
 <span class="n">parsedData</span><span class="o">.</span><span class="na">cache</span><span class="o">();</span>
 
 <span class="c1">// Cluster the data into two classes using GaussianMixture</span>
-<span class="n">GaussianMixtureModel</span> <span class="n">gmm</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">GaussianMixture</span><span class="o">().</span><span class="na">setK</span><span class="o">(</span><span class="mi">2</span><span class="o">).</span><span class="na">run</span><span class="o">(</span><span class="n">parsedData</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span>
+<span class="n">GaussianMixtureModel</span> <span class="n">gmm</span> <span class="o">=</span> <span class="k">new</span> <span class="n">GaussianMixture</span><span class="o">().</span><span class="na">setK</span><span class="o">(</span><span class="mi">2</span><span class="o">).</span><span class="na">run</span><span class="o">(</span><span class="n">parsedData</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span>
 
 <span class="c1">// Save and load GaussianMixtureModel</span>
 <span class="n">gmm</span><span class="o">.</span><span class="na">save</span><span class="o">(</span><span class="n">jsc</span><span class="o">.</span><span class="na">sc</span><span class="o">(),</span> <span class="s">&quot;target/org/apache/spark/JavaGaussianMixtureExample/GaussianMixtureModel&quot;</span><span class="o">);</span>
@@ -636,26 +636,26 @@ to the algorithm. We then output the parameters of the mixture model.</p>
 
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.clustering.GaussianMixture"><code>GaussianMixture</code> Python docs</a> and <a href="api/python/pyspark.mllib.html#pyspark.mllib.clustering.GaussianMixtureModel"><code>GaussianMixtureModel</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">numpy</span> <span class="kn">import</span> <span class="n">array</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">numpy</span> <span class="kn">import</span> <span class="n">array</span>
 
 <span class="kn">from</span> <span class="nn">pyspark.mllib.clustering</span> <span class="kn">import</span> <span class="n">GaussianMixture</span><span class="p">,</span> <span class="n">GaussianMixtureModel</span>
 
-<span class="c"># Load and parse the data</span>
-<span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">&quot;data/mllib/gmm_data.txt&quot;</span><span class="p">)</span>
-<span class="n">parsedData</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="n">array</span><span class="p">([</span><span class="nb">float</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">line</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&#39; &#39;</span><span class="p">)]))</span>
+<span class="c1"># Load and parse the data</span>
+<span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s2">&quot;data/mllib/gmm_data.txt&quot;</span><span class="p">)</span>
+<span class="n">parsedData</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="n">array</span><span class="p">([</span><span class="nb">float</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">line</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">&#39; &#39;</span><span class="p">)]))</span>
 
-<span class="c"># Build the model (cluster the data)</span>
+<span class="c1"># Build the model (cluster the data)</span>
 <span class="n">gmm</span> <span class="o">=</span> <span class="n">GaussianMixture</span><span class="o">.</span><span class="n">train</span><span class="p">(</span><span class="n">parsedData</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
 
-<span class="c"># Save and load model</span>
-<span class="n">gmm</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;target/org/apache/spark/PythonGaussianMixtureExample/GaussianMixtureModel&quot;</span><span class="p">)</span>
+<span class="c1"># Save and load model</span>
+<span class="n">gmm</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;target/org/apache/spark/PythonGaussianMixtureExample/GaussianMixtureModel&quot;</span><span class="p">)</span>
 <span class="n">sameModel</span> <span class="o">=</span> <span class="n">GaussianMixtureModel</span>\
-    <span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;target/org/apache/spark/PythonGaussianMixtureExample/GaussianMixtureModel&quot;</span><span class="p">)</span>
+    <span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;target/org/apache/spark/PythonGaussianMixtureExample/GaussianMixtureModel&quot;</span><span class="p">)</span>
 
-<span class="c"># output parameters of model</span>
+<span class="c1"># output parameters of model</span>
 <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">2</span><span class="p">):</span>
-    <span class="k">print</span><span class="p">(</span><span class="s">&quot;weight = &quot;</span><span class="p">,</span> <span class="n">gmm</span><span class="o">.</span><span class="n">weights</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="s">&quot;mu = &quot;</span><span class="p">,</span> <span class="n">gmm</span><span class="o">.</span><span class="n">gaussians</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">.</span><span class="n">mu</span><span class="p">,</span>
-          <span class="s">&quot;sigma = &quot;</span><span class="p">,</span> <span class="n">gmm</span><span class="o">.</span><span class="n">gaussians</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">.</span><span class="n">sigma</span><span class="o">.</span><span class="n">toArray</span><span class="p">())</span>
+    <span class="k">print</span><span class="p">(</span><span class="s2">&quot;weight = &quot;</span><span class="p">,</span> <span class="n">gmm</span><span class="o">.</span><span class="n">weights</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="s2">&quot;mu = &quot;</span><span class="p">,</span> <span class="n">gmm</span><span class="o">.</span><span class="n">gaussians</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">.</span><span class="n">mu</span><span class="p">,</span>
+          <span class="s2">&quot;sigma = &quot;</span><span class="p">,</span> <span class="n">gmm</span><span class="o">.</span><span class="n">gaussians</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">.</span><span class="n">sigma</span><span class="o">.</span><span class="n">toArray</span><span class="p">())</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/mllib/gaussian_mixture_example.py" in the Spark repo.</small></div>
   </div>
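    <p>For readability, the Pygments markup in this hunk corresponds to roughly the following PySpark snippet. This is a plain-text sketch of the same Gaussian mixture example, assuming an existing SparkContext <code>sc</code> (as elsewhere in this guide) and Python 3 print semantics:</p>

    from numpy import array
    from pyspark.mllib.clustering import GaussianMixture

    # Load the space-separated points and parse each line into a NumPy array
    data = sc.textFile("data/mllib/gmm_data.txt")
    parsedData = data.map(lambda line: array([float(x) for x in line.strip().split(' ')]))

    # Fit a mixture of two Gaussians to the data
    gmm = GaussianMixture.train(parsedData, 2)

    # Print the weight, mean, and covariance of each mixture component
    for i in range(2):
        print("weight = ", gmm.weights[i], "mu = ", gmm.gaussians[i].mu,
              "sigma = ", gmm.gaussians[i].sigma.toArray())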
@@ -701,7 +701,7 @@ which contains the computed clustering assignments.</p>
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.clustering.PowerIterationClustering"><code>PowerIterationClustering</code> Scala docs</a> and <a href="api/scala/index.html#org.apache.spark.mllib.clustering.PowerIterationClusteringModel"><code>PowerIterationClusteringModel</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.clustering.PowerIterationClustering</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.clustering.PowerIterationClustering</span>
 
 <span class="k">val</span> <span class="n">circlesRdd</span> <span class="k">=</span> <span class="n">generateCirclesRdd</span><span class="o">(</span><span class="n">sc</span><span class="o">,</span> <span class="n">params</span><span class="o">.</span><span class="n">k</span><span class="o">,</span> <span class="n">params</span><span class="o">.</span><span class="n">numPoints</span><span class="o">)</span>
 <span class="k">val</span> <span class="n">model</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">PowerIterationClustering</span><span class="o">()</span>
@@ -714,12 +714,12 @@ which contains the computed clustering assignments.</p>
 <span class="k">val</span> <span class="n">assignments</span> <span class="k">=</span> <span class="n">clusters</span><span class="o">.</span><span class="n">toList</span><span class="o">.</span><span class="n">sortBy</span> <span class="o">{</span> <span class="k">case</span> <span class="o">(</span><span class="n">k</span><span class="o">,</span> <span class="n">v</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="n">v</span><span class="o">.</span><span class="n">length</span> <span class="o">}</span>
 <span class="k">val</span> <span class="n">assignmentsStr</span> <span class="k">=</span> <span class="n">assignments</span>
   <span class="o">.</span><span class="n">map</span> <span class="o">{</span> <span class="k">case</span> <span class="o">(</span><span class="n">k</span><span class="o">,</span> <span class="n">v</span><span class="o">)</span> <span class="k">=&gt;</span>
-    <span class="n">s</span><span class="s">&quot;$k -&gt; ${v.sorted.mkString(&quot;</span><span class="o">[</span><span class="err">&quot;</span>, <span class="err">&quot;</span>,<span class="err">&quot;</span>, <span class="err">&quot;</span><span class="o">]</span><span class="s">&quot;)}&quot;</span>
+    <span class="s">s&quot;</span><span class="si">$k</span><span class="s"> -&gt; </span><span class="si">${</span><span class="n">v</span><span class="o">.</span><span class="n">sorted</span><span class="o">.</span><span class="n">mkString</span><span class="o">(</span><span class="s">&quot;[&quot;</span><span class="o">,</span> <span class="s">&quot;,&quot;</span><span class="o">,</span> <span class="s">&quot;]&quot;</span><span class="o">)</span><span class="si">}</span><span class="s">&quot;</span>
   <span class="o">}.</span><span class="n">mkString</span><span class="o">(</span><span class="s">&quot;, &quot;</span><span class="o">)</span>
 <span class="k">val</span> <span class="n">sizesStr</span> <span class="k">=</span> <span class="n">assignments</span><span class="o">.</span><span class="n">map</span> <span class="o">{</span>
   <span class="k">_</span><span class="o">.</span><span class="n">_2</span><span class="o">.</span><span class="n">length</span>
 <span class="o">}.</span><span class="n">sorted</span><span class="o">.</span><span class="n">mkString</span><span class="o">(</span><span class="s">&quot;(&quot;</span><span class="o">,</span> <span class="s">&quot;,&quot;</span><span class="o">,</span> <span class="s">&quot;)&quot;</span><span class="o">)</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Cluster assignments: $assignmentsStr\ncluster sizes: $sizesStr&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;Cluster assignments: </span><span class="si">$assignmentsStr</span><span class="s">\ncluster sizes: </span><span class="si">$sizesStr</span><span class="s">&quot;</span><span class="o">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala" in the Spark repo.</small></div>
   </div>
@@ -736,22 +736,22 @@ which contains the computed clustering assignments.</p>
 
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/clustering/PowerIterationClustering.html"><code>PowerIterationClustering</code> Java docs</a> and <a href="api/java/org/apache/spark/mllib/clustering/PowerIterationClusteringModel.html"><code>PowerIterationClusteringModel</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">org.apache.spark.mllib.clustering.PowerIterationClustering</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.mllib.clustering.PowerIterationClustering</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.clustering.PowerIterationClusteringModel</span><span class="o">;</span>
 
 <span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">Tuple3</span><span class="o">&lt;</span><span class="n">Long</span><span class="o">,</span> <span class="n">Long</span><span class="o">,</span> <span class="n">Double</span><span class="o">&gt;&gt;</span> <span class="n">similarities</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="na">parallelize</span><span class="o">(</span><span class="n">Lists</span><span class="o">.</span><span class="na">newArrayList</span><span class="o">(</span>
-  <span class="k">new</span> <span class="n">Tuple3</span><span class="o">&lt;&gt;(</span><span class="mi">0L</span><span class="o">,</span> <span class="mi">1L</span><span class="o">,</span> <span class="mf">0.9</span><span class="o">),</span>
+  <span class="k">new</span> <span class="n">Tuple3</span><span class="o">&lt;&gt;(</span><span class="mi">0</span><span class="n">L</span><span class="o">,</span> <span class="mi">1L</span><span class="o">,</span> <span class="mf">0.9</span><span class="o">),</span>
   <span class="k">new</span> <span class="n">Tuple3</span><span class="o">&lt;&gt;(</span><span class="mi">1L</span><span class="o">,</span> <span class="mi">2L</span><span class="o">,</span> <span class="mf">0.9</span><span class="o">),</span>
   <span class="k">new</span> <span class="n">Tuple3</span><span class="o">&lt;&gt;(</span><span class="mi">2L</span><span class="o">,</span> <span class="mi">3L</span><span class="o">,</span> <span class="mf">0.9</span><span class="o">),</span>
   <span class="k">new</span> <span class="n">Tuple3</span><span class="o">&lt;&gt;(</span><span class="mi">3L</span><span class="o">,</span> <span class="mi">4L</span><span class="o">,</span> <span class="mf">0.1</span><span class="o">),</span>
   <span class="k">new</span> <span class="n">Tuple3</span><span class="o">&lt;&gt;(</span><span class="mi">4L</span><span class="o">,</span> <span class="mi">5L</span><span class="o">,</span> <span class="mf">0.9</span><span class="o">)));</span>
 
-<span class="n">PowerIterationClustering</span> <span class="n">pic</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">PowerIterationClustering</span><span class="o">()</span>
+<span class="n">PowerIterationClustering</span> <span class="n">pic</span> <span class="o">=</span> <span class="k">new</span> <span class="n">PowerIterationClustering</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setK</span><span class="o">(</span><span class="mi">2</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setMaxIterations</span><span class="o">(</span><span class="mi">10</span><span class="o">);</span>
 <span class="n">PowerIterationClusteringModel</span> <span class="n">model</span> <span class="o">=</span> <span class="n">pic</span><span class="o">.</span><span class="na">run</span><span class="o">(</span><span class="n">similarities</span><span class="o">);</span>
 
-<span class="k">for</span> <span class="o">(</span><span class="n">PowerIterationClustering</span><span class="o">.</span><span class="na">Assignment</span> <span class="nl">a:</span> <span class="n">model</span><span class="o">.</span><span class="na">assignments</span><span class="o">().</span><span class="na">toJavaRDD</span><span class="o">().</span><span class="na">collect</span><span class="o">())</span> <span class="o">{</span>
+<span class="k">for</span> <span class="o">(</span><span class="n">PowerIterationClustering</span><span class="o">.</span><span class="na">Assignment</span> <span class="n">a</span><span class="o">:</span> <span class="n">model</span><span class="o">.</span><span class="na">assignments</span><span class="o">().</span><span class="na">toJavaRDD</span><span class="o">().</span><span class="na">collect</span><span class="o">())</span> <span class="o">{</span>
   <span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="n">a</span><span class="o">.</span><span class="na">id</span><span class="o">()</span> <span class="o">+</span> <span class="s">&quot; -&gt; &quot;</span> <span class="o">+</span> <span class="n">a</span><span class="o">.</span><span class="na">cluster</span><span class="o">());</span>
 <span class="o">}</span>
 </pre></div>
@@ -770,21 +770,21 @@ which contains the computed clustering assignments.</p>
 
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.clustering.PowerIterationClustering"><code>PowerIterationClustering</code> Python docs</a> and <a href="api/python/pyspark.mllib.html#pyspark.mllib.clustering.PowerIterationClusteringModel"><code>PowerIterationClusteringModel</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.mllib.clustering</span> <span class="kn">import</span> <span class="n">PowerIterationClustering</span><span class="p">,</span> <span class="n">PowerIterationClusteringModel</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.clustering</span> <span class="kn">import</span> <span class="n">PowerIterationClustering</span><span class="p">,</span> <span class="n">PowerIterationClusteringModel</span>
 
-<span class="c"># Load and parse the data</span>
-<span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">&quot;data/mllib/pic_data.txt&quot;</span><span class="p">)</span>
-<span class="n">similarities</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="nb">tuple</span><span class="p">([</span><span class="nb">float</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&#39; &#39;</span><span class="p">)]))</span>
+<span class="c1"># Load and parse the data</span>
+<span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s2">&quot;data/mllib/pic_data.txt&quot;</span><span class="p">)</span>
+<span class="n">similarities</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="nb">tuple</span><span class="p">([</span><span class="nb">float</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">&#39; &#39;</span><span class="p">)]))</span>
 
-<span class="c"># Cluster the data into two classes using PowerIterationClustering</span>
+<span class="c1"># Cluster the data into two classes using PowerIterationClustering</span>
 <span class="n">model</span> <span class="o">=</span> <span class="n">PowerIterationClustering</span><span class="o">.</span><span class="n">train</span><span class="p">(</span><span class="n">similarities</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">10</span><span class="p">)</span>
 
-<span class="n">model</span><span class="o">.</span><span class="n">assignments</span><span class="p">()</span><span class="o">.</span><span class="n">foreach</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="k">print</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">id</span><span class="p">)</span> <span class="o">+</span> <span class="s">&quot; -&gt; &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">cluster</span><span class="p">)))</span>
+<span class="n">model</span><span class="o">.</span><span class="n">assignments</span><span class="p">()</span><span class="o">.</span><span class="n">foreach</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="k">print</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">id</span><span class="p">)</span> <span class="o">+</span> <span class="s2">&quot; -&gt; &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">cluster</span><span class="p">)))</span>
 
-<span class="c"># Save and load model</span>
-<span class="n">model</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;target/org/apache/spark/PythonPowerIterationClusteringExample/PICModel&quot;</span><span class="p">)</span>
+<span class="c1"># Save and load model</span>
+<span class="n">model</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;target/org/apache/spark/PythonPowerIterationClusteringExample/PICModel&quot;</span><span class="p">)</span>
 <span class="n">sameModel</span> <span class="o">=</span> <span class="n">PowerIterationClusteringModel</span>\
-    <span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;target/org/apache/spark/PythonPowerIterationClusteringExample/PICModel&quot;</span><span class="p">)</span>
+    <span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;target/org/apache/spark/PythonPowerIterationClusteringExample/PICModel&quot;</span><span class="p">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/mllib/power_iteration_clustering_example.py" in the Spark repo.</small></div>
   </div>
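    <p>Stripped of its highlighting markup, the Python example in this hunk reads approximately as follows. It is a sketch assuming an existing SparkContext <code>sc</code> and Python 3 print semantics; each input line holds a <code>srcId dstId similarity</code> triple:</p>

    from pyspark.mllib.clustering import PowerIterationClustering

    # Parse each "srcId dstId similarity" line into a tuple
    data = sc.textFile("data/mllib/pic_data.txt")
    similarities = data.map(lambda line: tuple([float(x) for x in line.split(' ')]))

    # Cluster the affinity graph into two classes, with at most 10 iterations
    model = PowerIterationClustering.train(similarities, 2, 10)

    # Print each vertex id with its assigned cluster
    model.assignments().foreach(lambda x: print(str(x.id) + " -> " + str(x.cluster)))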
@@ -947,7 +947,7 @@ to the algorithm. We then output the topics, represented as probability distribu
 <div data-lang="scala">
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.clustering.LDA"><code>LDA</code> Scala docs</a> and <a href="api/scala/index.html#org.apache.spark.mllib.clustering.DistributedLDAModel"><code>DistributedLDAModel</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.clustering.</span><span class="o">{</span><span class="nc">DistributedLDAModel</span><span class="o">,</span> <span class="nc">LDA</span><span class="o">}</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.clustering.</span><span class="o">{</span><span class="nc">DistributedLDAModel</span><span class="o">,</span> <span class="nc">LDA</span><span class="o">}</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vectors</span>
 
 <span class="c1">// Load and parse the data</span>
@@ -979,7 +979,7 @@ to the algorithm. We then output the topics, represented as probability distribu
 <div data-lang="java">
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/clustering/LDA.html"><code>LDA</code> Java docs</a> and <a href="api/java/org/apache/spark/mllib/clustering/DistributedLDAModel.html"><code>DistributedLDAModel</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaPairRDD</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
@@ -1019,7 +1019,7 @@ to the algorithm. We then output the topics, represented as probability distribu
 <span class="n">corpus</span><span class="o">.</span><span class="na">cache</span><span class="o">();</span>
 
 <span class="c1">// Cluster the documents into three topics using LDA</span>
-<span class="n">LDAModel</span> <span class="n">ldaModel</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">LDA</span><span class="o">().</span><span class="na">setK</span><span class="o">(</span><span class="mi">3</span><span class="o">).</span><span class="na">run</span><span class="o">(</span><span class="n">corpus</span><span class="o">);</span>
+<span class="n">LDAModel</span> <span class="n">ldaModel</span> <span class="o">=</span> <span class="k">new</span> <span class="n">LDA</span><span class="o">().</span><span class="na">setK</span><span class="o">(</span><span class="mi">3</span><span class="o">).</span><span class="na">run</span><span class="o">(</span><span class="n">corpus</span><span class="o">);</span>
 
 <span class="c1">// Output topics. Each is a distribution over words (matching word count vectors)</span>
 <span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">&quot;Learned topics (as distributions over vocab of &quot;</span> <span class="o">+</span> <span class="n">ldaModel</span><span class="o">.</span><span class="na">vocabSize</span><span class="o">()</span>
@@ -1044,31 +1044,31 @@ to the algorithm. We then output the topics, represented as probability distribu
 <div data-lang="python">
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.clustering.LDA"><code>LDA</code> Python docs</a> and <a href="api/python/pyspark.mllib.html#pyspark.mllib.clustering.LDAModel"><code>LDAModel</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.mllib.clustering</span> <span class="kn">import</span> <span class="n">LDA</span><span class="p">,</span> <span class="n">LDAModel</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.clustering</span> <span class="kn">import</span> <span class="n">LDA</span><span class="p">,</span> <span class="n">LDAModel</span>
 <span class="kn">from</span> <span class="nn">pyspark.mllib.linalg</span> <span class="kn">import</span> <span class="n">Vectors</span>
 
-<span class="c"># Load and parse the data</span>
-<span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">&quot;data/mllib/sample_lda_data.txt&quot;</span><span class="p">)</span>
-<span class="n">parsedData</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="n">Vectors</span><span class="o">.</span><span class="n">dense</span><span class="p">([</span><span class="nb">float</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">line</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&#39; &#39;</span><span class="p">)]))</span>
-<span class="c"># Index documents with unique IDs</span>
+<span class="c1"># Load and parse the data</span>
+<span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s2">&quot;data/mllib/sample_lda_data.txt&quot;</span><span class="p">)</span>
+<span class="n">parsedData</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="n">Vectors</span><span class="o">.</span><span class="n">dense</span><span class="p">([</span><span class="nb">float</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">line</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">&#39; &#39;</span><span class="p">)]))</span>
+<span class="c1"># Index documents with unique IDs</span>
 <span class="n">corpus</span> <span class="o">=</span> <span class="n">parsedData</span><span class="o">.</span><span class="n">zipWithIndex</span><span class="p">()</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="p">[</span><span class="n">x</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">x</span><span class="p">[</span><span class="mi">0</span><span class="p">]])</span><span class="o">.</span><span class="n">cache</span><span class="p">()</span>
 
-<span class="c"># Cluster the documents into three topics using LDA</span>
+<span class="c1"># Cluster the documents into three topics using LDA</span>
 <span class="n">ldaModel</span> <span class="o">=</span> <span class="n">LDA</span><span class="o">.</span><span class="n">train</span><span class="p">(</span><span class="n">corpus</span><span class="p">,</span> <span class="n">k</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
 
-<span class="c"># Output topics. Each is a distribution over words (matching word count vectors)</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Learned topics (as distributions over vocab of &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">ldaModel</span><span class="o">.</span><span class="n">vocabSize</span><span class="p">())</span>
-      <span class="o">+</span> <span class="s">&quot; words):&quot;</span><span class="p">)</span>
+<span class="c1"># Output topics. Each is a distribution over words (matching word count vectors)</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Learned topics (as distributions over vocab of &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">ldaModel</span><span class="o">.</span><span class="n">vocabSize</span><span class="p">())</span>
+      <span class="o">+</span> <span class="s2">&quot; words):&quot;</span><span class="p">)</span>
 <span class="n">topics</span> <span class="o">=</span> <span class="n">ldaModel</span><span class="o">.</span><span class="n">topicsMatrix</span><span class="p">()</span>
 <span class="k">for</span> <span class="n">topic</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">3</span><span class="p">):</span>
-    <span class="k">print</span><span class="p">(</span><span class="s">&quot;Topic &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">topic</span><span class="p">)</span> <span class="o">+</span> <span class="s">&quot;:&quot;</span><span class="p">)</span>
+    <span class="k">print</span><span class="p">(</span><span class="s2">&quot;Topic &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">topic</span><span class="p">)</span> <span class="o">+</span> <span class="s2">&quot;:&quot;</span><span class="p">)</span>
     <span class="k">for</span> <span class="n">word</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">ldaModel</span><span class="o">.</span><span class="n">vocabSize</span><span class="p">()):</span>
-        <span class="k">print</span><span class="p">(</span><span class="s">&quot; &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">topics</span><span class="p">[</span><span class="n">word</span><span class="p">][</span><span class="n">topic</span><span class="p">]))</span>
+        <span class="k">print</span><span class="p">(</span><span class="s2">&quot; &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">topics</span><span class="p">[</span><span class="n">word</span><span class="p">][</span><span class="n">topic</span><span class="p">]))</span>
 
-<span class="c"># Save and load model</span>
-<span class="n">ldaModel</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;target/org/apache/spark/PythonLatentDirichletAllocationExample/LDAModel&quot;</span><span class="p">)</span>
+<span class="c1"># Save and load model</span>
+<span class="n">ldaModel</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;target/org/apache/spark/PythonLatentDirichletAllocationExample/LDAModel&quot;</span><span class="p">)</span>
 <span class="n">sameModel</span> <span class="o">=</span> <span class="n">LDAModel</span>\
-    <span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;target/org/apache/spark/PythonLatentDirichletAllocationExample/LDAModel&quot;</span><span class="p">)</span>
+    <span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;target/org/apache/spark/PythonLatentDirichletAllocationExample/LDAModel&quot;</span><span class="p">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/mllib/latent_dirichlet_allocation_example.py" in the Spark repo.</small></div>
   </div>
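    <p>The Python LDA example above, with the span markup removed, is roughly the following sketch (again assuming an existing SparkContext <code>sc</code>):</p>

    from pyspark.mllib.clustering import LDA
    from pyspark.mllib.linalg import Vectors

    # Load word-count vectors and pair each document with a unique index
    data = sc.textFile("data/mllib/sample_lda_data.txt")
    parsedData = data.map(lambda line: Vectors.dense([float(x) for x in line.strip().split(' ')]))
    corpus = parsedData.zipWithIndex().map(lambda x: [x[1], x[0]]).cache()

    # Infer three topics over the corpus
    ldaModel = LDA.train(corpus, k=3)

    # topicsMatrix() has one row per vocabulary word and one column per topic;
    # each column is that topic's distribution over words
    topics = ldaModel.topicsMatrix()
    for topic in range(3):
        print("Topic " + str(topic) + ":")
        for word in range(0, ldaModel.vocabSize()):
            print(" " + str(topics[word][topic]))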
@@ -1104,7 +1104,7 @@ The implementation in MLlib has the following parameters:</p>
 <div data-lang="scala">
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.clustering.BisectingKMeans"><code>BisectingKMeans</code> Scala docs</a> and <a href="api/scala/index.html#org.apache.spark.mllib.clustering.BisectingKMeansModel"><code>BisectingKMeansModel</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.clustering.BisectingKMeans</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.clustering.BisectingKMeans</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.</span><span class="o">{</span><span class="nc">Vector</span><span class="o">,</span> <span class="nc">Vectors</span><span class="o">}</span>
 
 <span class="c1">// Loads and parses data</span>
@@ -1116,9 +1116,9 @@ The implementation in MLlib has the following parameters:</p>
 <span class="k">val</span> <span class="n">model</span> <span class="k">=</span> <span class="n">bkm</span><span class="o">.</span><span class="n">run</span><span class="o">(</span><span class="n">data</span><span class="o">)</span>
 
 <span class="c1">// Show the compute cost and the cluster centers</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Compute Cost: ${model.computeCost(data)}&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;Compute Cost: </span><span class="si">${</span><span class="n">model</span><span class="o">.</span><span class="n">computeCost</span><span class="o">(</span><span class="n">data</span><span class="o">)</span><span class="si">}</span><span class="s">&quot;</span><span class="o">)</span>
 <span class="n">model</span><span class="o">.</span><span class="n">clusterCenters</span><span class="o">.</span><span class="n">zipWithIndex</span><span class="o">.</span><span class="n">foreach</span> <span class="o">{</span> <span class="k">case</span> <span class="o">(</span><span class="n">center</span><span class="o">,</span> <span class="n">idx</span><span class="o">)</span> <span class="k">=&gt;</span>
-  <span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Cluster Center ${idx}: ${center}&quot;</span><span class="o">)</span>
+  <span class="n">println</span><span class="o">(</span><span class="s">s&quot;Cluster Center </span><span class="si">${</span><span class="n">idx</span><span class="si">}</span><span class="s">: </span><span class="si">${</span><span class="n">center</span><span class="si">}</span><span class="s">&quot;</span><span class="o">)</span>
 <span class="o">}</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/scala/org/apache/spark/examples/mllib/BisectingKMeansExample.scala" in the Spark repo.</small></div>
@@ -1127,7 +1127,7 @@ The implementation in MLlib has the following parameters:</p>
 <div data-lang="java">
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/clustering/BisectingKMeans.html"><code>BisectingKMeans</code> Java docs</a> and <a href="api/java/org/apache/spark/mllib/clustering/BisectingKMeansModel.html"><code>BisectingKMeansModel</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">com.google.common.collect.Lists</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">com.google.common.collect.Lists</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.clustering.BisectingKMeans</span><span class="o">;</span>
@@ -1143,7 +1143,7 @@ The implementation in MLlib has the following parameters:</p>
 <span class="o">);</span>
 <span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">Vector</span><span class="o">&gt;</span> <span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="na">parallelize</span><span class="o">(</span><span class="n">localData</span><span class="o">,</span> <span class="mi">2</span><span class="o">);</span>
 
-<span class="n">BisectingKMeans</span> <span class="n">bkm</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">BisectingKMeans</span><span class="o">()</span>
+<span class="n">BisectingKMeans</span> <span class="n">bkm</span> <span class="o">=</span> <span class="k">new</span> <span class="n">BisectingKMeans</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setK</span><span class="o">(</span><span class="mi">4</span><span class="o">);</span>
 <span class="n">BisectingKMeansModel</span> <span class="n">model</span> <span class="o">=</span> <span class="n">bkm</span><span class="o">.</span><span class="na">run</span><span class="o">(</span><span class="n">data</span><span class="o">);</span>
 
@@ -1161,23 +1161,23 @@ The implementation in MLlib has the following parameters:</p>
 <div data-lang="python">
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.clustering.BisectingKMeans"><code>BisectingKMeans</code> Python docs</a> and <a href="api/python/pyspark.mllib.html#pyspark.mllib.clustering.BisectingKMeansModel"><code>BisectingKMeansModel</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">numpy</span> <span class="kn">import</span> <span class="n">array</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">numpy</span> <span class="kn">import</span> <span class="n">array</span>
 
 <span class="kn">from</span> <span class="nn">pyspark.mllib.clustering</span> <span class="kn">import</span> <span class="n">BisectingKMeans</span><span class="p">,</span> <span class="n">BisectingKMeansModel</span>
 
-<span class="c"># Load and parse the data</span>
-<span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">&quot;data/mllib/kmeans_data.txt&quot;</span><span class="p">)</span>
-<span class="n">parsedData</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="n">array</span><span class="p">([</span><span class="nb">float</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&#39; &#39;</span><span class="p">)]))</span>
+<span class="c1"># Load and parse the data</span>
+<span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s2">&quot;data/mllib/kmeans_data.txt&quot;</span><span class="p">)</span>
+<span class="n">parsedData</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="n">array</span><span class="p">([</span><span class="nb">float</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">&#39; &#39;</span><span class="p">)]))</span>
 
-<span class="c"># Build the model (cluster the data)</span>
+<span class="c1"># Build the model (cluster the data)</span>
 <span class="n">model</span> <span class="o">=</span> <span class="n">BisectingKMeans</span><span class="o">.</span><span class="n">train</span><span class="p">(</span><span class="n">parsedData</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">maxIterations</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
 
-<span class="c"># Evaluate clustering</span>
+<span class="c1"># Evaluate clustering</span>
 <span class="n">cost</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">computeCost</span><span class="p">(</span><span class="n">parsedData</span><span class="p">)</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Bisecting K-means Cost = &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">cost</span><span class="p">))</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Bisecting K-means Cost = &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">cost</span><span class="p">))</span>
 
-<span class="c"># Save and load model</span>
-<span class="n">path</span> <span class="o">=</span> <span class="s">&quot;target/org/apache/spark/PythonBisectingKMeansExample/BisectingKMeansModel&quot;</span>
+<span class="c1"># Save and load model</span>
+<span class="n">path</span> <span class="o">=</span> <span class="s2">&quot;target/org/apache/spark/PythonBisectingKMeansExample/BisectingKMeansModel&quot;</span>
 <span class="n">model</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="n">path</span><span class="p">)</span>
 <span class="n">sameModel</span> <span class="o">=</span> <span class="n">BisectingKMeansModel</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="n">path</span><span class="p">)</span>
 </pre></div>
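    <p>In plain text, the bisecting k-means example in this hunk amounts to the sketch below, assuming an existing SparkContext <code>sc</code>:</p>

    from numpy import array
    from pyspark.mllib.clustering import BisectingKMeans

    # Parse space-separated points into NumPy arrays
    data = sc.textFile("data/mllib/kmeans_data.txt")
    parsedData = data.map(lambda line: array([float(x) for x in line.split(' ')]))

    # Bisecting k-means with k=2, capped at 5 iterations per split
    model = BisectingKMeans.train(parsedData, 2, maxIterations=5)

    # computeCost returns the sum of squared distances of points to their nearest center
    print("Bisecting K-means Cost = " + str(model.computeCost(parsedData)))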
@@ -1223,7 +1223,7 @@ will be adjusted accordingly.</p>
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.clustering.StreamingKMeans"><code>StreamingKMeans</code> Scala docs</a> for details on the API.
 Also refer to the <a href="streaming-programming-guide.html#initializing">Spark Streaming Programming Guide</a> for details on StreamingContext.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.clustering.StreamingKMeans</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.clustering.StreamingKMeans</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vectors</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.regression.LabeledPoint</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.streaming.</span><span class="o">{</span><span class="nc">Seconds</span><span class="o">,</span> <span class="nc">StreamingContext</span><span class="o">}</span>
@@ -1252,22 +1252,22 @@ And Refer to <a href="streaming-programming-guide.html#initializing">Spark Strea
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.clustering.StreamingKMeans"><code>StreamingKMeans</code> Python docs</a> for more details on the API.
 Also refer to the <a href="streaming-programming-guide.html#initializing">Spark Streaming Programming Guide</a> for details on StreamingContext.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.mllib.linalg</span> <span class="kn">import</span> <span class="n">Vectors</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.linalg</span> <span class="kn">import</span> <span class="n">Vectors</span>
 <span class="kn">from</span> <span class="nn">pyspark.mllib.regression</span> <span class="kn">import</span> <span class="n">LabeledPoint</span>
 <span class="kn">from</span> <span class="nn">pyspark.mllib.clustering</span> <span class="kn">import</span> <span class="n">StreamingKMeans</span>
 
-<span class="c"># we make an input stream of vectors for training,</span>
-<span class="c"># as well as a stream of vectors for testing</span>
+<span class="c1"># we make an input stream of vectors for training,</span>
+<span class="c1"># as well as a stream of vectors for testing</span>
 <span class="k">def</span> <span class="nf">parse</span><span class="p">(</span><span class="n">lp</span><span class="p">):</span>
-    <span class="n">label</span> <span class="o">=</span> <span class="nb">float</span><span class="p">(</span><span class="n">lp</span><span class="p">[</span><span class="n">lp</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s">&#39;(&#39;</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="p">:</span> <span class="n">lp</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s">&#39;)&#39;</span><span class="p">)])</span>
-    <span class="n">vec</span> <span class="o">=</span> <span class="n">Vectors</span><span class="o">.</span><span class="n">dense</span><span class="p">(</span><span class="n">lp</span><span class="p">[</span><span class="n">lp</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s">&#39;[&#39;</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="p">:</span> <span class="n">lp</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s">&#39;]&#39;</span><span class="p">)]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&#39;,&#39;</span><span class="p">))</span>
+    <span class="n">label</span> <span class="o">=</span> <span class="nb">float</span><span class="p">(</span><span class="n">lp</span><span class="p">[</span><span class="n">lp</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s1">&#39;(&#39;</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="p">:</span> <span class="n">lp</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s1">&#39;)&#39;</span><span class="p">)])</span>
+    <span class="n">vec</span> <span class="o">=</span> <span class="n">Vectors</span><span class="o">.</span><span class="n">dense</span><span class="p">(</span><span class="n">lp</span><span class="p">[</span><span class="n">lp</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s1">&#39;[&#39;</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="p">:</span> <span class="n">lp</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s1">&#39;]&#39;</span><span class="p">)]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">&#39;,&#39;</span><span class="p">))</span>
 
     <span class="k">return</span> <span class="n">LabeledPoint</span><span class="p">(</span><span class="n">label</span><span class="p">,</span> <span class="n">vec</span><span class="p">)</span>
 
-<span class="n">trainingData</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">&quot;data/mllib/kmeans_data.txt&quot;</span><span class="p">)</span>\
-    <span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="n">Vectors</span><span class="o">.</span><span class="n">dense</span><span class="p">([</span><span class="nb">float</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">line</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&#39; &#39;</span><span class="p">)]))</span>
+<span class="n">trainingData</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s2">&quot;data/mllib/kmeans_data.txt&quot;</span><span class="p">)</span>\
+    <span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="n">Vectors</span><span class="o">.</span><span class="n">dense</span><span class="p">([</span><span class="nb">float</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">line</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">&#39; &#39;</span><span class="p">)]))</span>
 
-<span class="n">testingData</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">&quot;data/mllib/streaming_kmeans_data_test.txt&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="n">parse</span><span class="p">)</span>
+<span class="n">testingData</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s2">&quot;data/mllib/streaming_kmeans_data_test.txt&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="n">parse</span><span class="p">)</span>
 
 <span class="n">trainingQueue</span> <span class="o">=</span> <span class="p">[</span><span class="n">trainingData</span><span class="p">]</span>
 <span class="n">testingQueue</span> <span class="o">=</span> <span class="p">[</span><span class="n">testingData</span><span class="p">]</span>
@@ -1275,11 +1275,11 @@ And Refer to <a href="streaming-programming-guide.html#initializing">Spark Strea
 <span class="n">trainingStream</span> <span class="o">=</span> <span class="n">ssc</span><span class="o">.</span><span class="n">queueStream</span><span class="p">(</span><span class="n">trainingQueue</span><span class="p">)</span>
 <span class="n">testingStream</span> <span class="o">=</span> <span class="n">ssc</span><span class="o">.</span><span class="n">queueStream</span><span class="p">(</span><span class="n">testingQueue</span><span class="p">)</span>
 
-<span class="c"># We create a model with random clusters and specify the number of clusters to find</span>
+<span class="c1"># We create a model with random clusters and specify the number of clusters to find</span>
 <span class="n">model</span> <span class="o">=</span> <span class="n">StreamingKMeans</span><span class="p">(</span><span class="n">k</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">decayFactor</span><span class="o">=</span><span class="mf">1.0</span><span class="p">)</span><span class="o">.</span><span class="n">setRandomCenters</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
 
-<span class="c"># Now register the streams for training and testing and start the job,</span>
-<span class="c"># printing the predicted cluster assignments on new data points as they arrive.</span>
+<span class="c1"># Now register the streams for training and testing and start the job,</span>
+<span class="c1"># printing the predicted cluster assignments on new data points as they arrive.</span>
 <span class="n">model</span><span class="o">.</span><span class="n">trainOn</span><span class="p">(</span><span class="n">trainingStream</span><span class="p">)</span>
 
 <span class="n">result</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">predictOnValues</span><span class="p">(</span><span class="n">testingStream</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">lp</span><span class="p">:</span> <span class="p">(</span><span class="n">lp</span><span class="o">.</span><span class="n">label</span><span class="p">,</span> <span class="n">lp</span><span class="o">.</span><span class="n">features</span><span class="p">)))</span>




[12/25] spark-website git commit: Update 2.1.0 docs to include https://github.com/apache/spark/pull/16294

Posted by yh...@apache.org.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/mllib-feature-extraction.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/mllib-feature-extraction.html b/site/docs/2.1.0/mllib-feature-extraction.html
index 4726b37..f8cd98e 100644
--- a/site/docs/2.1.0/mllib-feature-extraction.html
+++ b/site/docs/2.1.0/mllib-feature-extraction.html
@@ -307,32 +307,32 @@
                     
 
                     <ul id="markdown-toc">
-  <li><a href="#tf-idf" id="markdown-toc-tf-idf">TF-IDF</a></li>
-  <li><a href="#word2vec" id="markdown-toc-word2vec">Word2Vec</a>    <ul>
-      <li><a href="#model" id="markdown-toc-model">Model</a></li>
-      <li><a href="#example" id="markdown-toc-example">Example</a></li>
+  <li><a href="#tf-idf">TF-IDF</a></li>
+  <li><a href="#word2vec">Word2Vec</a>    <ul>
+      <li><a href="#model">Model</a></li>
+      <li><a href="#example">Example</a></li>
     </ul>
   </li>
-  <li><a href="#standardscaler" id="markdown-toc-standardscaler">StandardScaler</a>    <ul>
-      <li><a href="#model-fitting" id="markdown-toc-model-fitting">Model Fitting</a></li>
-      <li><a href="#example-1" id="markdown-toc-example-1">Example</a></li>
+  <li><a href="#standardscaler">StandardScaler</a>    <ul>
+      <li><a href="#model-fitting">Model Fitting</a></li>
+      <li><a href="#example-1">Example</a></li>
     </ul>
   </li>
-  <li><a href="#normalizer" id="markdown-toc-normalizer">Normalizer</a>    <ul>
-      <li><a href="#example-2" id="markdown-toc-example-2">Example</a></li>
+  <li><a href="#normalizer">Normalizer</a>    <ul>
+      <li><a href="#example-2">Example</a></li>
     </ul>
   </li>
-  <li><a href="#chisqselector" id="markdown-toc-chisqselector">ChiSqSelector</a>    <ul>
-      <li><a href="#model-fitting-1" id="markdown-toc-model-fitting-1">Model Fitting</a></li>
-      <li><a href="#example-3" id="markdown-toc-example-3">Example</a></li>
+  <li><a href="#chisqselector">ChiSqSelector</a>    <ul>
+      <li><a href="#model-fitting-1">Model Fitting</a></li>
+      <li><a href="#example-3">Example</a></li>
     </ul>
   </li>
-  <li><a href="#elementwiseproduct" id="markdown-toc-elementwiseproduct">ElementwiseProduct</a>    <ul>
-      <li><a href="#example-4" id="markdown-toc-example-4">Example</a></li>
+  <li><a href="#elementwiseproduct">ElementwiseProduct</a>    <ul>
+      <li><a href="#example-4">Example</a></li>
     </ul>
   </li>
-  <li><a href="#pca" id="markdown-toc-pca">PCA</a>    <ul>
-      <li><a href="#example-5" id="markdown-toc-example-5">Example</a></li>
+  <li><a href="#pca">PCA</a>    <ul>
+      <li><a href="#example-5">Example</a></li>
     </ul>
   </li>
 </ul>
@@ -390,7 +390,7 @@ Each record could be an iterable of strings or other types.</p>
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.feature.HashingTF"><code>HashingTF</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.feature.</span><span class="o">{</span><span class="nc">HashingTF</span><span class="o">,</span> <span class="nc">IDF</span><span class="o">}</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.feature.</span><span class="o">{</span><span class="nc">HashingTF</span><span class="o">,</span> <span class="nc">IDF</span><span class="o">}</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vector</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.rdd.RDD</span>
 
@@ -424,24 +424,24 @@ Each record could be an iterable of strings or other types.</p>
 
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.feature.HashingTF"><code>HashingTF</code> Python docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.mllib.feature</span> <span class="kn">import</span> <span class="n">HashingTF</span><span class="p">,</span> <span class="n">IDF</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.feature</span> <span class="kn">import</span> <span class="n">HashingTF</span><span class="p">,</span> <span class="n">IDF</span>
 
-<span class="c"># Load documents (one per line).</span>
-<span class="n">documents</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">&quot;data/mllib/kmeans_data.txt&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&quot; &quot;</span><span class="p">))</span>
+<span class="c1"># Load documents (one per line).</span>
+<span class="n">documents</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s2">&quot;data/mllib/kmeans_data.txt&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">&quot; &quot;</span><span class="p">))</span>
 
 <span class="n">hashingTF</span> <span class="o">=</span> <span class="n">HashingTF</span><span class="p">()</span>
 <span class="n">tf</span> <span class="o">=</span> <span class="n">hashingTF</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">documents</span><span class="p">)</span>
 
-<span class="c"># While applying HashingTF only needs a single pass to the data, applying IDF needs two passes:</span>
-<span class="c"># First to compute the IDF vector and second to scale the term frequencies by IDF.</span>
+<span class="c1"># While applying HashingTF only needs a single pass to the data, applying IDF needs two passes:</span>
+<span class="c1"># First to compute the IDF vector and second to scale the term frequencies by IDF.</span>
 <span class="n">tf</span><span class="o">.</span><span class="n">cache</span><span class="p">()</span>
 <span class="n">idf</span> <span class="o">=</span> <span class="n">IDF</span><span class="p">()</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">tf</span><span class="p">)</span>
 <span class="n">tfidf</span> <span class="o">=</span> <span class="n">idf</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">tf</span><span class="p">)</span>
 
-<span class="c"># spark.mllib&#39;s IDF implementation provides an option for ignoring terms</span>
-<span class="c"># which occur in less than a minimum number of documents.</span>
-<span class="c"># In such cases, the IDF for these terms is set to 0.</span>
-<span class="c"># This feature can be used by passing the minDocFreq value to the IDF constructor.</span>
+<span class="c1"># spark.mllib&#39;s IDF implementation provides an option for ignoring terms</span>
+<span class="c1"># which occur in less than a minimum number of documents.</span>
+<span class="c1"># In such cases, the IDF for these terms is set to 0.</span>
+<span class="c1"># This feature can be used by passing the minDocFreq value to the IDF constructor.</span>
 <span class="n">idfIgnore</span> <span class="o">=</span> <span class="n">IDF</span><span class="p">(</span><span class="n">minDocFreq</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">tf</span><span class="p">)</span>
 <span class="n">tfidfIgnore</span> <span class="o">=</span> <span class="n">idfIgnore</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">tf</span><span class="p">)</span>
 </pre></div>
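
To make minDocFreq concrete: MLlib documents its IDF as log((m + 1) / (df(t) + 1)) over m documents, with terms below minDocFreq forced to 0. A toy pure-Python sketch (the corpus is ours):

    from math import log

    docs = [["a", "b"], ["a", "c"], ["b"]]  # m = 3 toy documents
    m = len(docs)
    df = {}
    for d in docs:
        for t in set(d):
            df[t] = df.get(t, 0) + 1

    min_doc_freq = 2
    for t in sorted(df):
        idf = log((m + 1) / (df[t] + 1)) if df[t] >= min_doc_freq else 0.0
        print(t, round(idf, 3))
    # a: 0.288, b: 0.288, c: 0.0 -- "c" is in only one document, so
    # minDocFreq=2 zeroes it out (it would otherwise get log(4/2) = 0.693)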
@@ -467,7 +467,7 @@ skip-gram model is to maximize the average log-likelihood
 <code>\[
 \frac{1}{T} \sum_{t = 1}^{T}\sum_{j=-k}^{j=k} \log p(w_{t+j} | w_t)
 \]</code>
-where $k$ is the size of the training window.</p>
+where $k$ is the size of the training window.  </p>
 
 <p>In the skip-gram model, every word $w$ is associated with two vectors $u_w$ and $v_w$ which are 
 vector representations of $w$ as word and context respectively. The probability of correctly 
@@ -475,7 +475,7 @@ predicting word $w_i$ given word $w_j$ is determined by the softmax model, which
 <code>\[
 p(w_i | w_j ) = \frac{\exp(u_{w_i}^{\top}v_{w_j})}{\sum_{l=1}^{V} \exp(u_l^{\top}v_{w_j})}
 \]</code>
-where $V$ is the vocabulary size.</p>
+where $V$ is the vocabulary size. </p>
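
This softmax is easy to sanity-check numerically. A minimal NumPy sketch with a tiny random (untrained) vocabulary, just to show the shape of the computation:

    import numpy as np

    rng = np.random.default_rng(0)
    V, d = 5, 3                      # vocabulary size, vector dimension
    U = rng.normal(size=(V, d))      # word vectors u_w
    C = rng.normal(size=(V, d))      # context vectors v_w

    def p(i, j):
        """p(w_i | w_j) = exp(u_i . v_j) / sum_l exp(u_l . v_j)"""
        s = U @ C[j]
        e = np.exp(s - s.max())      # subtract max for numerical stability
        return e[i] / e.sum()

    print(sum(p(i, 0) for i in range(V)))  # ~1.0, as any softmax must be
    # The denominator touches all V rows -- the O(V) cost discussed next.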
 
 <p>The skip-gram model with softmax is expensive because the cost of computing $\log p(w_i | w_j)$ 
 is proportional to $V$, which can be easily in order of millions. To speed up training of Word2Vec, 
@@ -488,13 +488,13 @@ $O(\log(V))$</p>
 construct a <code>Word2Vec</code> instance and then fit a <code>Word2VecModel</code> with the input data. Finally,
 we display the top 40 synonyms of the specified word. To run the example, first download
 the <a href="http://mattmahoney.net/dc/text8.zip">text8</a> data and extract it to your preferred directory.
-Here we assume the extracted file is <code>text8</code> and in same directory as you run the spark shell.</p>
+Here we assume the extracted file is <code>text8</code> and in same directory as you run the spark shell.  </p>
 
 <div class="codetabs">
 <div data-lang="scala">
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.feature.Word2Vec"><code>Word2Vec</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.feature.</span><span class="o">{</span><span class="nc">Word2Vec</span><span class="o">,</span> <span class="nc">Word2VecModel</span><span class="o">}</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.feature.</span><span class="o">{</span><span class="nc">Word2Vec</span><span class="o">,</span> <span class="nc">Word2VecModel</span><span class="o">}</span>
 
 <span class="k">val</span> <span class="n">input</span> <span class="k">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="o">(</span><span class="s">&quot;data/mllib/sample_lda_data.txt&quot;</span><span class="o">).</span><span class="n">map</span><span class="o">(</span><span class="n">line</span> <span class="k">=&gt;</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="o">(</span><span class="s">&quot; &quot;</span><span class="o">).</span><span class="n">toSeq</span><span class="o">)</span>
 
@@ -505,7 +505,7 @@ Here we assume the extracted file is <code>text8</code> and in same directory as
 <span class="k">val</span> <span class="n">synonyms</span> <span class="k">=</span> <span class="n">model</span><span class="o">.</span><span class="n">findSynonyms</span><span class="o">(</span><span class="s">&quot;1&quot;</span><span class="o">,</span> <span class="mi">5</span><span class="o">)</span>
 
 <span class="k">for</span><span class="o">((</span><span class="n">synonym</span><span class="o">,</span> <span class="n">cosineSimilarity</span><span class="o">)</span> <span class="k">&lt;-</span> <span class="n">synonyms</span><span class="o">)</span> <span class="o">{</span>
-  <span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;$synonym $cosineSimilarity&quot;</span><span class="o">)</span>
+  <span class="n">println</span><span class="o">(</span><span class="s">s&quot;</span><span class="si">$synonym</span><span class="s"> </span><span class="si">$cosineSimilarity</span><span class="s">&quot;</span><span class="o">)</span>
 <span class="o">}</span>
 
 <span class="c1">// Save and load model</span>
@@ -517,17 +517,17 @@ Here we assume the extracted file is <code>text8</code> and in same directory as
 <div data-lang="python">
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.feature.Word2Vec"><code>Word2Vec</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.mllib.feature</span> <span class="kn">import</span> <span class="n">Word2Vec</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.feature</span> <span class="kn">import</span> <span class="n">Word2Vec</span>
 
-<span class="n">inp</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">&quot;data/mllib/sample_lda_data.txt&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">row</span><span class="p">:</span> <span class="n">row</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&quot; &quot;</span><span class="p">))</span>
+<span class="n">inp</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s2">&quot;data/mllib/sample_lda_data.txt&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">row</span><span class="p">:</span> <span class="n">row</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">&quot; &quot;</span><span class="p">))</span>
 
 <span class="n">word2vec</span> <span class="o">=</span> <span class="n">Word2Vec</span><span class="p">()</span>
 <span class="n">model</span> <span class="o">=</span> <span class="n">word2vec</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">inp</span><span class="p">)</span>
 
-<span class="n">synonyms</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">findSynonyms</span><span class="p">(</span><span class="s">&#39;1&#39;</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span>
+<span class="n">synonyms</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">findSynonyms</span><span class="p">(</span><span class="s1">&#39;1&#39;</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span>
 
 <span class="k">for</span> <span class="n">word</span><span class="p">,</span> <span class="n">cosine_distance</span> <span class="ow">in</span> <span class="n">synonyms</span><span class="p">:</span>
-    <span class="k">print</span><span class="p">(</span><span class="s">&quot;{}: {}&quot;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">word</span><span class="p">,</span> <span class="n">cosine_distance</span><span class="p">))</span>
+    <span class="k">print</span><span class="p">(</span><span class="s2">&quot;{}: {}&quot;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">word</span><span class="p">,</span> <span class="n">cosine_distance</span><span class="p">))</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/mllib/word2vec_example.py" in the Spark repo.</small></div>
   </div>
@@ -576,7 +576,7 @@ so that the new features have unit standard deviation and/or zero mean.</p>
 <div data-lang="scala">
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.feature.StandardScaler"><code>StandardScaler</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.feature.</span><span class="o">{</span><span class="nc">StandardScaler</span><span class="o">,</span> <span class="nc">StandardScalerModel</span><span class="o">}</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.feature.</span><span class="o">{</span><span class="nc">StandardScaler</span><span class="o">,</span> <span class="nc">StandardScalerModel</span><span class="o">}</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vectors</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.util.MLUtils</span>
 
@@ -599,21 +599,21 @@ so that the new features have unit standard deviation and/or zero mean.</p>
 <div data-lang="python">
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.feature.StandardScaler"><code>StandardScaler</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.mllib.feature</span> <span class="kn">import</span> <span class="n">StandardScaler</span><span class="p">,</span> <span class="n">StandardScalerModel</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.feature</span> <span class="kn">import</span> <span class="n">StandardScaler</span><span class="p">,</span> <span class="n">StandardScalerModel</span>
 <span class="kn">from</span> <span class="nn">pyspark.mllib.linalg</span> <span class="kn">import</span> <span class="n">Vectors</span>
 <span class="kn">from</span> <span class="nn">pyspark.mllib.util</span> <span class="kn">import</span> <span class="n">MLUtils</span>
 
-<span class="n">data</span> <span class="o">=</span> <span class="n">MLUtils</span><span class="o">.</span><span class="n">loadLibSVMFile</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;data/mllib/sample_libsvm_data.txt&quot;</span><span class="p">)</span>
+<span class="n">data</span> <span class="o">=</span> <span class="n">MLUtils</span><span class="o">.</span><span class="n">loadLibSVMFile</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;data/mllib/sample_libsvm_data.txt&quot;</span><span class="p">)</span>
 <span class="n">label</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="o">.</span><span class="n">label</span><span class="p">)</span>
 <span class="n">features</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="o">.</span><span class="n">features</span><span class="p">)</span>
 
 <span class="n">scaler1</span> <span class="o">=</span> <span class="n">StandardScaler</span><span class="p">()</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">features</span><span class="p">)</span>
 <span class="n">scaler2</span> <span class="o">=</span> <span class="n">StandardScaler</span><span class="p">(</span><span class="n">withMean</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">withStd</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">features</span><span class="p">)</span>
 
-<span class="c"># data1 will be unit variance.</span>
+<span class="c1"># data1 will be unit variance.</span>
 <span class="n">data1</span> <span class="o">=</span> <span class="n">label</span><span class="o">.</span><span class="n">zip</span><span class="p">(</span><span class="n">scaler1</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">features</span><span class="p">))</span>
 
-<span class="c"># data2 will be unit variance and zero mean.</span>
+<span class="c1"># data2 will be unit variance and zero mean.</span>
 <span class="n">data2</span> <span class="o">=</span> <span class="n">label</span><span class="o">.</span><span class="n">zip</span><span class="p">(</span><span class="n">scaler2</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">features</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">Vectors</span><span class="o">.</span><span class="n">dense</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">toArray</span><span class="p">()))))</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/mllib/standard_scaler_example.py" in the Spark repo.</small></div>
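
What the two scalers compute, restated in NumPy (a sketch of the standardization arithmetic, not of Spark internals; MLlib's docs describe the corrected sample standard deviation, mirrored here with ddof=1):

    import numpy as np

    X = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
    mu = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)

    data1_like = X / sd          # scaler1: unit variance only
    data2_like = (X - mu) / sd   # scaler2: zero mean and unit variance
    print(data2_like.mean(axis=0))         # ~[0. 0.]
    print(data2_like.std(axis=0, ddof=1))  # [1. 1.]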
@@ -648,7 +648,7 @@ with $L^2$ norm, and $L^\infty$ norm.</p>
 <div data-lang="scala">
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.feature.Normalizer"><code>Normalizer</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.feature.Normalizer</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.feature.Normalizer</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.util.MLUtils</span>
 
 <span class="k">val</span> <span class="n">data</span> <span class="k">=</span> <span class="nc">MLUtils</span><span class="o">.</span><span class="n">loadLibSVMFile</span><span class="o">(</span><span class="n">sc</span><span class="o">,</span> <span class="s">&quot;data/mllib/sample_libsvm_data.txt&quot;</span><span class="o">)</span>
@@ -668,20 +668,20 @@ with $L^2$ norm, and $L^\infty$ norm.</p>
 <div data-lang="python">
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.feature.Normalizer"><code>Normalizer</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.mllib.feature</span> <span class="kn">import</span> <span class="n">Normalizer</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.feature</span> <span class="kn">import</span> <span class="n">Normalizer</span>
 <span class="kn">from</span> <span class="nn">pyspark.mllib.util</span> <span class="kn">import</span> <span class="n">MLUtils</span>
 
-<span class="n">data</span> <span class="o">=</span> <span class="n">MLUtils</span><span class="o">.</span><span class="n">loadLibSVMFile</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;data/mllib/sample_libsvm_data.txt&quot;</span><span class="p">)</span>
+<span class="n">data</span> <span class="o">=</span> <span class="n">MLUtils</span><span class="o">.</span><span class="n">loadLibSVMFile</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;data/mllib/sample_libsvm_data.txt&quot;</span><span class="p">)</span>
 <span class="n">labels</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="o">.</span><span class="n">label</span><span class="p">)</span>
 <span class="n">features</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="o">.</span><span class="n">features</span><span class="p">)</span>
 
 <span class="n">normalizer1</span> <span class="o">=</span> <span class="n">Normalizer</span><span class="p">()</span>
-<span class="n">normalizer2</span> <span class="o">=</span> <span class="n">Normalizer</span><span class="p">(</span><span class="n">p</span><span class="o">=</span><span class="nb">float</span><span class="p">(</span><span class="s">&quot;inf&quot;</span><span class="p">))</span>
+<span class="n">normalizer2</span> <span class="o">=</span> <span class="n">Normalizer</span><span class="p">(</span><span class="n">p</span><span class="o">=</span><span class="nb">float</span><span class="p">(</span><span class="s2">&quot;inf&quot;</span><span class="p">))</span>
 
-<span class="c"># Each sample in data1 will be normalized using $L^2$ norm.</span>
+<span class="c1"># Each sample in data1 will be normalized using $L^2$ norm.</span>
 <span class="n">data1</span> <span class="o">=</span> <span class="n">labels</span><span class="o">.</span><span class="n">zip</span><span class="p">(</span><span class="n">normalizer1</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">features</span><span class="p">))</span>
 
-<span class="c"># Each sample in data2 will be normalized using $L^\infty$ norm.</span>
+<span class="c1"># Each sample in data2 will be normalized using $L^\infty$ norm.</span>
 <span class="n">data2</span> <span class="o">=</span> <span class="n">labels</span><span class="o">.</span><span class="n">zip</span><span class="p">(</span><span class="n">normalizer2</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">features</span><span class="p">))</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/mllib/normalizer_example.py" in the Spark repo.</small></div>
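
The two normalizers above, restated in NumPy (sketch):

    import numpy as np

    x = np.array([3.0, -4.0, 0.0])
    print(x / np.linalg.norm(x, 2))       # normalizer1: unit L^2 norm -> [0.6 -0.8 0.]
    print(x / np.linalg.norm(x, np.inf))  # normalizer2: divide by max|x_i| -> [0.75 -1. 0.]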
@@ -730,7 +730,7 @@ an <code>RDD[Vector]</code> to produce a reduced <code>RDD[Vector]</code>.</p>
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.feature.ChiSqSelector"><code>ChiSqSelector</code> Scala docs</a>
 for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.feature.ChiSqSelector</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.feature.ChiSqSelector</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vectors</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.regression.LabeledPoint</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.util.MLUtils</span>
@@ -759,7 +759,7 @@ for details on the API.</p>
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/feature/ChiSqSelector.html"><code>ChiSqSelector</code> Java docs</a>
 for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.function.Function</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.feature.ChiSqSelector</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.feature.ChiSqSelectorModel</span><span class="o">;</span>
@@ -780,13 +780,13 @@ for details on the API.</p>
       <span class="k">for</span> <span class="o">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="o">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">lp</span><span class="o">.</span><span class="na">features</span><span class="o">().</span><span class="na">size</span><span class="o">();</span> <span class="o">++</span><span class="n">i</span><span class="o">)</span> <span class="o">{</span>
         <span class="n">discretizedFeatures</span><span class="o">[</span><span class="n">i</span><span class="o">]</span> <span class="o">=</span> <span class="n">Math</span><span class="o">.</span><span class="na">floor</span><span class="o">(</span><span class="n">lp</span><span class="o">.</span><span class="na">features</span><span class="o">().</span><span class="na">apply</span><span class="o">(</span><span class="n">i</span><span class="o">)</span> <span class="o">/</span> <span class="mi">16</span><span class="o">);</span>
       <span class="o">}</span>
-      <span class="k">return</span> <span class="k">new</span> <span class="nf">LabeledPoint</span><span class="o">(</span><span class="n">lp</span><span class="o">.</span><span class="na">label</span><span class="o">(),</span> <span class="n">Vectors</span><span class="o">.</span><span class="na">dense</span><span class="o">(</span><span class="n">discretizedFeatures</span><span class="o">));</span>
+      <span class="k">return</span> <span class="k">new</span> <span class="n">LabeledPoint</span><span class="o">(</span><span class="n">lp</span><span class="o">.</span><span class="na">label</span><span class="o">(),</span> <span class="n">Vectors</span><span class="o">.</span><span class="na">dense</span><span class="o">(</span><span class="n">discretizedFeatures</span><span class="o">));</span>
     <span class="o">}</span>
   <span class="o">}</span>
 <span class="o">);</span>
 
 <span class="c1">// Create ChiSqSelector that will select top 50 of 692 features</span>
-<span class="n">ChiSqSelector</span> <span class="n">selector</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">ChiSqSelector</span><span class="o">(</span><span class="mi">50</span><span class="o">);</span>
+<span class="n">ChiSqSelector</span> <span class="n">selector</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ChiSqSelector</span><span class="o">(</span><span class="mi">50</span><span class="o">);</span>
 <span class="c1">// Create ChiSqSelector model (selecting features)</span>
 <span class="kd">final</span> <span class="n">ChiSqSelectorModel</span> <span class="n">transformer</span> <span class="o">=</span> <span class="n">selector</span><span class="o">.</span><span class="na">fit</span><span class="o">(</span><span class="n">discretizedData</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span>
 <span class="c1">// Filter the top 50 features from each feature vector</span>
@@ -794,7 +794,7 @@ for details on the API.</p>
   <span class="k">new</span> <span class="n">Function</span><span class="o">&lt;</span><span class="n">LabeledPoint</span><span class="o">,</span> <span class="n">LabeledPoint</span><span class="o">&gt;()</span> <span class="o">{</span>
     <span class="nd">@Override</span>
     <span class="kd">public</span> <span class="n">LabeledPoint</span> <span class="nf">call</span><span class="o">(</span><span class="n">LabeledPoint</span> <span class="n">lp</span><span class="o">)</span> <span class="o">{</span>
-      <span class="k">return</span> <span class="k">new</span> <span class="nf">LabeledPoint</span><span class="o">(</span><span class="n">lp</span><span class="o">.</span><span class="na">label</span><span class="o">(),</span> <span class="n">transformer</span><span class="o">.</span><span class="na">transform</span><span class="o">(</span><span class="n">lp</span><span class="o">.</span><span class="na">features</span><span class="o">()));</span>
+      <span class="k">return</span> <span class="k">new</span> <span class="n">LabeledPoint</span><span class="o">(</span><span class="n">lp</span><span class="o">.</span><span class="na">label</span><span class="o">(),</span> <span class="n">transformer</span><span class="o">.</span><span class="na">transform</span><span class="o">(</span><span class="n">lp</span><span class="o">.</span><span class="na">features</span><span class="o">()));</span>
     <span class="o">}</span>
   <span class="o">}</span>
 <span class="o">);</span>
@@ -845,7 +845,7 @@ v_N
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.feature.ElementwiseProduct"><code>ElementwiseProduct</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.feature.ElementwiseProduct</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.feature.ElementwiseProduct</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vectors</span>
 
 <span class="c1">// Create some vector data; also works for sparse vectors</span>
@@ -864,7 +864,7 @@ v_N
 <div data-lang="java">
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/feature/ElementwiseProduct.html"><code>ElementwiseProduct</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.function.Function</span><span class="o">;</span>
@@ -876,7 +876,7 @@ v_N
 <span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">Vector</span><span class="o">&gt;</span> <span class="n">data</span> <span class="o">=</span> <span class="n">jsc</span><span class="o">.</span><span class="na">parallelize</span><span class="o">(</span><span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span>
   <span class="n">Vectors</span><span class="o">.</span><span class="na">dense</span><span class="o">(</span><span class="mf">1.0</span><span class="o">,</span> <span class="mf">2.0</span><span class="o">,</span> <span class="mf">3.0</span><span class="o">),</span> <span class="n">Vectors</span><span class="o">.</span><span class="na">dense</span><span class="o">(</span><span class="mf">4.0</span><span class="o">,</span> <span class="mf">5.0</span><span class="o">,</span> <span class="mf">6.0</span><span class="o">)));</span>
 <span class="n">Vector</span> <span class="n">transformingVector</span> <span class="o">=</span> <span class="n">Vectors</span><span class="o">.</span><span class="na">dense</span><span class="o">(</span><span class="mf">0.0</span><span class="o">,</span> <span class="mf">1.0</span><span class="o">,</span> <span class="mf">2.0</span><span class="o">);</span>
-<span class="kd">final</span> <span class="n">ElementwiseProduct</span> <span class="n">transformer</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">ElementwiseProduct</span><span class="o">(</span><span class="n">transformingVector</span><span class="o">);</span>
+<span class="kd">final</span> <span class="n">ElementwiseProduct</span> <span class="n">transformer</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ElementwiseProduct</span><span class="o">(</span><span class="n">transformingVector</span><span class="o">);</span>
 
 <span class="c1">// Batch transform and per-row transform give the same results:</span>
 <span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">Vector</span><span class="o">&gt;</span> <span class="n">transformedData</span> <span class="o">=</span> <span class="n">transformer</span><span class="o">.</span><span class="na">transform</span><span class="o">(</span><span class="n">data</span><span class="o">);</span>
@@ -895,19 +895,19 @@ v_N
 <div data-lang="python">
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.feature.ElementwiseProduct"><code>ElementwiseProduct</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.mllib.feature</span> <span class="kn">import</span> <span class="n">ElementwiseProduct</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.feature</span> <span class="kn">import</span> <span class="n">ElementwiseProduct</span>
 <span class="kn">from</span> <span class="nn">pyspark.mllib.linalg</span> <span class="kn">import</span> <span class="n">Vectors</span>
 
-<span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">&quot;data/mllib/kmeans_data.txt&quot;</span><span class="p">)</span>
-<span class="n">parsedData</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="p">[</span><span class="nb">float</span><span class="p">(</span><span class="n">t</span><span class="p">)</span> <span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="n">x</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&quot; &quot;</span><span class="p">)])</span>
+<span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s2">&quot;data/mllib/kmeans_data.txt&quot;</span><span class="p">)</span>
+<span class="n">parsedData</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="p">[</span><span class="nb">float</span><span class="p">(</span><span class="n">t</span><span class="p">)</span> <span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="n">x</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">&quot; &quot;</span><span class="p">)])</span>
 
-<span class="c"># Create weight vector.</span>
+<span class="c1"># Create weight vector.</span>
 <span class="n">transformingVector</span> <span class="o">=</span> <span class="n">Vectors</span><span class="o">.</span><span class="n">dense</span><span class="p">([</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">,</span> <span class="mf">2.0</span><span class="p">])</span>
 <span class="n">transformer</span> <span class="o">=</span> <span class="n">ElementwiseProduct</span><span class="p">(</span><span class="n">transformingVector</span><span class="p">)</span>
 
-<span class="c"># Batch transform</span>
+<span class="c1"># Batch transform</span>
 <span class="n">transformedData</span> <span class="o">=</span> <span class="n">transformer</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">parsedData</span><span class="p">)</span>
-<span class="c"># Single-row transform</span>
+<span class="c1"># Single-row transform</span>
 <span class="n">transformedData2</span> <span class="o">=</span> <span class="n">transformer</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">parsedData</span><span class="o">.</span><span class="n">first</span><span class="p">())</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/mllib/elementwise_product_example.py" in the Spark repo.</small></div>
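
The transformation here is just a Hadamard (elementwise) product, e.g. in NumPy:

    import numpy as np

    w = np.array([0.0, 1.0, 2.0])            # the transforming vector
    X = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])
    print(X * w)  # [[0. 2. 6.] [0. 5. 12.]] -- each row rescaled by w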
@@ -929,7 +929,7 @@ for calculation a <a href="mllib-linear-methods.html">Linear Regression</a></p>
 <div data-lang="scala">
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.feature.PCA"><code>PCA</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.feature.PCA</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.feature.PCA</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vectors</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.regression.</span><span class="o">{</span><span class="nc">LabeledPoint</span><span class="o">,</span> <span class="nc">LinearRegressionWithSGD</span><span class="o">}</span>
 

http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/mllib-frequent-pattern-mining.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/mllib-frequent-pattern-mining.html b/site/docs/2.1.0/mllib-frequent-pattern-mining.html
index 47ed977..a9b76b5 100644
--- a/site/docs/2.1.0/mllib-frequent-pattern-mining.html
+++ b/site/docs/2.1.0/mllib-frequent-pattern-mining.html
@@ -389,7 +389,7 @@ details) from <code>transactions</code>.</p>
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.fpm.FPGrowth"><code>FPGrowth</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.fpm.FPGrowth</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.fpm.FPGrowth</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.rdd.RDD</span>
 
 <span class="k">val</span> <span class="n">data</span> <span class="k">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="o">(</span><span class="s">&quot;data/mllib/sample_fpgrowth.txt&quot;</span><span class="o">)</span>
@@ -432,7 +432,7 @@ details) from <code>transactions</code>.</p>
 
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/fpm/FPGrowth.html"><code>FPGrowth</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">java.util.List</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
@@ -453,12 +453,12 @@ details) from <code>transactions</code>.</p>
   <span class="o">}</span>
 <span class="o">);</span>
 
-<span class="n">FPGrowth</span> <span class="n">fpg</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">FPGrowth</span><span class="o">()</span>
+<span class="n">FPGrowth</span> <span class="n">fpg</span> <span class="o">=</span> <span class="k">new</span> <span class="n">FPGrowth</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setMinSupport</span><span class="o">(</span><span class="mf">0.2</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setNumPartitions</span><span class="o">(</span><span class="mi">10</span><span class="o">);</span>
 <span class="n">FPGrowthModel</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">model</span> <span class="o">=</span> <span class="n">fpg</span><span class="o">.</span><span class="na">run</span><span class="o">(</span><span class="n">transactions</span><span class="o">);</span>
 
-<span class="k">for</span> <span class="o">(</span><span class="n">FPGrowth</span><span class="o">.</span><span class="na">FreqItemset</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="nl">itemset:</span> <span class="n">model</span><span class="o">.</span><span class="na">freqItemsets</span><span class="o">().</span><span class="na">toJavaRDD</span><span class="o">().</span><span class="na">collect</span><span class="o">())</span> <span class="o">{</span>
+<span class="k">for</span> <span class="o">(</span><span class="n">FPGrowth</span><span class="o">.</span><span class="na">FreqItemset</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">itemset</span><span class="o">:</span> <span class="n">model</span><span class="o">.</span><span class="na">freqItemsets</span><span class="o">().</span><span class="na">toJavaRDD</span><span class="o">().</span><span class="na">collect</span><span class="o">())</span> <span class="o">{</span>
   <span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">&quot;[&quot;</span> <span class="o">+</span> <span class="n">itemset</span><span class="o">.</span><span class="na">javaItems</span><span class="o">()</span> <span class="o">+</span> <span class="s">&quot;], &quot;</span> <span class="o">+</span> <span class="n">itemset</span><span class="o">.</span><span class="na">freq</span><span class="o">());</span>
 <span class="o">}</span>
 
@@ -484,10 +484,10 @@ that stores the frequent itemsets with their frequencies.</p>
 
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.fpm.FPGrowth"><code>FPGrowth</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.mllib.fpm</span> <span class="kn">import</span> <span class="n">FPGrowth</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.fpm</span> <span class="kn">import</span> <span class="n">FPGrowth</span>
 
-<span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">&quot;data/mllib/sample_fpgrowth.txt&quot;</span><span class="p">)</span>
-<span class="n">transactions</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="n">line</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&#39; &#39;</span><span class="p">))</span>
+<span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s2">&quot;data/mllib/sample_fpgrowth.txt&quot;</span><span class="p">)</span>
+<span class="n">transactions</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="n">line</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">&#39; &#39;</span><span class="p">))</span>
 <span class="n">model</span> <span class="o">=</span> <span class="n">FPGrowth</span><span class="o">.</span><span class="n">train</span><span class="p">(</span><span class="n">transactions</span><span class="p">,</span> <span class="n">minSupport</span><span class="o">=</span><span class="mf">0.2</span><span class="p">,</span> <span class="n">numPartitions</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>
 <span class="n">result</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">freqItemsets</span><span class="p">()</span><span class="o">.</span><span class="n">collect</span><span class="p">()</span>
 <span class="k">for</span> <span class="n">fi</span> <span class="ow">in</span> <span class="n">result</span><span class="p">:</span>
@@ -509,7 +509,7 @@ that have a single item as the consequent.</p>
 
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/fpm/AssociationRules.html"><code>AssociationRules</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.fpm.AssociationRules</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.fpm.AssociationRules</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.fpm.FPGrowth.FreqItemset</span>
 
 <span class="k">val</span> <span class="n">freqItemsets</span> <span class="k">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="o">(</span><span class="nc">Seq</span><span class="o">(</span>
@@ -539,7 +539,7 @@ that have a single item as the consequent.</p>
 
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/fpm/AssociationRules.html"><code>AssociationRules</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaSparkContext</span><span class="o">;</span>
@@ -553,7 +553,7 @@ that have a single item as the consequent.</p>
   <span class="k">new</span> <span class="n">FreqItemset</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;(</span><span class="k">new</span> <span class="n">String</span><span class="o">[]</span> <span class="o">{</span><span class="s">&quot;a&quot;</span><span class="o">,</span> <span class="s">&quot;b&quot;</span><span class="o">},</span> <span class="mi">12L</span><span class="o">)</span>
 <span class="o">));</span>
 
-<span class="n">AssociationRules</span> <span class="n">arules</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">AssociationRules</span><span class="o">()</span>
+<span class="n">AssociationRules</span> <span class="n">arules</span> <span class="o">=</span> <span class="k">new</span> <span class="n">AssociationRules</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setMinConfidence</span><span class="o">(</span><span class="mf">0.8</span><span class="o">);</span>
 <span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">AssociationRules</span><span class="o">.</span><span class="na">Rule</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;&gt;</span> <span class="n">results</span> <span class="o">=</span> <span class="n">arules</span><span class="o">.</span><span class="na">run</span><span class="o">(</span><span class="n">freqItemsets</span><span class="o">);</span>
 
@@ -611,7 +611,7 @@ that stores the frequent sequences with their frequencies.</p>
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.fpm.PrefixSpan"><code>PrefixSpan</code> Scala docs</a> and <a href="api/scala/index.html#org.apache.spark.mllib.fpm.PrefixSpanModel"><code>PrefixSpanModel</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.fpm.PrefixSpan</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.fpm.PrefixSpan</span>
 
 <span class="k">val</span> <span class="n">sequences</span> <span class="k">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="o">(</span><span class="nc">Seq</span><span class="o">(</span>
   <span class="nc">Array</span><span class="o">(</span><span class="nc">Array</span><span class="o">(</span><span class="mi">1</span><span class="o">,</span> <span class="mi">2</span><span class="o">),</span> <span class="nc">Array</span><span class="o">(</span><span class="mi">3</span><span class="o">)),</span>
@@ -643,7 +643,7 @@ that stores the frequent sequences with their frequencies.</p>
 
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/fpm/PrefixSpan.html"><code>PrefixSpan</code> Java docs</a> and <a href="api/java/org/apache/spark/mllib/fpm/PrefixSpanModel.html"><code>PrefixSpanModel</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">java.util.List</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.fpm.PrefixSpan</span><span class="o">;</span>
@@ -655,11 +655,11 @@ that stores the frequent sequences with their frequencies.</p>
   <span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span><span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span><span class="mi">1</span><span class="o">,</span> <span class="mi">2</span><span class="o">),</span> <span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span><span class="mi">5</span><span class="o">)),</span>
   <span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span><span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span><span class="mi">6</span><span class="o">))</span>
 <span class="o">),</span> <span class="mi">2</span><span class="o">);</span>
-<span class="n">PrefixSpan</span> <span class="n">prefixSpan</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">PrefixSpan</span><span class="o">()</span>
+<span class="n">PrefixSpan</span> <span class="n">prefixSpan</span> <span class="o">=</span> <span class="k">new</span> <span class="n">PrefixSpan</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setMinSupport</span><span class="o">(</span><span class="mf">0.5</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setMaxPatternLength</span><span class="o">(</span><span class="mi">5</span><span class="o">);</span>
 <span class="n">PrefixSpanModel</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;</span> <span class="n">model</span> <span class="o">=</span> <span class="n">prefixSpan</span><span class="o">.</span><span class="na">run</span><span class="o">(</span><span class="n">sequences</span><span class="o">);</span>
-<span class="k">for</span> <span class="o">(</span><span class="n">PrefixSpan</span><span class="o">.</span><span class="na">FreqSequence</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;</span> <span class="nl">freqSeq:</span> <span class="n">model</span><span class="o">.</span><span class="na">freqSequences</span><span class="o">().</span><span class="na">toJavaRDD</span><span class="o">().</span><span class="na">collect</span><span class="o">())</span> <span class="o">{</span>
+<span class="k">for</span> <span class="o">(</span><span class="n">PrefixSpan</span><span class="o">.</span><span class="na">FreqSequence</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;</span> <span class="n">freqSeq</span><span class="o">:</span> <span class="n">model</span><span class="o">.</span><span class="na">freqSequences</span><span class="o">().</span><span class="na">toJavaRDD</span><span class="o">().</span><span class="na">collect</span><span class="o">())</span> <span class="o">{</span>
   <span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="n">freqSeq</span><span class="o">.</span><span class="na">javaSequence</span><span class="o">()</span> <span class="o">+</span> <span class="s">&quot;, &quot;</span> <span class="o">+</span> <span class="n">freqSeq</span><span class="o">.</span><span class="na">freq</span><span class="o">());</span>
 <span class="o">}</span>
 </pre></div>
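For quick reference, a hedged PySpark sketch of the same PrefixSpan run (assuming an active SparkContext `sc`; the last two sequences mirror the Java fragment above, while the first two are filled in as illustrative data, since the hunk cuts them off):

    from pyspark.mllib.fpm import PrefixSpan

    # Each sequence is a list of itemsets, matching the Java input above.
    sequences = sc.parallelize([
        [[1, 2], [3]],            # illustrative, not visible in the hunk
        [[1], [3, 2], [1, 2]],    # illustrative, not visible in the hunk
        [[1, 2], [5]],
        [[6]],
    ], 2)

    # minSupport / maxPatternLength correspond to setMinSupport(0.5) and
    # setMaxPatternLength(5) in the Java example.
    model = PrefixSpan.train(sequences, minSupport=0.5, maxPatternLength=5)
    for fs in model.freqSequences().collect():
        print(fs.sequence, fs.freq)

minSupport=0.5 keeps only patterns that occur in at least half of the input sequences, which is what the Java builder chain above configures.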

http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/mllib-isotonic-regression.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/mllib-isotonic-regression.html b/site/docs/2.1.0/mllib-isotonic-regression.html
index aa7edb3..78bbaba 100644
--- a/site/docs/2.1.0/mllib-isotonic-regression.html
+++ b/site/docs/2.1.0/mllib-isotonic-regression.html
@@ -365,7 +365,7 @@ labels and real labels in the test set.</p>
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.regression.IsotonicRegression"><code>IsotonicRegression</code> Scala docs</a> and <a href="api/scala/index.html#org.apache.spark.mllib.regression.IsotonicRegressionModel"><code>IsotonicRegressionModel</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.regression.</span><span class="o">{</span><span class="nc">IsotonicRegression</span><span class="o">,</span> <span class="nc">IsotonicRegressionModel</span><span class="o">}</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.regression.</span><span class="o">{</span><span class="nc">IsotonicRegression</span><span class="o">,</span> <span class="nc">IsotonicRegressionModel</span><span class="o">}</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.util.MLUtils</span>
 
 <span class="k">val</span> <span class="n">data</span> <span class="k">=</span> <span class="nc">MLUtils</span><span class="o">.</span><span class="n">loadLibSVMFile</span><span class="o">(</span><span class="n">sc</span><span class="o">,</span>
@@ -409,7 +409,7 @@ labels and real labels in the test set.</p>
 
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/regression/IsotonicRegression.html"><code>IsotonicRegression</code> Java docs</a> and <a href="api/java/org/apache/spark/mllib/regression/IsotonicRegressionModel.html"><code>IsotonicRegressionModel</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">scala.Tuple3</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.function.Function</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.function.PairFunction</span><span class="o">;</span>
@@ -429,8 +429,8 @@ labels and real labels in the test set.</p>
 <span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">Tuple3</span><span class="o">&lt;</span><span class="n">Double</span><span class="o">,</span> <span class="n">Double</span><span class="o">,</span> <span class="n">Double</span><span class="o">&gt;&gt;</span> <span class="n">parsedData</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="na">map</span><span class="o">(</span>
   <span class="k">new</span> <span class="n">Function</span><span class="o">&lt;</span><span class="n">LabeledPoint</span><span class="o">,</span> <span class="n">Tuple3</span><span class="o">&lt;</span><span class="n">Double</span><span class="o">,</span> <span class="n">Double</span><span class="o">,</span> <span class="n">Double</span><span class="o">&gt;&gt;()</span> <span class="o">{</span>
     <span class="kd">public</span> <span class="n">Tuple3</span><span class="o">&lt;</span><span class="n">Double</span><span class="o">,</span> <span class="n">Double</span><span class="o">,</span> <span class="n">Double</span><span class="o">&gt;</span> <span class="nf">call</span><span class="o">(</span><span class="n">LabeledPoint</span> <span class="n">point</span><span class="o">)</span> <span class="o">{</span>
-      <span class="k">return</span> <span class="k">new</span> <span class="n">Tuple3</span><span class="o">&lt;&gt;(</span><span class="k">new</span> <span class="nf">Double</span><span class="o">(</span><span class="n">point</span><span class="o">.</span><span class="na">label</span><span class="o">()),</span>
-        <span class="k">new</span> <span class="nf">Double</span><span class="o">(</span><span class="n">point</span><span class="o">.</span><span class="na">features</span><span class="o">().</span><span class="na">apply</span><span class="o">(</span><span class="mi">0</span><span class="o">)),</span> <span class="mf">1.0</span><span class="o">);</span>
+      <span class="k">return</span> <span class="k">new</span> <span class="n">Tuple3</span><span class="o">&lt;&gt;(</span><span class="k">new</span> <span class="n">Double</span><span class="o">(</span><span class="n">point</span><span class="o">.</span><span class="na">label</span><span class="o">()),</span>
+        <span class="k">new</span> <span class="n">Double</span><span class="o">(</span><span class="n">point</span><span class="o">.</span><span class="na">features</span><span class="o">().</span><span class="na">apply</span><span class="o">(</span><span class="mi">0</span><span class="o">)),</span> <span class="mf">1.0</span><span class="o">);</span>
     <span class="o">}</span>
   <span class="o">}</span>
 <span class="o">);</span>
@@ -444,7 +444,7 @@ labels and real labels in the test set.</p>
 <span class="c1">// Create isotonic regression model from training data.</span>
 <span class="c1">// Isotonic parameter defaults to true so it is only shown for demonstration</span>
 <span class="kd">final</span> <span class="n">IsotonicRegressionModel</span> <span class="n">model</span> <span class="o">=</span>
-  <span class="k">new</span> <span class="nf">IsotonicRegression</span><span class="o">().</span><span class="na">setIsotonic</span><span class="o">(</span><span class="kc">true</span><span class="o">).</span><span class="na">run</span><span class="o">(</span><span class="n">training</span><span class="o">);</span>
+  <span class="k">new</span> <span class="n">IsotonicRegression</span><span class="o">().</span><span class="na">setIsotonic</span><span class="o">(</span><span class="kc">true</span><span class="o">).</span><span class="na">run</span><span class="o">(</span><span class="n">training</span><span class="o">);</span>
 
 <span class="c1">// Create tuples of predicted and real labels.</span>
 <span class="n">JavaPairRDD</span><span class="o">&lt;</span><span class="n">Double</span><span class="o">,</span> <span class="n">Double</span><span class="o">&gt;</span> <span class="n">predictionAndLabel</span> <span class="o">=</span> <span class="n">test</span><span class="o">.</span><span class="na">mapToPair</span><span class="o">(</span>
@@ -458,7 +458,7 @@ labels and real labels in the test set.</p>
 <span class="o">);</span>
 
 <span class="c1">// Calculate mean squared error between predicted and real labels.</span>
-<span class="n">Double</span> <span class="n">meanSquaredError</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">JavaDoubleRDD</span><span class="o">(</span><span class="n">predictionAndLabel</span><span class="o">.</span><span class="na">map</span><span class="o">(</span>
+<span class="n">Double</span> <span class="n">meanSquaredError</span> <span class="o">=</span> <span class="k">new</span> <span class="n">JavaDoubleRDD</span><span class="o">(</span><span class="n">predictionAndLabel</span><span class="o">.</span><span class="na">map</span><span class="o">(</span>
   <span class="k">new</span> <span class="n">Function</span><span class="o">&lt;</span><span class="n">Tuple2</span><span class="o">&lt;</span><span class="n">Double</span><span class="o">,</span> <span class="n">Double</span><span class="o">&gt;,</span> <span class="n">Object</span><span class="o">&gt;()</span> <span class="o">{</span>
     <span class="nd">@Override</span>
     <span class="kd">public</span> <span class="n">Object</span> <span class="nf">call</span><span class="o">(</span><span class="n">Tuple2</span><span class="o">&lt;</span><span class="n">Double</span><span class="o">,</span> <span class="n">Double</span><span class="o">&gt;</span> <span class="n">pl</span><span class="o">)</span> <span class="o">{</span>
@@ -483,36 +483,36 @@ labels and real labels in the test set.</p>
 
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.regression.IsotonicRegression"><code>IsotonicRegression</code> Python docs</a> and <a href="api/python/pyspark.mllib.html#pyspark.mllib.regression.IsotonicRegressionModel"><code>IsotonicRegressionModel</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">math</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">math</span>
 <span class="kn">from</span> <span class="nn">pyspark.mllib.regression</span> <span class="kn">import</span> <span class="n">LabeledPoint</span><span class="p">,</span> <span class="n">IsotonicRegression</span><span class="p">,</span> <span class="n">IsotonicRegressionModel</span>
 <span class="kn">from</span> <span class="nn">pyspark.mllib.util</span> <span class="kn">import</span> <span class="n">MLUtils</span>
 
-<span class="c"># Load and parse the data</span>
+<span class="c1"># Load and parse the data</span>
 <span class="k">def</span> <span class="nf">parsePoint</span><span class="p">(</span><span class="n">labeledData</span><span class="p">):</span>
     <span class="k">return</span> <span class="p">(</span><span class="n">labeledData</span><span class="o">.</span><span class="n">label</span><span class="p">,</span> <span class="n">labeledData</span><span class="o">.</span><span class="n">features</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="mf">1.0</span><span class="p">)</span>
 
-<span class="n">data</span> <span class="o">=</span> <span class="n">MLUtils</span><span class="o">.</span><span class="n">loadLibSVMFile</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;data/mllib/sample_isotonic_regression_libsvm_data.txt&quot;</span><span class="p">)</span>
+<span class="n">data</span> <span class="o">=</span> <span class="n">MLUtils</span><span class="o">.</span><span class="n">loadLibSVMFile</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;data/mllib/sample_isotonic_regression_libsvm_data.txt&quot;</span><span class="p">)</span>
 
-<span class="c"># Create label, feature, weight tuples from input data with weight set to default value 1.0.</span>
+<span class="c1"># Create label, feature, weight tuples from input data with weight set to default value 1.0.</span>
 <span class="n">parsedData</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="n">parsePoint</span><span class="p">)</span>
 
-<span class="c"># Split data into training (60%) and test (40%) sets.</span>
+<span class="c1"># Split data into training (60%) and test (40%) sets.</span>
 <span class="n">training</span><span class="p">,</span> <span class="n">test</span> <span class="o">=</span> <span class="n">parsedData</span><span class="o">.</span><span class="n">randomSplit</span><span class="p">([</span><span class="mf">0.6</span><span class="p">,</span> <span class="mf">0.4</span><span class="p">],</span> <span class="mi">11</span><span class="p">)</span>
 
-<span class="c"># Create isotonic regression model from training data.</span>
-<span class="c"># Isotonic parameter defaults to true so it is only shown for demonstration</span>
+<span class="c1"># Create isotonic regression model from training data.</span>
+<span class="c1"># Isotonic parameter defaults to true so it is only shown for demonstration</span>
 <span class="n">model</span> <span class="o">=</span> <span class="n">IsotonicRegression</span><span class="o">.</span><span class="n">train</span><span class="p">(</span><span class="n">training</span><span class="p">)</span>
 
-<span class="c"># Create tuples of predicted and real labels.</span>
+<span class="c1"># Create tuples of predicted and real labels.</span>
 <span class="n">predictionAndLabel</span> <span class="o">=</span> <span class="n">test</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">p</span><span class="p">:</span> <span class="p">(</span><span class="n">model</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">p</span><span class="p">[</span><span class="mi">1</span><span class="p">]),</span> <span class="n">p</span><span class="p">[</span><span class="mi">0</span><span class="p">]))</span>
 
-<span class="c"># Calculate mean squared error between predicted and real labels.</span>
+<span class="c1"># Calculate mean squared error between predicted and real labels.</span>
 <span class="n">meanSquaredError</span> <span class="o">=</span> <span class="n">predictionAndLabel</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">pl</span><span class="p">:</span> <span class="n">math</span><span class="o">.</span><span class="n">pow</span><span class="p">((</span><span class="n">pl</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">-</span> <span class="n">pl</span><span class="p">[</span><span class="mi">1</span><span class="p">]),</span> <span class="mi">2</span><span class="p">))</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Mean Squared Error = &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">meanSquaredError</span><span class="p">))</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Mean Squared Error = &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">meanSquaredError</span><span class="p">))</span>
 
-<span class="c"># Save and load model</span>
-<span class="n">model</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;target/tmp/myIsotonicRegressionModel&quot;</span><span class="p">)</span>
-<span class="n">sameModel</span> <span class="o">=</span> <span class="n">IsotonicRegressionModel</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;target/tmp/myIsotonicRegressionModel&quot;</span><span class="p">)</span>
+<span class="c1"># Save and load model</span>
+<span class="n">model</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;target/tmp/myIsotonicRegressionModel&quot;</span><span class="p">)</span>
+<span class="n">sameModel</span> <span class="o">=</span> <span class="n">IsotonicRegressionModel</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;target/tmp/myIsotonicRegressionModel&quot;</span><span class="p">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/mllib/isotonic_regression_example.py" in the Spark repo.</small></div>
   </div>
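A hedged aside on prediction semantics: an IsotonicRegressionModel predicts by linear interpolation between the boundary points it fitted, so it can score feature values never seen in training. A toy sketch (assuming an active SparkContext `sc`; the data values are illustrative, not from the example above):

    from pyspark.mllib.regression import IsotonicRegression

    # (label, feature, weight) triples; deliberately non-monotone in the middle
    train = sc.parallelize([(1.0, 1.0, 1.0), (3.0, 2.0, 1.0),
                            (2.0, 3.0, 1.0), (4.0, 4.0, 1.0)])
    model = IsotonicRegression.train(train, isotonic=True)

    # The pool-adjacent-violators step averages the labels 3.0 and 2.0 to 2.5,
    # and 2.5 falls between boundaries, so it is linearly interpolated.
    print(model.predict(2.5))  # -> 2.5
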




[20/25] spark-website git commit: Update 2.1.0 docs to include https://github.com/apache/spark/pull/16294

Posted by yh...@apache.org.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/ml-features.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/ml-features.html b/site/docs/2.1.0/ml-features.html
index 64463de..a2f102b 100644
--- a/site/docs/2.1.0/ml-features.html
+++ b/site/docs/2.1.0/ml-features.html
@@ -318,52 +318,52 @@
 <p><strong>Table of Contents</strong></p>
 
 <ul id="markdown-toc">
-  <li><a href="#feature-extractors" id="markdown-toc-feature-extractors">Feature Extractors</a>    <ul>
-      <li><a href="#tf-idf" id="markdown-toc-tf-idf">TF-IDF</a></li>
-      <li><a href="#word2vec" id="markdown-toc-word2vec">Word2Vec</a></li>
-      <li><a href="#countvectorizer" id="markdown-toc-countvectorizer">CountVectorizer</a></li>
+  <li><a href="#feature-extractors">Feature Extractors</a>    <ul>
+      <li><a href="#tf-idf">TF-IDF</a></li>
+      <li><a href="#word2vec">Word2Vec</a></li>
+      <li><a href="#countvectorizer">CountVectorizer</a></li>
     </ul>
   </li>
-  <li><a href="#feature-transformers" id="markdown-toc-feature-transformers">Feature Transformers</a>    <ul>
-      <li><a href="#tokenizer" id="markdown-toc-tokenizer">Tokenizer</a></li>
-      <li><a href="#stopwordsremover" id="markdown-toc-stopwordsremover">StopWordsRemover</a></li>
-      <li><a href="#n-gram" id="markdown-toc-n-gram">$n$-gram</a></li>
-      <li><a href="#binarizer" id="markdown-toc-binarizer">Binarizer</a></li>
-      <li><a href="#pca" id="markdown-toc-pca">PCA</a></li>
-      <li><a href="#polynomialexpansion" id="markdown-toc-polynomialexpansion">PolynomialExpansion</a></li>
-      <li><a href="#discrete-cosine-transform-dct" id="markdown-toc-discrete-cosine-transform-dct">Discrete Cosine Transform (DCT)</a></li>
-      <li><a href="#stringindexer" id="markdown-toc-stringindexer">StringIndexer</a></li>
-      <li><a href="#indextostring" id="markdown-toc-indextostring">IndexToString</a></li>
-      <li><a href="#onehotencoder" id="markdown-toc-onehotencoder">OneHotEncoder</a></li>
-      <li><a href="#vectorindexer" id="markdown-toc-vectorindexer">VectorIndexer</a></li>
-      <li><a href="#interaction" id="markdown-toc-interaction">Interaction</a></li>
-      <li><a href="#normalizer" id="markdown-toc-normalizer">Normalizer</a></li>
-      <li><a href="#standardscaler" id="markdown-toc-standardscaler">StandardScaler</a></li>
-      <li><a href="#minmaxscaler" id="markdown-toc-minmaxscaler">MinMaxScaler</a></li>
-      <li><a href="#maxabsscaler" id="markdown-toc-maxabsscaler">MaxAbsScaler</a></li>
-      <li><a href="#bucketizer" id="markdown-toc-bucketizer">Bucketizer</a></li>
-      <li><a href="#elementwiseproduct" id="markdown-toc-elementwiseproduct">ElementwiseProduct</a></li>
-      <li><a href="#sqltransformer" id="markdown-toc-sqltransformer">SQLTransformer</a></li>
-      <li><a href="#vectorassembler" id="markdown-toc-vectorassembler">VectorAssembler</a></li>
-      <li><a href="#quantilediscretizer" id="markdown-toc-quantilediscretizer">QuantileDiscretizer</a></li>
+  <li><a href="#feature-transformers">Feature Transformers</a>    <ul>
+      <li><a href="#tokenizer">Tokenizer</a></li>
+      <li><a href="#stopwordsremover">StopWordsRemover</a></li>
+      <li><a href="#n-gram">$n$-gram</a></li>
+      <li><a href="#binarizer">Binarizer</a></li>
+      <li><a href="#pca">PCA</a></li>
+      <li><a href="#polynomialexpansion">PolynomialExpansion</a></li>
+      <li><a href="#discrete-cosine-transform-dct">Discrete Cosine Transform (DCT)</a></li>
+      <li><a href="#stringindexer">StringIndexer</a></li>
+      <li><a href="#indextostring">IndexToString</a></li>
+      <li><a href="#onehotencoder">OneHotEncoder</a></li>
+      <li><a href="#vectorindexer">VectorIndexer</a></li>
+      <li><a href="#interaction">Interaction</a></li>
+      <li><a href="#normalizer">Normalizer</a></li>
+      <li><a href="#standardscaler">StandardScaler</a></li>
+      <li><a href="#minmaxscaler">MinMaxScaler</a></li>
+      <li><a href="#maxabsscaler">MaxAbsScaler</a></li>
+      <li><a href="#bucketizer">Bucketizer</a></li>
+      <li><a href="#elementwiseproduct">ElementwiseProduct</a></li>
+      <li><a href="#sqltransformer">SQLTransformer</a></li>
+      <li><a href="#vectorassembler">VectorAssembler</a></li>
+      <li><a href="#quantilediscretizer">QuantileDiscretizer</a></li>
     </ul>
   </li>
-  <li><a href="#feature-selectors" id="markdown-toc-feature-selectors">Feature Selectors</a>    <ul>
-      <li><a href="#vectorslicer" id="markdown-toc-vectorslicer">VectorSlicer</a></li>
-      <li><a href="#rformula" id="markdown-toc-rformula">RFormula</a></li>
-      <li><a href="#chisqselector" id="markdown-toc-chisqselector">ChiSqSelector</a></li>
+  <li><a href="#feature-selectors">Feature Selectors</a>    <ul>
+      <li><a href="#vectorslicer">VectorSlicer</a></li>
+      <li><a href="#rformula">RFormula</a></li>
+      <li><a href="#chisqselector">ChiSqSelector</a></li>
     </ul>
   </li>
-  <li><a href="#locality-sensitive-hashing" id="markdown-toc-locality-sensitive-hashing">Locality Sensitive Hashing</a>    <ul>
-      <li><a href="#lsh-operations" id="markdown-toc-lsh-operations">LSH Operations</a>        <ul>
-          <li><a href="#feature-transformation" id="markdown-toc-feature-transformation">Feature Transformation</a></li>
-          <li><a href="#approximate-similarity-join" id="markdown-toc-approximate-similarity-join">Approximate Similarity Join</a></li>
-          <li><a href="#approximate-nearest-neighbor-search" id="markdown-toc-approximate-nearest-neighbor-search">Approximate Nearest Neighbor Search</a></li>
+  <li><a href="#locality-sensitive-hashing">Locality Sensitive Hashing</a>    <ul>
+      <li><a href="#lsh-operations">LSH Operations</a>        <ul>
+          <li><a href="#feature-transformation">Feature Transformation</a></li>
+          <li><a href="#approximate-similarity-join">Approximate Similarity Join</a></li>
+          <li><a href="#approximate-nearest-neighbor-search">Approximate Nearest Neighbor Search</a></li>
         </ul>
       </li>
-      <li><a href="#lsh-algorithms" id="markdown-toc-lsh-algorithms">LSH Algorithms</a>        <ul>
-          <li><a href="#bucketed-random-projection-for-euclidean-distance" id="markdown-toc-bucketed-random-projection-for-euclidean-distance">Bucketed Random Projection for Euclidean Distance</a></li>
-          <li><a href="#minhash-for-jaccard-distance" id="markdown-toc-minhash-for-jaccard-distance">MinHash for Jaccard Distance</a></li>
+      <li><a href="#lsh-algorithms">LSH Algorithms</a>        <ul>
+          <li><a href="#bucketed-random-projection-for-euclidean-distance">Bucketed Random Projection for Euclidean Distance</a></li>
+          <li><a href="#minhash-for-jaccard-distance">MinHash for Jaccard Distance</a></li>
         </ul>
       </li>
     </ul>
@@ -395,7 +395,7 @@ TFIDF(t, d, D) = TF(t, d) \cdot IDF(t, D).
 There are several variants on the definition of term frequency and document frequency.
 In MLlib, we separate TF and IDF to make them flexible.</p>
 
-<p><strong>TF</strong>: Both <code>HashingTF</code> and <code>CountVectorizer</code> can be used to generate the term frequency vectors.</p>
+<p><strong>TF</strong>: Both <code>HashingTF</code> and <code>CountVectorizer</code> can be used to generate the term frequency vectors. </p>
 
 <p><code>HashingTF</code> is a <code>Transformer</code> which takes sets of terms and converts those sets into 
 fixed-length feature vectors.  In text processing, a &#8220;set of terms&#8221; might be a bag of words.
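To make the fixed-length claim concrete before the full examples below, a minimal sketch of the hashing trick (assuming an active SparkSession `spark`; the column names and the deliberately tiny numFeatures are illustrative):

    from pyspark.ml.feature import HashingTF, Tokenizer

    df = spark.createDataFrame([(0, "a a b c")], ["id", "sentence"])
    words = Tokenizer(inputCol="sentence", outputCol="words").transform(df)

    # Every document maps to a vector of length 16, whatever its vocabulary;
    # "a" occurs twice, so (barring a hash collision with "b" or "c") its
    # bucket carries the raw count 2.0.
    tf = HashingTF(inputCol="words", outputCol="tf", numFeatures=16)
    tf.transform(words).select("tf").show(truncate=False)
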
@@ -437,7 +437,7 @@ when using text as features.  Our feature vectors could then be passed to a lear
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.ml.feature.HashingTF">HashingTF Scala docs</a> and
 the <a href="api/scala/index.html#org.apache.spark.ml.feature.IDF">IDF Scala docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.ml.feature.</span><span class="o">{</span><span class="nc">HashingTF</span><span class="o">,</span> <span class="nc">IDF</span><span class="o">,</span> <span class="nc">Tokenizer</span><span class="o">}</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.ml.feature.</span><span class="o">{</span><span class="nc">HashingTF</span><span class="o">,</span> <span class="nc">IDF</span><span class="o">,</span> <span class="nc">Tokenizer</span><span class="o">}</span>
 
 <span class="k">val</span> <span class="n">sentenceData</span> <span class="k">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">createDataFrame</span><span class="o">(</span><span class="nc">Seq</span><span class="o">(</span>
   <span class="o">(</span><span class="mf">0.0</span><span class="o">,</span> <span class="s">&quot;Hi I heard about Spark&quot;</span><span class="o">),</span>
@@ -468,7 +468,7 @@ the <a href="api/scala/index.html#org.apache.spark.ml.feature.IDF">IDF Scala doc
     <p>Refer to the <a href="api/java/org/apache/spark/ml/feature/HashingTF.html">HashingTF Java docs</a> and the
 <a href="api/java/org/apache/spark/ml/feature/IDF.html">IDF Java docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">java.util.List</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">org.apache.spark.ml.feature.HashingTF</span><span class="o">;</span>
@@ -489,17 +489,17 @@ the <a href="api/scala/index.html#org.apache.spark.ml.feature.IDF">IDF Scala doc
   <span class="n">RowFactory</span><span class="o">.</span><span class="na">create</span><span class="o">(</span><span class="mf">0.0</span><span class="o">,</span> <span class="s">&quot;I wish Java could use case classes&quot;</span><span class="o">),</span>
   <span class="n">RowFactory</span><span class="o">.</span><span class="na">create</span><span class="o">(</span><span class="mf">1.0</span><span class="o">,</span> <span class="s">&quot;Logistic regression models are neat&quot;</span><span class="o">)</span>
 <span class="o">);</span>
-<span class="n">StructType</span> <span class="n">schema</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">StructType</span><span class="o">(</span><span class="k">new</span> <span class="n">StructField</span><span class="o">[]{</span>
-  <span class="k">new</span> <span class="nf">StructField</span><span class="o">(</span><span class="s">&quot;label&quot;</span><span class="o">,</span> <span class="n">DataTypes</span><span class="o">.</span><span class="na">DoubleType</span><span class="o">,</span> <span class="kc">false</span><span class="o">,</span> <span class="n">Metadata</span><span class="o">.</span><span class="na">empty</span><span class="o">()),</span>
-  <span class="k">new</span> <span class="nf">StructField</span><span class="o">(</span><span class="s">&quot;sentence&quot;</span><span class="o">,</span> <span class="n">DataTypes</span><span class="o">.</span><span class="na">StringType</span><span class="o">,</span> <span class="kc">false</span><span class="o">,</span> <span class="n">Metadata</span><span class="o">.</span><span class="na">empty</span><span class="o">())</span>
+<span class="n">StructType</span> <span class="n">schema</span> <span class="o">=</span> <span class="k">new</span> <span class="n">StructType</span><span class="o">(</span><span class="k">new</span> <span class="n">StructField</span><span class="o">[]{</span>
+  <span class="k">new</span> <span class="n">StructField</span><span class="o">(</span><span class="s">&quot;label&quot;</span><span class="o">,</span> <span class="n">DataTypes</span><span class="o">.</span><span class="na">DoubleType</span><span class="o">,</span> <span class="kc">false</span><span class="o">,</span> <span class="n">Metadata</span><span class="o">.</span><span class="na">empty</span><span class="o">()),</span>
+  <span class="k">new</span> <span class="n">StructField</span><span class="o">(</span><span class="s">&quot;sentence&quot;</span><span class="o">,</span> <span class="n">DataTypes</span><span class="o">.</span><span class="na">StringType</span><span class="o">,</span> <span class="kc">false</span><span class="o">,</span> <span class="n">Metadata</span><span class="o">.</span><span class="na">empty</span><span class="o">())</span>
 <span class="o">});</span>
 <span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">sentenceData</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="na">createDataFrame</span><span class="o">(</span><span class="n">data</span><span class="o">,</span> <span class="n">schema</span><span class="o">);</span>
 
-<span class="n">Tokenizer</span> <span class="n">tokenizer</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">Tokenizer</span><span class="o">().</span><span class="na">setInputCol</span><span class="o">(</span><span class="s">&quot;sentence&quot;</span><span class="o">).</span><span class="na">setOutputCol</span><span class="o">(</span><span class="s">&quot;words&quot;</span><span class="o">);</span>
+<span class="n">Tokenizer</span> <span class="n">tokenizer</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Tokenizer</span><span class="o">().</span><span class="na">setInputCol</span><span class="o">(</span><span class="s">&quot;sentence&quot;</span><span class="o">).</span><span class="na">setOutputCol</span><span class="o">(</span><span class="s">&quot;words&quot;</span><span class="o">);</span>
 <span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">wordsData</span> <span class="o">=</span> <span class="n">tokenizer</span><span class="o">.</span><span class="na">transform</span><span class="o">(</span><span class="n">sentenceData</span><span class="o">);</span>
 
 <span class="kt">int</span> <span class="n">numFeatures</span> <span class="o">=</span> <span class="mi">20</span><span class="o">;</span>
-<span class="n">HashingTF</span> <span class="n">hashingTF</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">HashingTF</span><span class="o">()</span>
+<span class="n">HashingTF</span> <span class="n">hashingTF</span> <span class="o">=</span> <span class="k">new</span> <span class="n">HashingTF</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setInputCol</span><span class="o">(</span><span class="s">&quot;words&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setOutputCol</span><span class="o">(</span><span class="s">&quot;rawFeatures&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setNumFeatures</span><span class="o">(</span><span class="n">numFeatures</span><span class="o">);</span>
@@ -507,7 +507,7 @@ the <a href="api/scala/index.html#org.apache.spark.ml.feature.IDF">IDF Scala doc
 <span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">featurizedData</span> <span class="o">=</span> <span class="n">hashingTF</span><span class="o">.</span><span class="na">transform</span><span class="o">(</span><span class="n">wordsData</span><span class="o">);</span>
 <span class="c1">// alternatively, CountVectorizer can also be used to get term frequency vectors</span>
 
-<span class="n">IDF</span> <span class="n">idf</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">IDF</span><span class="o">().</span><span class="na">setInputCol</span><span class="o">(</span><span class="s">&quot;rawFeatures&quot;</span><span class="o">).</span><span class="na">setOutputCol</span><span class="o">(</span><span class="s">&quot;features&quot;</span><span class="o">);</span>
+<span class="n">IDF</span> <span class="n">idf</span> <span class="o">=</span> <span class="k">new</span> <span class="n">IDF</span><span class="o">().</span><span class="na">setInputCol</span><span class="o">(</span><span class="s">&quot;rawFeatures&quot;</span><span class="o">).</span><span class="na">setOutputCol</span><span class="o">(</span><span class="s">&quot;features&quot;</span><span class="o">);</span>
 <span class="n">IDFModel</span> <span class="n">idfModel</span> <span class="o">=</span> <span class="n">idf</span><span class="o">.</span><span class="na">fit</span><span class="o">(</span><span class="n">featurizedData</span><span class="o">);</span>
 
 <span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">rescaledData</span> <span class="o">=</span> <span class="n">idfModel</span><span class="o">.</span><span class="na">transform</span><span class="o">(</span><span class="n">featurizedData</span><span class="o">);</span>
@@ -521,26 +521,26 @@ the <a href="api/scala/index.html#org.apache.spark.ml.feature.IDF">IDF Scala doc
     <p>Refer to the <a href="api/python/pyspark.ml.html#pyspark.ml.feature.HashingTF">HashingTF Python docs</a> and
 the <a href="api/python/pyspark.ml.html#pyspark.ml.feature.IDF">IDF Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.ml.feature</span> <span class="kn">import</span> <span class="n">HashingTF</span><span class="p">,</span> <span class="n">IDF</span><span class="p">,</span> <span class="n">Tokenizer</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.ml.feature</span> <span class="kn">import</span> <span class="n">HashingTF</span><span class="p">,</span> <span class="n">IDF</span><span class="p">,</span> <span class="n">Tokenizer</span>
 
 <span class="n">sentenceData</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">createDataFrame</span><span class="p">([</span>
-    <span class="p">(</span><span class="mf">0.0</span><span class="p">,</span> <span class="s">&quot;Hi I heard about Spark&quot;</span><span class="p">),</span>
-    <span class="p">(</span><span class="mf">0.0</span><span class="p">,</span> <span class="s">&quot;I wish Java could use case classes&quot;</span><span class="p">),</span>
-    <span class="p">(</span><span class="mf">1.0</span><span class="p">,</span> <span class="s">&quot;Logistic regression models are neat&quot;</span><span class="p">)</span>
-<span class="p">],</span> <span class="p">[</span><span class="s">&quot;label&quot;</span><span class="p">,</span> <span class="s">&quot;sentence&quot;</span><span class="p">])</span>
+    <span class="p">(</span><span class="mf">0.0</span><span class="p">,</span> <span class="s2">&quot;Hi I heard about Spark&quot;</span><span class="p">),</span>
+    <span class="p">(</span><span class="mf">0.0</span><span class="p">,</span> <span class="s2">&quot;I wish Java could use case classes&quot;</span><span class="p">),</span>
+    <span class="p">(</span><span class="mf">1.0</span><span class="p">,</span> <span class="s2">&quot;Logistic regression models are neat&quot;</span><span class="p">)</span>
+<span class="p">],</span> <span class="p">[</span><span class="s2">&quot;label&quot;</span><span class="p">,</span> <span class="s2">&quot;sentence&quot;</span><span class="p">])</span>
 
-<span class="n">tokenizer</span> <span class="o">=</span> <span class="n">Tokenizer</span><span class="p">(</span><span class="n">inputCol</span><span class="o">=</span><span class="s">&quot;sentence&quot;</span><span class="p">,</span> <span class="n">outputCol</span><span class="o">=</span><span class="s">&quot;words&quot;</span><span class="p">)</span>
+<span class="n">tokenizer</span> <span class="o">=</span> <span class="n">Tokenizer</span><span class="p">(</span><span class="n">inputCol</span><span class="o">=</span><span class="s2">&quot;sentence&quot;</span><span class="p">,</span> <span class="n">outputCol</span><span class="o">=</span><span class="s2">&quot;words&quot;</span><span class="p">)</span>
 <span class="n">wordsData</span> <span class="o">=</span> <span class="n">tokenizer</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">sentenceData</span><span class="p">)</span>
 
-<span class="n">hashingTF</span> <span class="o">=</span> <span class="n">HashingTF</span><span class="p">(</span><span class="n">inputCol</span><span class="o">=</span><span class="s">&quot;words&quot;</span><span class="p">,</span> <span class="n">outputCol</span><span class="o">=</span><span class="s">&quot;rawFeatures&quot;</span><span class="p">,</span> <span class="n">numFeatures</span><span class="o">=</span><span class="mi">20</span><span class="p">)</span>
+<span class="n">hashingTF</span> <span class="o">=</span> <span class="n">HashingTF</span><span class="p">(</span><span class="n">inputCol</span><span class="o">=</span><span class="s2">&quot;words&quot;</span><span class="p">,</span> <span class="n">outputCol</span><span class="o">=</span><span class="s2">&quot;rawFeatures&quot;</span><span class="p">,</span> <span class="n">numFeatures</span><span class="o">=</span><span class="mi">20</span><span class="p">)</span>
 <span class="n">featurizedData</span> <span class="o">=</span> <span class="n">hashingTF</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">wordsData</span><span class="p">)</span>
-<span class="c"># alternatively, CountVectorizer can also be used to get term frequency vectors</span>
+<span class="c1"># alternatively, CountVectorizer can also be used to get term frequency vectors</span>
 
-<span class="n">idf</span> <span class="o">=</span> <span class="n">IDF</span><span class="p">(</span><span class="n">inputCol</span><span class="o">=</span><span class="s">&quot;rawFeatures&quot;</span><span class="p">,</span> <span class="n">outputCol</span><span class="o">=</span><span class="s">&quot;features&quot;</span><span class="p">)</span>
+<span class="n">idf</span> <span class="o">=</span> <span class="n">IDF</span><span class="p">(</span><span class="n">inputCol</span><span class="o">=</span><span class="s2">&quot;rawFeatures&quot;</span><span class="p">,</span> <span class="n">outputCol</span><span class="o">=</span><span class="s2">&quot;features&quot;</span><span class="p">)</span>
 <span class="n">idfModel</span> <span class="o">=</span> <span class="n">idf</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">featurizedData</span><span class="p">)</span>
 <span class="n">rescaledData</span> <span class="o">=</span> <span class="n">idfModel</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">featurizedData</span><span class="p">)</span>
 
-<span class="n">rescaledData</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s">&quot;label&quot;</span><span class="p">,</span> <span class="s">&quot;features&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
+<span class="n">rescaledData</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s2">&quot;label&quot;</span><span class="p">,</span> <span class="s2">&quot;features&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/ml/tf_idf_example.py" in the Spark repo.</small></div>
   </div>
@@ -563,7 +563,7 @@ details.</p>
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.ml.feature.Word2Vec">Word2Vec Scala docs</a>
 for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.ml.feature.Word2Vec</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.ml.feature.Word2Vec</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.ml.linalg.Vector</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.sql.Row</span>
 
@@ -584,7 +584,7 @@ for more details on the API.</p>
 
 <span class="k">val</span> <span class="n">result</span> <span class="k">=</span> <span class="n">model</span><span class="o">.</span><span class="n">transform</span><span class="o">(</span><span class="n">documentDF</span><span class="o">)</span>
 <span class="n">result</span><span class="o">.</span><span class="n">collect</span><span class="o">().</span><span class="n">foreach</span> <span class="o">{</span> <span class="k">case</span> <span class="nc">Row</span><span class="o">(</span><span class="n">text</span><span class="k">:</span> <span class="kt">Seq</span><span class="o">[</span><span class="k">_</span><span class="o">],</span> <span class="n">features</span><span class="k">:</span> <span class="kt">Vector</span><span class="o">)</span> <span class="k">=&gt;</span>
-  <span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Text: [${text.mkString(&quot;</span><span class="o">,</span> <span class="s">&quot;)}] =&gt; \nVector: $features\n&quot;</span><span class="o">)</span> <span class="o">}</span>
+  <span class="n">println</span><span class="o">(</span><span class="s">s&quot;Text: [</span><span class="si">${</span><span class="n">text</span><span class="o">.</span><span class="n">mkString</span><span class="o">(</span><span class="s">&quot;, &quot;</span><span class="o">)</span><span class="si">}</span><span class="s">] =&gt; \nVector: </span><span class="si">$features</span><span class="s">\n&quot;</span><span class="o">)</span> <span class="o">}</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/scala/org/apache/spark/examples/ml/Word2VecExample.scala" in the Spark repo.</small></div>
   </div>
@@ -594,7 +594,7 @@ for more details on the API.</p>
     <p>Refer to the <a href="api/java/org/apache/spark/ml/feature/Word2Vec.html">Word2Vec Java docs</a>
 for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">java.util.List</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">org.apache.spark.ml.feature.Word2Vec</span><span class="o">;</span>
@@ -612,13 +612,13 @@ for more details on the API.</p>
   <span class="n">RowFactory</span><span class="o">.</span><span class="na">create</span><span class="o">(</span><span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span><span class="s">&quot;I wish Java could use case classes&quot;</span><span class="o">.</span><span class="na">split</span><span class="o">(</span><span class="s">&quot; &quot;</span><span class="o">))),</span>
   <span class="n">RowFactory</span><span class="o">.</span><span class="na">create</span><span class="o">(</span><span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span><span class="s">&quot;Logistic regression models are neat&quot;</span><span class="o">.</span><span class="na">split</span><span class="o">(</span><span class="s">&quot; &quot;</span><span class="o">)))</span>
 <span class="o">);</span>
-<span class="n">StructType</span> <span class="n">schema</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">StructType</span><span class="o">(</span><span class="k">new</span> <span class="n">StructField</span><span class="o">[]{</span>
-  <span class="k">new</span> <span class="nf">StructField</span><span class="o">(</span><span class="s">&quot;text&quot;</span><span class="o">,</span> <span class="k">new</span> <span class="nf">ArrayType</span><span class="o">(</span><span class="n">DataTypes</span><span class="o">.</span><span class="na">StringType</span><span class="o">,</span> <span class="kc">true</span><span class="o">),</span> <span class="kc">false</span><span class="o">,</span> <span class="n">Metadata</span><span class="o">.</span><span class="na">empty</span><span class="o">())</span>
+<span class="n">StructType</span> <span class="n">schema</span> <span class="o">=</span> <span class="k">new</span> <span class="n">StructType</span><span class="o">(</span><span class="k">new</span> <span class="n">StructField</span><span class="o">[]{</span>
+  <span class="k">new</span> <span class="n">StructField</span><span class="o">(</span><span class="s">&quot;text&quot;</span><span class="o">,</span> <span class="k">new</span> <span class="n">ArrayType</span><span class="o">(</span><span class="n">DataTypes</span><span class="o">.</span><span class="na">StringType</span><span class="o">,</span> <span class="kc">true</span><span class="o">),</span> <span class="kc">false</span><span class="o">,</span> <span class="n">Metadata</span><span class="o">.</span><span class="na">empty</span><span class="o">())</span>
 <span class="o">});</span>
 <span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">documentDF</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="na">createDataFrame</span><span class="o">(</span><span class="n">data</span><span class="o">,</span> <span class="n">schema</span><span class="o">);</span>
 
 <span class="c1">// Learn a mapping from words to Vectors.</span>
-<span class="n">Word2Vec</span> <span class="n">word2Vec</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">Word2Vec</span><span class="o">()</span>
+<span class="n">Word2Vec</span> <span class="n">word2Vec</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Word2Vec</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setInputCol</span><span class="o">(</span><span class="s">&quot;text&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setOutputCol</span><span class="o">(</span><span class="s">&quot;result&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setVectorSize</span><span class="o">(</span><span class="mi">3</span><span class="o">)</span>
@@ -641,23 +641,23 @@ for more details on the API.</p>
     <p>Refer to the <a href="api/python/pyspark.ml.html#pyspark.ml.feature.Word2Vec">Word2Vec Python docs</a>
 for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.ml.feature</span> <span class="kn">import</span> <span class="n">Word2Vec</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.ml.feature</span> <span class="kn">import</span> <span class="n">Word2Vec</span>
 
-<span class="c"># Input data: Each row is a bag of words from a sentence or document.</span>
+<span class="c1"># Input data: Each row is a bag of words from a sentence or document.</span>
 <span class="n">documentDF</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">createDataFrame</span><span class="p">([</span>
-    <span class="p">(</span><span class="s">&quot;Hi I heard about Spark&quot;</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&quot; &quot;</span><span class="p">),</span> <span class="p">),</span>
-    <span class="p">(</span><span class="s">&quot;I wish Java could use case classes&quot;</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&quot; &quot;</span><span class="p">),</span> <span class="p">),</span>
-    <span class="p">(</span><span class="s">&quot;Logistic regression models are neat&quot;</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&quot; &quot;</span><span class="p">),</span> <span class="p">)</span>
-<span class="p">],</span> <span class="p">[</span><span class="s">&quot;text&quot;</span><span class="p">])</span>
+    <span class="p">(</span><span class="s2">&quot;Hi I heard about Spark&quot;</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">&quot; &quot;</span><span class="p">),</span> <span class="p">),</span>
+    <span class="p">(</span><span class="s2">&quot;I wish Java could use case classes&quot;</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">&quot; &quot;</span><span class="p">),</span> <span class="p">),</span>
+    <span class="p">(</span><span class="s2">&quot;Logistic regression models are neat&quot;</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">&quot; &quot;</span><span class="p">),</span> <span class="p">)</span>
+<span class="p">],</span> <span class="p">[</span><span class="s2">&quot;text&quot;</span><span class="p">])</span>
 
-<span class="c"># Learn a mapping from words to Vectors.</span>
-<span class="n">word2Vec</span> <span class="o">=</span> <span class="n">Word2Vec</span><span class="p">(</span><span class="n">vectorSize</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">minCount</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">inputCol</span><span class="o">=</span><span class="s">&quot;text&quot;</span><span class="p">,</span> <span class="n">outputCol</span><span class="o">=</span><span class="s">&quot;result&quot;</span><span class="p">)</span>
+<span class="c1"># Learn a mapping from words to Vectors.</span>
+<span class="n">word2Vec</span> <span class="o">=</span> <span class="n">Word2Vec</span><span class="p">(</span><span class="n">vectorSize</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">minCount</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">inputCol</span><span class="o">=</span><span class="s2">&quot;text&quot;</span><span class="p">,</span> <span class="n">outputCol</span><span class="o">=</span><span class="s2">&quot;result&quot;</span><span class="p">)</span>
 <span class="n">model</span> <span class="o">=</span> <span class="n">word2Vec</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">documentDF</span><span class="p">)</span>
 
 <span class="n">result</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">documentDF</span><span class="p">)</span>
 <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">result</span><span class="o">.</span><span class="n">collect</span><span class="p">():</span>
     <span class="n">text</span><span class="p">,</span> <span class="n">vector</span> <span class="o">=</span> <span class="n">row</span>
-    <span class="k">print</span><span class="p">(</span><span class="s">&quot;Text: [</span><span class="si">%s</span><span class="s">] =&gt; </span><span class="se">\n</span><span class="s">Vector: </span><span class="si">%s</span><span class="se">\n</span><span class="s">&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="s">&quot;, &quot;</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">text</span><span class="p">),</span> <span class="nb">str</span><span class="p">(</span><span class="n">vector</span><span class="p">)))</span>
+    <span class="k">print</span><span class="p">(</span><span class="s2">&quot;Text: [</span><span class="si">%s</span><span class="s2">] =&gt; </span><span class="se">\n</span><span class="s2">Vector: </span><span class="si">%s</span><span class="se">\n</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="s2">&quot;, &quot;</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">text</span><span class="p">),</span> <span class="nb">str</span><span class="p">(</span><span class="n">vector</span><span class="p">)))</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/ml/word2vec_example.py" in the Spark repo.</small></div>
   </div>
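Beyond transform, the fitted Word2VecModel also answers nearest-neighbour queries in the embedding space; a hedged sketch continuing from the Python example above (on this toy corpus and vector size the similarities are essentially noise, so this shows only the API shape):

    # `model` is the Word2VecModel fitted in the example above
    model.findSynonyms("Spark", 2).show()   # top-2 words by cosine similarity
    model.getVectors().show()               # the learned word -> vector table
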
@@ -707,7 +707,7 @@ Then the output column &#8220;vector&#8221; after transformation contains:</p>
 and the <a href="api/scala/index.html#org.apache.spark.ml.feature.CountVectorizerModel">CountVectorizerModel Scala docs</a>
 for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.ml.feature.</span><span class="o">{</span><span class="nc">CountVectorizer</span><span class="o">,</span> <span class="nc">CountVectorizerModel</span><span class="o">}</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.ml.feature.</span><span class="o">{</span><span class="nc">CountVectorizer</span><span class="o">,</span> <span class="nc">CountVectorizerModel</span><span class="o">}</span>
 
 <span class="k">val</span> <span class="n">df</span> <span class="k">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">createDataFrame</span><span class="o">(</span><span class="nc">Seq</span><span class="o">(</span>
   <span class="o">(</span><span class="mi">0</span><span class="o">,</span> <span class="nc">Array</span><span class="o">(</span><span class="s">&quot;a&quot;</span><span class="o">,</span> <span class="s">&quot;b&quot;</span><span class="o">,</span> <span class="s">&quot;c&quot;</span><span class="o">)),</span>
@@ -738,7 +738,7 @@ for more details on the API.</p>
 and the <a href="api/java/org/apache/spark/ml/feature/CountVectorizerModel.html">CountVectorizerModel Java docs</a>
 for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">java.util.List</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">org.apache.spark.ml.feature.CountVectorizer</span><span class="o">;</span>
@@ -754,13 +754,13 @@ for more details on the API.</p>
   <span class="n">RowFactory</span><span class="o">.</span><span class="na">create</span><span class="o">(</span><span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span><span class="s">&quot;a&quot;</span><span class="o">,</span> <span class="s">&quot;b&quot;</span><span class="o">,</span> <span class="s">&quot;c&quot;</span><span class="o">)),</span>
   <span class="n">RowFactory</span><span class="o">.</span><span class="na">create</span><span class="o">(</span><span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span><span class="s">&quot;a&quot;</span><span class="o">,</span> <span class="s">&quot;b&quot;</span><span class="o">,</span> <span class="s">&quot;b&quot;</span><span class="o">,</span> <span class="s">&quot;c&quot;</span><span class="o">,</span> <span class="s">&quot;a&quot;</span><span class="o">))</span>
 <span class="o">);</span>
-<span class="n">StructType</span> <span class="n">schema</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">StructType</span><span class="o">(</span><span class="k">new</span> <span class="n">StructField</span> <span class="o">[]</span> <span class="o">{</span>
-  <span class="k">new</span> <span class="nf">StructField</span><span class="o">(</span><span class="s">&quot;text&quot;</span><span class="o">,</span> <span class="k">new</span> <span class="nf">ArrayType</span><span class="o">(</span><span class="n">DataTypes</span><span class="o">.</span><span class="na">StringType</span><span class="o">,</span> <span class="kc">true</span><span class="o">),</span> <span class="kc">false</span><span class="o">,</span> <span class="n">Metadata</span><span class="o">.</span><span class="na">empty</span><span class="o">())</span>
+<span class="n">StructType</span> <span class="n">schema</span> <span class="o">=</span> <span class="k">new</span> <span class="n">StructType</span><span class="o">(</span><span class="k">new</span> <span class="n">StructField</span> <span class="o">[]</span> <span class="o">{</span>
+  <span class="k">new</span> <span class="n">StructField</span><span class="o">(</span><span class="s">&quot;text&quot;</span><span class="o">,</span> <span class="k">new</span> <span class="n">ArrayType</span><span class="o">(</span><span class="n">DataTypes</span><span class="o">.</span><span class="na">StringType</span><span class="o">,</span> <span class="kc">true</span><span class="o">),</span> <span class="kc">false</span><span class="o">,</span> <span class="n">Metadata</span><span class="o">.</span><span class="na">empty</span><span class="o">())</span>
 <span class="o">});</span>
 <span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">df</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="na">createDataFrame</span><span class="o">(</span><span class="n">data</span><span class="o">,</span> <span class="n">schema</span><span class="o">);</span>
 
 <span class="c1">// fit a CountVectorizerModel from the corpus</span>
-<span class="n">CountVectorizerModel</span> <span class="n">cvModel</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">CountVectorizer</span><span class="o">()</span>
+<span class="n">CountVectorizerModel</span> <span class="n">cvModel</span> <span class="o">=</span> <span class="k">new</span> <span class="n">CountVectorizer</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setInputCol</span><span class="o">(</span><span class="s">&quot;text&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setOutputCol</span><span class="o">(</span><span class="s">&quot;feature&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setVocabSize</span><span class="o">(</span><span class="mi">3</span><span class="o">)</span>
@@ -768,7 +768,7 @@ for more details on the API.</p>
   <span class="o">.</span><span class="na">fit</span><span class="o">(</span><span class="n">df</span><span class="o">);</span>
 
 <span class="c1">// alternatively, define CountVectorizerModel with a-priori vocabulary</span>
-<span class="n">CountVectorizerModel</span> <span class="n">cvm</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">CountVectorizerModel</span><span class="o">(</span><span class="k">new</span> <span class="n">String</span><span class="o">[]{</span><span class="s">&quot;a&quot;</span><span class="o">,</span> <span class="s">&quot;b&quot;</span><span class="o">,</span> <span class="s">&quot;c&quot;</span><span class="o">})</span>
+<span class="n">CountVectorizerModel</span> <span class="n">cvm</span> <span class="o">=</span> <span class="k">new</span> <span class="n">CountVectorizerModel</span><span class="o">(</span><span class="k">new</span> <span class="n">String</span><span class="o">[]{</span><span class="s">&quot;a&quot;</span><span class="o">,</span> <span class="s">&quot;b&quot;</span><span class="o">,</span> <span class="s">&quot;c&quot;</span><span class="o">})</span>
   <span class="o">.</span><span class="na">setInputCol</span><span class="o">(</span><span class="s">&quot;text&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setOutputCol</span><span class="o">(</span><span class="s">&quot;feature&quot;</span><span class="o">);</span>
 
@@ -783,16 +783,16 @@ for more details on the API.</p>
 and the <a href="api/python/pyspark.ml.html#pyspark.ml.feature.CountVectorizerModel">CountVectorizerModel Python docs</a>
 for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.ml.feature</span> <span class="kn">import</span> <span class="n">CountVectorizer</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.ml.feature</span> <span class="kn">import</span> <span class="n">CountVectorizer</span>
 
-<span class="c"># Input data: Each row is a bag of words with a ID.</span>
+<span class="c1"># Input data: Each row is a bag of words with a ID.</span>
 <span class="n">df</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">createDataFrame</span><span class="p">([</span>
-    <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="s">&quot;a b c&quot;</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&quot; &quot;</span><span class="p">)),</span>
-    <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s">&quot;a b b c a&quot;</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&quot; &quot;</span><span class="p">))</span>
-<span class="p">],</span> <span class="p">[</span><span class="s">&quot;id&quot;</span><span class="p">,</span> <span class="s">&quot;words&quot;</span><span class="p">])</span>
+    <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="s2">&quot;a b c&quot;</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">&quot; &quot;</span><span class="p">)),</span>
+    <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s2">&quot;a b b c a&quot;</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">&quot; &quot;</span><span class="p">))</span>
+<span class="p">],</span> <span class="p">[</span><span class="s2">&quot;id&quot;</span><span class="p">,</span> <span class="s2">&quot;words&quot;</span><span class="p">])</span>
 
-<span class="c"># fit a CountVectorizerModel from the corpus.</span>
-<span class="n">cv</span> <span class="o">=</span> <span class="n">CountVectorizer</span><span class="p">(</span><span class="n">inputCol</span><span class="o">=</span><span class="s">&quot;words&quot;</span><span class="p">,</span> <span class="n">outputCol</span><span class="o">=</span><span class="s">&quot;features&quot;</span><span class="p">,</span> <span class="n">vocabSize</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">minDF</span><span class="o">=</span><span class="mf">2.0</span><span class="p">)</span>
+<span class="c1"># fit a CountVectorizerModel from the corpus.</span>
+<span class="n">cv</span> <span class="o">=</span> <span class="n">CountVectorizer</span><span class="p">(</span><span class="n">inputCol</span><span class="o">=</span><span class="s2">&quot;words&quot;</span><span class="p">,</span> <span class="n">outputCol</span><span class="o">=</span><span class="s2">&quot;features&quot;</span><span class="p">,</span> <span class="n">vocabSize</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">minDF</span><span class="o">=</span><span class="mf">2.0</span><span class="p">)</span>
 
 <span class="n">model</span> <span class="o">=</span> <span class="n">cv</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">df</span><span class="p">)</span>
 
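Pieced together from the + lines above, the rendered PySpark CountVectorizer example amounts to the following sketch (assuming an active SparkSession named spark):

    from pyspark.ml.feature import CountVectorizer

    # Input data: each row is a bag of words with an ID.
    df = spark.createDataFrame([
        (0, "a b c".split(" ")),
        (1, "a b b c a".split(" "))
    ], ["id", "words"])

    # Keep at most 3 vocabulary terms; a term must appear in at least 2 rows.
    cv = CountVectorizer(inputCol="words", outputCol="features", vocabSize=3, minDF=2.0)
    model = cv.fit(df)
    model.transform(df).show(truncate=False)

The resulting "features" column holds sparse term-count vectors over the learned vocabulary.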
@@ -822,7 +822,7 @@ for more details on the API.</p>
 and the <a href="api/scala/index.html#org.apache.spark.ml.feature.RegexTokenizer">RegexTokenizer Scala docs</a>
 for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.ml.feature.</span><span class="o">{</span><span class="nc">RegexTokenizer</span><span class="o">,</span> <span class="nc">Tokenizer</span><span class="o">}</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.ml.feature.</span><span class="o">{</span><span class="nc">RegexTokenizer</span><span class="o">,</span> <span class="nc">Tokenizer</span><span class="o">}</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.sql.functions._</span>
 
 <span class="k">val</span> <span class="n">sentenceDataFrame</span> <span class="k">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">createDataFrame</span><span class="o">(</span><span class="nc">Seq</span><span class="o">(</span>
@@ -856,7 +856,7 @@ for more details on the API.</p>
 and the <a href="api/java/org/apache/spark/ml/feature/RegexTokenizer.html">RegexTokenizer Java docs</a>
 for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">java.util.List</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">scala.collection.mutable.WrappedArray</span><span class="o">;</span>
@@ -878,16 +878,16 @@ for more details on the API.</p>
   <span class="n">RowFactory</span><span class="o">.</span><span class="na">create</span><span class="o">(</span><span class="mi">2</span><span class="o">,</span> <span class="s">&quot;Logistic,regression,models,are,neat&quot;</span><span class="o">)</span>
 <span class="o">);</span>
 
-<span class="n">StructType</span> <span class="n">schema</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">StructType</span><span class="o">(</span><span class="k">new</span> <span class="n">StructField</span><span class="o">[]{</span>
-  <span class="k">new</span> <span class="nf">StructField</span><span class="o">(</span><span class="s">&quot;id&quot;</span><span class="o">,</span> <span class="n">DataTypes</span><span class="o">.</span><span class="na">IntegerType</span><span class="o">,</span> <span class="kc">false</span><span class="o">,</span> <span class="n">Metadata</span><span class="o">.</span><span class="na">empty</span><span class="o">()),</span>
-  <span class="k">new</span> <span class="nf">StructField</span><span class="o">(</span><span class="s">&quot;sentence&quot;</span><span class="o">,</span> <span class="n">DataTypes</span><span class="o">.</span><span class="na">StringType</span><span class="o">,</span> <span class="kc">false</span><span class="o">,</span> <span class="n">Metadata</span><span class="o">.</span><span class="na">empty</span><span class="o">())</span>
+<span class="n">StructType</span> <span class="n">schema</span> <span class="o">=</span> <span class="k">new</span> <span class="n">StructType</span><span class="o">(</span><span class="k">new</span> <span class="n">StructField</span><span class="o">[]{</span>
+  <span class="k">new</span> <span class="n">StructField</span><span class="o">(</span><span class="s">&quot;id&quot;</span><span class="o">,</span> <span class="n">DataTypes</span><span class="o">.</span><span class="na">IntegerType</span><span class="o">,</span> <span class="kc">false</span><span class="o">,</span> <span class="n">Metadata</span><span class="o">.</span><span class="na">empty</span><span class="o">()),</span>
+  <span class="k">new</span> <span class="n">StructField</span><span class="o">(</span><span class="s">&quot;sentence&quot;</span><span class="o">,</span> <span class="n">DataTypes</span><span class="o">.</span><span class="na">StringType</span><span class="o">,</span> <span class="kc">false</span><span class="o">,</span> <span class="n">Metadata</span><span class="o">.</span><span class="na">empty</span><span class="o">())</span>
 <span class="o">});</span>
 
 <span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">sentenceDataFrame</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="na">createDataFrame</span><span class="o">(</span><span class="n">data</span><span class="o">,</span> <span class="n">schema</span><span class="o">);</span>
 
-<span class="n">Tokenizer</span> <span class="n">tokenizer</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">Tokenizer</span><span class="o">().</span><span class="na">setInputCol</span><span class="o">(</span><span class="s">&quot;sentence&quot;</span><span class="o">).</span><span class="na">setOutputCol</span><span class="o">(</span><span class="s">&quot;words&quot;</span><span class="o">);</span>
+<span class="n">Tokenizer</span> <span class="n">tokenizer</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Tokenizer</span><span class="o">().</span><span class="na">setInputCol</span><span class="o">(</span><span class="s">&quot;sentence&quot;</span><span class="o">).</span><span class="na">setOutputCol</span><span class="o">(</span><span class="s">&quot;words&quot;</span><span class="o">);</span>
 
-<span class="n">RegexTokenizer</span> <span class="n">regexTokenizer</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">RegexTokenizer</span><span class="o">()</span>
+<span class="n">RegexTokenizer</span> <span class="n">regexTokenizer</span> <span class="o">=</span> <span class="k">new</span> <span class="n">RegexTokenizer</span><span class="o">()</span>
     <span class="o">.</span><span class="na">setInputCol</span><span class="o">(</span><span class="s">&quot;sentence&quot;</span><span class="o">)</span>
     <span class="o">.</span><span class="na">setOutputCol</span><span class="o">(</span><span class="s">&quot;words&quot;</span><span class="o">)</span>
     <span class="o">.</span><span class="na">setPattern</span><span class="o">(</span><span class="s">&quot;\\W&quot;</span><span class="o">);</span>  <span class="c1">// alternatively .setPattern(&quot;\\w+&quot;).setGaps(false);</span>
@@ -916,30 +916,30 @@ for more details on the API.</p>
 the <a href="api/python/pyspark.ml.html#pyspark.ml.feature.RegexTokenizer">RegexTokenizer Python docs</a>
 for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.ml.feature</span> <span class="kn">import</span> <span class="n">Tokenizer</span><span class="p">,</span> <span class="n">RegexTokenizer</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.ml.feature</span> <span class="kn">import</span> <span class="n">Tokenizer</span><span class="p">,</span> <span class="n">RegexTokenizer</span>
 <span class="kn">from</span> <span class="nn">pyspark.sql.functions</span> <span class="kn">import</span> <span class="n">col</span><span class="p">,</span> <span class="n">udf</span>
 <span class="kn">from</span> <span class="nn">pyspark.sql.types</span> <span class="kn">import</span> <span class="n">IntegerType</span>
 
 <span class="n">sentenceDataFrame</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">createDataFrame</span><span class="p">([</span>
-    <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="s">&quot;Hi I heard about Spark&quot;</span><span class="p">),</span>
-    <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s">&quot;I wish Java could use case classes&quot;</span><span class="p">),</span>
-    <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s">&quot;Logistic,regression,models,are,neat&quot;</span><span class="p">)</span>
-<span class="p">],</span> <span class="p">[</span><span class="s">&quot;id&quot;</span><span class="p">,</span> <span class="s">&quot;sentence&quot;</span><span class="p">])</span>
+    <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="s2">&quot;Hi I heard about Spark&quot;</span><span class="p">),</span>
+    <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s2">&quot;I wish Java could use case classes&quot;</span><span class="p">),</span>
+    <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s2">&quot;Logistic,regression,models,are,neat&quot;</span><span class="p">)</span>
+<span class="p">],</span> <span class="p">[</span><span class="s2">&quot;id&quot;</span><span class="p">,</span> <span class="s2">&quot;sentence&quot;</span><span class="p">])</span>
 
-<span class="n">tokenizer</span> <span class="o">=</span> <span class="n">Tokenizer</span><span class="p">(</span><span class="n">inputCol</span><span class="o">=</span><span class="s">&quot;sentence&quot;</span><span class="p">,</span> <span class="n">outputCol</span><span class="o">=</span><span class="s">&quot;words&quot;</span><span class="p">)</span>
+<span class="n">tokenizer</span> <span class="o">=</span> <span class="n">Tokenizer</span><span class="p">(</span><span class="n">inputCol</span><span class="o">=</span><span class="s2">&quot;sentence&quot;</span><span class="p">,</span> <span class="n">outputCol</span><span class="o">=</span><span class="s2">&quot;words&quot;</span><span class="p">)</span>
 
-<span class="n">regexTokenizer</span> <span class="o">=</span> <span class="n">RegexTokenizer</span><span class="p">(</span><span class="n">inputCol</span><span class="o">=</span><span class="s">&quot;sentence&quot;</span><span class="p">,</span> <span class="n">outputCol</span><span class="o">=</span><span class="s">&quot;words&quot;</span><span class="p">,</span> <span class="n">pattern</span><span class="o">=</span><span class="s">&quot;</span><span class="se">\\</span><span class="s">W&quot;</span><span class="p">)</span>
-<span class="c"># alternatively, pattern=&quot;\\w+&quot;, gaps(False)</span>
+<span class="n">regexTokenizer</span> <span class="o">=</span> <span class="n">RegexTokenizer</span><span class="p">(</span><span class="n">inputCol</span><span class="o">=</span><span class="s2">&quot;sentence&quot;</span><span class="p">,</span> <span class="n">outputCol</span><span class="o">=</span><span class="s2">&quot;words&quot;</span><span class="p">,</span> <span class="n">pattern</span><span class="o">=</span><span class="s2">&quot;</span><span class="se">\\</span><span class="s2">W&quot;</span><span class="p">)</span>
+<span class="c1"># alternatively, pattern=&quot;\\w+&quot;, gaps(False)</span>
 
 <span class="n">countTokens</span> <span class="o">=</span> <span class="n">udf</span><span class="p">(</span><span class="k">lambda</span> <span class="n">words</span><span class="p">:</span> <span class="nb">len</span><span class="p">(</span><span class="n">words</span><span class="p">),</span> <span class="n">IntegerType</span><span class="p">())</span>
 
 <span class="n">tokenized</span> <span class="o">=</span> <span class="n">tokenizer</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">sentenceDataFrame</span><span class="p">)</span>
-<span class="n">tokenized</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s">&quot;sentence&quot;</span><span class="p">,</span> <span class="s">&quot;words&quot;</span><span class="p">)</span>\
-    <span class="o">.</span><span class="n">withColumn</span><span class="p">(</span><span class="s">&quot;tokens&quot;</span><span class="p">,</span> <span class="n">countTokens</span><span class="p">(</span><span class="n">col</span><span class="p">(</span><span class="s">&quot;words&quot;</span><span class="p">)))</span><span class="o">.</span><span class="n">show</span><span class="p">(</span><span class="n">truncate</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
+<span class="n">tokenized</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s2">&quot;sentence&quot;</span><span class="p">,</span> <span class="s2">&quot;words&quot;</span><span class="p">)</span>\
+    <span class="o">.</span><span class="n">withColumn</span><span class="p">(</span><span class="s2">&quot;tokens&quot;</span><span class="p">,</span> <span class="n">countTokens</span><span class="p">(</span><span class="n">col</span><span class="p">(</span><span class="s2">&quot;words&quot;</span><span class="p">)))</span><span class="o">.</span><span class="n">show</span><span class="p">(</span><span class="n">truncate</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
 
 <span class="n">regexTokenized</span> <span class="o">=</span> <span class="n">regexTokenizer</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">sentenceDataFrame</span><span class="p">)</span>
-<span class="n">regexTokenized</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s">&quot;sentence&quot;</span><span class="p">,</span> <span class="s">&quot;words&quot;</span><span class="p">)</span> \
-    <span class="o">.</span><span class="n">withColumn</span><span class="p">(</span><span class="s">&quot;tokens&quot;</span><span class="p">,</span> <span class="n">countTokens</span><span class="p">(</span><span class="n">col</span><span class="p">(</span><span class="s">&quot;words&quot;</span><span class="p">)))</span><span class="o">.</span><span class="n">show</span><span class="p">(</span><span class="n">truncate</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
+<span class="n">regexTokenized</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s2">&quot;sentence&quot;</span><span class="p">,</span> <span class="s2">&quot;words&quot;</span><span class="p">)</span> \
+    <span class="o">.</span><span class="n">withColumn</span><span class="p">(</span><span class="s2">&quot;tokens&quot;</span><span class="p">,</span> <span class="n">countTokens</span><span class="p">(</span><span class="n">col</span><span class="p">(</span><span class="s2">&quot;words&quot;</span><span class="p">)))</span><span class="o">.</span><span class="n">show</span><span class="p">(</span><span class="n">truncate</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/ml/tokenizer_example.py" in the Spark repo.</small></div>
   </div>
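In plain text, the tokenizer example in this hunk does the following; the countTokens UDF simply counts how many words each tokenizer produces per sentence (assumes an active SparkSession named spark):

    from pyspark.ml.feature import Tokenizer, RegexTokenizer
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import IntegerType

    sentenceDataFrame = spark.createDataFrame([
        (0, "Hi I heard about Spark"),
        (2, "Logistic,regression,models,are,neat")
    ], ["id", "sentence"])

    # Tokenizer splits on whitespace; RegexTokenizer splits on a regex pattern.
    tokenizer = Tokenizer(inputCol="sentence", outputCol="words")
    regexTokenizer = RegexTokenizer(inputCol="sentence", outputCol="words", pattern="\\W")

    countTokens = udf(lambda words: len(words), IntegerType())

    tokenized = tokenizer.transform(sentenceDataFrame)
    tokenized.select("sentence", "words") \
        .withColumn("tokens", countTokens(col("words"))).show(truncate=False)

The comma-separated sentence makes the difference visible: the whitespace Tokenizer keeps it as one token, while the "\\W" RegexTokenizer splits it into five.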
@@ -989,7 +989,7 @@ filtered out.</p>
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.ml.feature.StopWordsRemover">StopWordsRemover Scala docs</a>
 for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.ml.feature.StopWordsRemover</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.ml.feature.StopWordsRemover</span>
 
 <span class="k">val</span> <span class="n">remover</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">StopWordsRemover</span><span class="o">()</span>
   <span class="o">.</span><span class="n">setInputCol</span><span class="o">(</span><span class="s">&quot;raw&quot;</span><span class="o">)</span>
@@ -1010,7 +1010,7 @@ for more details on the API.</p>
     <p>Refer to the <a href="api/java/org/apache/spark/ml/feature/StopWordsRemover.html">StopWordsRemover Java docs</a>
 for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">java.util.List</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">org.apache.spark.ml.feature.StopWordsRemover</span><span class="o">;</span>
@@ -1022,7 +1022,7 @@ for more details on the API.</p>
 <span class="kn">import</span> <span class="nn">org.apache.spark.sql.types.StructField</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.sql.types.StructType</span><span class="o">;</span>
 
-<span class="n">StopWordsRemover</span> <span class="n">remover</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">StopWordsRemover</span><span class="o">()</span>
+<span class="n">StopWordsRemover</span> <span class="n">remover</span> <span class="o">=</span> <span class="k">new</span> <span class="n">StopWordsRemover</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setInputCol</span><span class="o">(</span><span class="s">&quot;raw&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setOutputCol</span><span class="o">(</span><span class="s">&quot;filtered&quot;</span><span class="o">);</span>
 
@@ -1031,8 +1031,8 @@ for more details on the API.</p>
   <span class="n">RowFactory</span><span class="o">.</span><span class="na">create</span><span class="o">(</span><span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span><span class="s">&quot;Mary&quot;</span><span class="o">,</span> <span class="s">&quot;had&quot;</span><span class="o">,</span> <span class="s">&quot;a&quot;</span><span class="o">,</span> <span class="s">&quot;little&quot;</span><span class="o">,</span> <span class="s">&quot;lamb&quot;</span><span class="o">))</span>
 <span class="o">);</span>
 
-<span class="n">StructType</span> <span class="n">schema</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">StructType</span><span class="o">(</span><span class="k">new</span> <span class="n">StructField</span><span class="o">[]{</span>
-  <span class="k">new</span> <span class="nf">StructField</span><span class="o">(</span>
+<span class="n">StructType</span> <span class="n">schema</span> <span class="o">=</span> <span class="k">new</span> <span class="n">StructType</span><span class="o">(</span><span class="k">new</span> <span class="n">StructField</span><span class="o">[]{</span>
+  <span class="k">new</span> <span class="n">StructField</span><span class="o">(</span>
     <span class="s">&quot;raw&quot;</span><span class="o">,</span> <span class="n">DataTypes</span><span class="o">.</span><span class="na">createArrayType</span><span class="o">(</span><span class="n">DataTypes</span><span class="o">.</span><span class="na">StringType</span><span class="o">),</span> <span class="kc">false</span><span class="o">,</span> <span class="n">Metadata</span><span class="o">.</span><span class="na">empty</span><span class="o">())</span>
 <span class="o">});</span>
 
@@ -1047,14 +1047,14 @@ for more details on the API.</p>
     <p>Refer to the <a href="api/python/pyspark.ml.html#pyspark.ml.feature.StopWordsRemover">StopWordsRemover Python docs</a>
 for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.ml.feature</span> <span class="kn">import</span> <span class="n">StopWordsRemover</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.ml.feature</span> <span class="kn">import</span> <span class="n">StopWordsRemover</span>
 
 <span class="n">sentenceData</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">createDataFrame</span><span class="p">([</span>
-    <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="p">[</span><span class="s">&quot;I&quot;</span><span class="p">,</span> <span class="s">&quot;saw&quot;</span><span class="p">,</span> <span class="s">&quot;the&quot;</span><span class="p">,</span> <span class="s">&quot;red&quot;</span><span class="p">,</span> <span class="s">&quot;balloon&quot;</span><span class="p">]),</span>
-    <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="p">[</span><span class="s">&quot;Mary&quot;</span><span class="p">,</span> <span class="s">&quot;had&quot;</span><span class="p">,</span> <span class="s">&quot;a&quot;</span><span class="p">,</span> <span class="s">&quot;little&quot;</span><span class="p">,</span> <span class="s">&quot;lamb&quot;</span><span class="p">])</span>
-<span class="p">],</span> <span class="p">[</span><span class="s">&quot;id&quot;</span><span class="p">,</span> <span class="s">&quot;raw&quot;</span><span class="p">])</span>
+    <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="p">[</span><span class="s2">&quot;I&quot;</span><span class="p">,</span> <span class="s2">&quot;saw&quot;</span><span class="p">,</span> <span class="s2">&quot;the&quot;</span><span class="p">,</span> <span class="s2">&quot;red&quot;</span><span class="p">,</span> <span class="s2">&quot;balloon&quot;</span><span class="p">]),</span>
+    <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="p">[</span><span class="s2">&quot;Mary&quot;</span><span class="p">,</span> <span class="s2">&quot;had&quot;</span><span class="p">,</span> <span class="s2">&quot;a&quot;</span><span class="p">,</span> <span class="s2">&quot;little&quot;</span><span class="p">,</span> <span class="s2">&quot;lamb&quot;</span><span class="p">])</span>
+<span class="p">],</span> <span class="p">[</span><span class="s2">&quot;id&quot;</span><span class="p">,</span> <span class="s2">&quot;raw&quot;</span><span class="p">])</span>
 
-<span class="n">remover</span> <span class="o">=</span> <span class="n">StopWordsRemover</span><span class="p">(</span><span class="n">inputCol</span><span class="o">=</span><span class="s">&quot;raw&quot;</span><span class="p">,</span> <span class="n">outputCol</span><span class="o">=</span><span class="s">&quot;filtered&quot;</span><span class="p">)</span>
+<span class="n">remover</span> <span class="o">=</span> <span class="n">StopWordsRemover</span><span class="p">(</span><span class="n">inputCol</span><span class="o">=</span><span class="s2">&quot;raw&quot;</span><span class="p">,</span> <span class="n">outputCol</span><span class="o">=</span><span class="s2">&quot;filtered&quot;</span><span class="p">)</span>
 <span class="n">remover</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">sentenceData</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">(</span><span class="n">truncate</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/ml/stopwords_remover_example.py" in the Spark repo.</small></div>
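Condensed from the hunk above, the PySpark StopWordsRemover example is a single transformer; its default English stop-word list drops tokens such as "I", "the", "had", and "a" (assumes an active SparkSession named spark):

    from pyspark.ml.feature import StopWordsRemover

    sentenceData = spark.createDataFrame([
        (0, ["I", "saw", "the", "red", "balloon"]),
        (1, ["Mary", "had", "a", "little", "lamb"])
    ], ["id", "raw"])

    # The "filtered" column keeps only the non-stop words from "raw".
    remover = StopWordsRemover(inputCol="raw", outputCol="filtered")
    remover.transform(sentenceData).show(truncate=False)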
@@ -1074,7 +1074,7 @@ for more details on the API.</p>
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.ml.feature.NGram">NGram Scala docs</a>
 for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.ml.feature.NGram</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.ml.feature.NGram</span>
 
 <span class="k">val</span> <span class="n">wordDataFrame</span> <span class="k">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">createDataFrame</span><span class="o">(</span><span class="nc">Seq</span><span class="o">(</span>
   <span class="o">(</span><span class="mi">0</span><span class="o">,</span> <span class="nc">Array</span><span class="o">(</span><span class="s">&quot;Hi&quot;</span><span class="o">,</span> <span class="s">&quot;I&quot;</span><span class="o">,</span> <span class="s">&quot;heard&quot;</span><span class="o">,</span> <span class="s">&quot;about&quot;</span><span class="o">,</span> <span class="s">&quot;Spark&quot;</span><span class="o">)),</span>
@@ -1095,7 +1095,7 @@ for more details on the API.</p>
     <p>Refer to the <a href="api/java/org/apache/spark/ml/feature/NGram.html">NGram Java docs</a>
 for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">java.util.List</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">org.apache.spark.ml.feature.NGram</span><span class="o">;</span>
@@ -1112,15 +1112,15 @@ for more details on the API.</p>
   <span class="n">RowFactory</span><span class="o">.</span><span class="na">create</span><span class="o">(</span><span class="mi">2</span><span class="o">,</span> <span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span><span class="s">&quot;Logistic&quot;</span><span class="o">,</span> <span class="s">&quot;regression&quot;</span><span class="o">,</span> <span class="s">&quot;models&quot;</span><span class="o">,</span> <span class="s">&quot;are&quot;</span><span class="o">,</span> <span class="s">&quot;neat&quot;</span><span class="o">))</span>
 <span class="o">);</span>
 
-<span class="n">StructType</span> <span class="n">schema</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">StructType</span><span class="o">(</span><span class="k">new</span> <span class="n">StructField</span><span class="o">[]{</span>
-  <span class="k">new</span> <span class="nf">StructField</span><span class="o">(</span><span class="s">&quot;id&quot;</span><span class="o">,</span> <span class="n">DataTypes</span><span class="o">.</span><span class="na">IntegerType</span><span class="o">,</span> <span class="kc">false</span><span class="o">,</span> <span class="n">Metadata</span><span class="o">.</span><span class="na">empty</span><span class="o">()),</span>
-  <span class="k">new</span> <span class="nf">StructField</span><span class="o">(</span>
+<span class="n">StructType</span> <span class="n">schema</span> <span class="o">=</span> <span class="k">new</span> <span class="n">StructType</span><span class="o">(</span><span class="k">new</span> <span class="n">StructField</span><span class="o">[]{</span>
+  <span class="k">new</span> <span class="n">StructField</span><span class="o">(</span><span class="s">&quot;id&quot;</span><span class="o">,</span> <span class="n">DataTypes</span><span class="o">.</span><span class="na">IntegerType</span><span class="o">,</span> <span class="kc">false</span><span class="o">,</span> <span class="n">Metadata</span><span class="o">.</span><span class="na">empty</span><span class="o">()),</span>
+  <span class="k">new</span> <span class="n">StructField</span><span class="o">(</span>
     <span class="s">&quot;words&quot;</span><span class="o">,</span> <span class="n">DataTypes</span><span class="o">.</span><span class="na">createArrayType</span><span class="o">(</span><span class="n">DataTypes</span><span class="o">.</span><span class="na">StringType</span><span class="o">),</span> <span class="kc">false</span><span class="o">,</span> <span class="n">Metadata</span><span class="o">.</span><span class="na">empty</span><span class="o">())</span>
 <span class="o">});</span>
 
 <span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">wordDataFrame</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="na">createDataFrame</span><span class="o">(</span><span class="n">data</span><span class="o">,</span> <span class="n">schema</span><span class="o">);</span>
 
-<span class="n">NGram</span> <span class="n">ngramTransformer</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">NGram</span><span class="o">().</span><span class="na">setN</span><span class="o">(</span><span class="mi">2</span><span class="o">).</span><span class="na">setInputCol</span><span class="o">(</span><span class="s">&quot;words&quot;</span><span class="o">).</span><span class="na">setOutputCol</span><span class="o">(</span><span class="s">&quot;ngrams&quot;</span><span class="o">);</span>
+<span class="n">NGram</span> <span class="n">ngramTransformer</span> <span class="o">=</span> <span class="k">new</span> <span class="n">NGram</span><span class="o">().</span><span class="na">setN</span><span class="o">(</span><span class="mi">2</span><span class="o">).</span><span class="na">setInputCol</span><span class="o">(</span><span class="s">&quot;words&quot;</span><span class="o">).</span><span class="na">setOutputCol</span><span class="o">(</span><span class="s">&quot;ngrams&quot;</span><span class="o">);</span>
 
 <span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">ngramDataFrame</span> <span class="o">=</span> <span class="n">ngramTransformer</span><span class="o">.</span><span class="na">transform</span><span class="o">(</span><span class="n">wordDataFrame</span><span class="o">);</span>
 <span class="n">ngramDataFrame</span><span class="o">.</span><span class="na">select</span><span class="o">(</span><span class="s">&quot;ngrams&quot;</span><span class="o">).</span><span class="na">show</span><span class="o">(</span><span class="kc">false</span><span class="o">);</span>
@@ -1133,18 +1133,18 @@ for more details on the API.</p>
     <p>Refer to the <a href="api/python/pyspark.ml.html#pyspark.ml.feature.NGram">NGram Python docs</a>
 for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.ml.feature</span> <span class="kn">import</span> <span class="n">NGram</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.ml.feature</span> <span class="kn">import</span> <span class="n">NGram</span>
 
 <span class="n">wordDataFrame</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">createDataFrame</span><span class="p">([</span>
-    <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="p">[</span><span class="s">&quot;Hi&quot;</span><span class="p">,</span> <span class="s">&quot;I&quot;</span><span class="p">,</span> <span class="s">&quot;heard&quot;</span><span class="p">,</span> <span class="s">&quot;about&quot;</span><span class="p">,</span> <span class="s">&quot;Spark&quot;</span><span class="p">]),</span>
-    <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="p">[</span><span class="s">&quot;I&quot;</span><span class="p">,</span> <span class="s">&quot;wish&quot;</span><span class="p">,</span> <span class="s">&quot;Java&quot;</span><span class="p">,</span> <span class="s">&quot;could&quot;</span><span class="p">,</span> <span class="s">&quot;use&quot;</span><span class="p">,</span> <span class="s">&quot;case&quot;</span><span class="p">,</span> <span class="s">&quot;classes&quot;</span><span class="p">]),</span>
-    <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="p">[</span><span class="s">&quot;Logistic&quot;</span><span class="p">,</span> <span class="s">&quot;regression&quot;</span><span class="p">,</span> <span class="s">&quot;models&quot;</span><span class="p">,</span> <span class="s">&quot;are&quot;</span><span class="p">,</span> <span class="s">&quot;neat&quot;</span><span class="p">])</span>
-<span class="p">],</span> <span class="p">[</span><span class="s">&quot;id&quot;</span><span class="p">,</span> <span class="s">&quot;words&quot;</span><span class="p">])</span>
+    <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="p">[</span><span class="s2">&quot;Hi&quot;</span><span class="p">,</span> <span class="s2">&quot;I&quot;</span><span class="p">,</span> <span class="s2">&quot;heard&quot;</span><span class="p">,</span> <span class="s2">&quot;about&quot;</span><span class="p">,</span> <span class="s2">&quot;Spark&quot;</span><span class="p">]),</span>
+    <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="p">[</span><span class="s2">&quot;I&quot;</span><span class="p">,</span> <span class="s2">&quot;wish&quot;</span><span class="p">,</span> <span class="s2">&quot;Java&quot;</span><span class="p">,</span> <span class="s2">&quot;could&quot;</span><span class="p">,</span> <span class="s2">&quot;use&quot;</span><span class="p">,</span> <span class="s2">&quot;case&quot;</span><span class="p">,</span> <span class="s2">&quot;classes&quot;</span><span class="p">]),</span>
+    <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="p">[</span><span class="s2">&quot;Logistic&quot;</span><span class="p">,</span> <span class="s2">&quot;regression&quot;</span><span class="p">,</span> <span class="s2">&quot;models&quot;</span><span class="p">,</span> <span class="s2">&quot;are&quot;</span><span class="p">,</span> <span class="s2">&quot;neat&quot;</span><span class="p">])</span>
+<span class="p">],</span> <span class="p">[</span><span class="s2">&quot;id&quot;</span><span class="p">,</span> <span class="s2">&quot;words&quot;</span><span class="p">])</span>
 
-<span class="n">ngram</span> <span class="o">=</span> <span class="n">NGram</span><span class="p">(</span><span class="n">n</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">inputCol</span><span class="o">=</span><span class="s">&quot;words&quot;</span><span class="p">,</span> <span class="n">outputCol</span><span class="o">=</span><span class="s">&quot;ngrams&quot;</span><span class="p">)</span>
+<span class="n">ngram</span> <span class="o">=</span> <span class="n">NGram</span><span class="p">(</span><span class="n">n</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">inputCol</span><span class="o">=</span><span class="s2">&quot;words&quot;</span><span class="p">,</span> <span class="n">outputCol</span><span class="o">=</span><span class="s2">&quot;ngrams&quot;</span><span class="p">)</span>
 
 <span class="n">ngramDataFrame</span> <span class="o">=</span> <span class="n">ngram</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">wordDataFrame</span><span class="p">)</span>
-<span class="n">ngramDataFrame</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s">&quot;ngrams&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">(</span><span class="n">truncate</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
+<span class="n">ngramDataFrame</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s2">&quot;ngrams&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">(</span><span class="n">truncate</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/ml/n_gram_example.py" in the Spark repo.</small></div>
   </div>
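The NGram example in this hunk reduces to the sketch below; with n=2 each row of words becomes its list of adjacent word pairs, e.g. "Hi I", "I heard", "heard about", "about Spark" (assumes an active SparkSession named spark):

    from pyspark.ml.feature import NGram

    wordDataFrame = spark.createDataFrame([
        (0, ["Hi", "I", "heard", "about", "Spark"])
    ], ["id", "words"])

    # n=2 produces bigrams; rows shorter than n yield an empty ngram list.
    ngram = NGram(n=2, inputCol="words", outputCol="ngrams")
    ngram.transform(wordDataFrame).select("ngrams").show(truncate=False)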
@@ -1165,7 +1165,7 @@ for <code>inputCol</code>.</p>
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.ml.feature.Binarizer">Binarizer Scala docs</a>
 for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.ml.feature.Binarizer</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.ml.feature.Binarizer</span>
 
 <span class="k">val</span> <span class="n">data</span> <span class="k">=</span> <span class="nc">Array</span><span class="o">((</span><span class="mi">0</span><span class="o">,</span> <span class="mf">0.1</span><span class="o">),</span> <span class="o">(</span><span class="mi">1</span><span class="o">,</span> <span class="mf">0.8</span><span class="o">),</span> <span class="o">(</span><span class="mi">2</span><span class="o">,</span> <span class="mf">0.2</span><span class="o">))</span>
 <span class="k">val</span> <span class="n">dataFrame</span> <span class="k">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">createDataFrame</span><span class="o">(</span><span class="n">data</span><span class="o">).</span><span class="n">toDF</span><span class="o">(</span><span class="s">&quot;id&quot;</span><span class="o">,</span> <span class="s">&quot;feature&quot;</span><span class="o">)</span>
@@ -1177,7 +1177,7 @@ for more details on the API.</p>
 
 <span class="k">val</span> <span class="n">binarizedDataFrame</span> <span class="k">=</span> <span class="n">binarizer</span><span class="o">.</span><span class="n">transform</span><span class="o">(</span><span class="n">dataFrame</span><span class="o">)</span>
 
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Binarizer output with Threshold = ${binarizer.getThreshold}&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;Binarizer output with Threshold = </span><span class="si">${</span><span class="n">binarizer</span><span class="o">.</span><span class="n">getThreshold</span><span class="si">}</span><span class="s">&quot;</span><span class="o">)</span>
 <span class="n">binarizedDataFrame</span><span class="o">.</span><span class="n">show</span><span class="o">()</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/scala/org/apache/spark/examples/ml/BinarizerExample.scala" in the Spark repo.</small></div>
@@ -1188,7 +1188,7 @@ for more details on the API.</p>
     <p>Refer to the <a href="api/java/org/apache/spark/ml/feature/Binarizer.html">Binarizer Java docs</a>
 for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">java.util.List</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">org.apache.spark.ml.feature.Binarizer</span><span class="o">;</span>
@@ -1204,13 +1204,13 @@ for more details on the API.</p>
   <span class="n">RowFactory</span><span class="o">.</span><span class="na">create</span><span class="o">(</span><span class="mi">1</span><span class="o">,</span> <span class="mf">0.8</span><span class="o">),</span>
   <span class="n">RowFactory</span><span class="o">.</span><span class="na">create</span><span class="o">(</span><span class="mi">2</span><span class="o">,</span> <span class="mf">0.2</span><span class="o">)</span>
 <span class="o">);</span>
-<span class="n">StructType</span> <span class="n">schema</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">StructType</span><span class="o">(</span><span class="k">new</span> <span class="n">StructField</span><span class="o">[]{</span>
-  <span class="k">new</span> <span class="nf">StructField</span><span class="o">(</span><span class="s">&quot;id&quot;</span><span class="o">,</span> <span class="n">DataTypes</span><span class="o">.</span><span class="na">IntegerType</span><span class="o">,</span> <span class="kc">false</span><span class="o">,</span> <span class="n">Metadata</span><span class="o">.</span><span class="na">empty</span><span class="o">()),</span>
-  <span class="k">new</span> <span class="nf">StructField</span><span class="o">(</span><span class="s">&quot;feature&quot;</span><span class="o">,</span> <span class="n">DataTypes</span><span class="o">.</span><span class="na">DoubleType</span><span class="o">,</span> <span class="kc">false</span>

<TRUNCATED>
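The Java Binarizer example is cut off above, but the Scala hunk shows the shape of the data: (0, 0.1), (1, 0.8), (2, 0.2). A minimal PySpark sketch of the same idea, with an assumed threshold of 0.5 (the visible hunks do not show the threshold value):

    from pyspark.ml.feature import Binarizer

    continuousDataFrame = spark.createDataFrame([
        (0, 0.1), (1, 0.8), (2, 0.2)
    ], ["id", "feature"])

    # Feature values above the threshold become 1.0; all others become 0.0.
    binarizer = Binarizer(threshold=0.5, inputCol="feature", outputCol="binarized_feature")
    binarizer.transform(continuousDataFrame).show()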


[08/25] spark-website git commit: Update 2.1.0 docs to include https://github.com/apache/spark/pull/16294

Posted by yh...@apache.org.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/quick-start.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/quick-start.html b/site/docs/2.1.0/quick-start.html
index 76e67e1..9d5fad7 100644
--- a/site/docs/2.1.0/quick-start.html
+++ b/site/docs/2.1.0/quick-start.html
@@ -129,14 +129,14 @@
                     
 
                     <ul id="markdown-toc">
-  <li><a href="#interactive-analysis-with-the-spark-shell" id="markdown-toc-interactive-analysis-with-the-spark-shell">Interactive Analysis with the Spark Shell</a>    <ul>
-      <li><a href="#basics" id="markdown-toc-basics">Basics</a></li>
-      <li><a href="#more-on-rdd-operations" id="markdown-toc-more-on-rdd-operations">More on RDD Operations</a></li>
-      <li><a href="#caching" id="markdown-toc-caching">Caching</a></li>
+  <li><a href="#interactive-analysis-with-the-spark-shell">Interactive Analysis with the Spark Shell</a>    <ul>
+      <li><a href="#basics">Basics</a></li>
+      <li><a href="#more-on-rdd-operations">More on RDD Operations</a></li>
+      <li><a href="#caching">Caching</a></li>
     </ul>
   </li>
-  <li><a href="#self-contained-applications" id="markdown-toc-self-contained-applications">Self-Contained Applications</a></li>
-  <li><a href="#where-to-go-from-here" id="markdown-toc-where-to-go-from-here">Where to Go from Here</a></li>
+  <li><a href="#self-contained-applications">Self-Contained Applications</a></li>
+  <li><a href="#where-to-go-from-here">Where to Go from Here</a></li>
 </ul>
 
 <p>This tutorial provides a quick introduction to using Spark. We will first introduce the API through Spark&#8217;s
@@ -164,26 +164,26 @@ or Python. Start it by running the following in the Spark directory:</p>
 
     <p>Spark&#8217;s primary abstraction is a distributed collection of items called a Resilient Distributed Dataset (RDD). RDDs can be created from Hadoop InputFormats (such as HDFS files) or by transforming other RDDs. Let&#8217;s make a new RDD from the text of the README file in the Spark source directory:</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">scala</span><span class="o">&gt;</span> <span class="k">val</span> <span class="n">textFile</span> <span class="k">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="o">(</span><span class="s">&quot;README.md&quot;</span><span class="o">)</span>
-<span class="n">textFile</span><span class="k">:</span> <span class="kt">org.apache.spark.rdd.RDD</span><span class="o">[</span><span class="kt">String</span><span class="o">]</span> <span class="k">=</span> <span class="nc">README</span><span class="o">.</span><span class="n">md</span> <span class="nc">MapPartitionsRDD</span><span class="o">[</span><span class="err">1</span><span class="o">]</span> <span class="n">at</span> <span class="n">textFile</span> <span class="n">at</span> <span class="o">&lt;</span><span class="n">console</span><span class="k">&gt;:</span><span class="mi">25</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="n">scala</span><span class="o">&gt;</span> <span class="k">val</span> <span class="n">textFile</span> <span class="k">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="o">(</span><span class="s">&quot;README.md&quot;</span><span class="o">)</span>
+<span class="n">textFile</span><span class="k">:</span> <span class="kt">org.apache.spark.rdd.RDD</span><span class="o">[</span><span class="kt">String</span><span class="o">]</span> <span class="k">=</span> <span class="nc">README</span><span class="o">.</span><span class="n">md</span> <span class="nc">MapPartitionsRDD</span><span class="o">[</span><span class="err">1</span><span class="o">]</span> <span class="n">at</span> <span class="n">textFile</span> <span class="n">at</span> <span class="o">&lt;</span><span class="n">console</span><span class="k">&gt;:</span><span class="mi">25</span></code></pre></figure>
 
     <p>RDDs have <em><a href="programming-guide.html#actions">actions</a></em>, which return values, and <em><a href="programming-guide.html#transformations">transformations</a></em>, which return pointers to new RDDs. Let&#8217;s start with a few actions:</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">scala</span><span class="o">&gt;</span> <span class="n">textFile</span><span class="o">.</span><span class="n">count</span><span class="o">()</span> <span class="c1">// Number of items in this RDD</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="n">scala</span><span class="o">&gt;</span> <span class="n">textFile</span><span class="o">.</span><span class="n">count</span><span class="o">()</span> <span class="c1">// Number of items in this RDD</span>
 <span class="n">res0</span><span class="k">:</span> <span class="kt">Long</span> <span class="o">=</span> <span class="mi">126</span> <span class="c1">// May be different from yours as README.md will change over time, similar to other outputs</span>
 
 <span class="n">scala</span><span class="o">&gt;</span> <span class="n">textFile</span><span class="o">.</span><span class="n">first</span><span class="o">()</span> <span class="c1">// First item in this RDD</span>
-<span class="n">res1</span><span class="k">:</span> <span class="kt">String</span> <span class="o">=</span> <span class="k">#</span> <span class="nc">Apache</span> <span class="nc">Spark</span></code></pre></div>
+<span class="n">res1</span><span class="k">:</span> <span class="kt">String</span> <span class="o">=</span> <span class="k">#</span> <span class="nc">Apache</span> <span class="nc">Spark</span></code></pre></figure>
 
     <p>Now let&#8217;s use a transformation. We will use the <a href="programming-guide.html#transformations"><code>filter</code></a> transformation to return a new RDD with a subset of the items in the file.</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">scala</span><span class="o">&gt;</span> <span class="k">val</span> <span class="n">linesWithSpark</span> <span class="k">=</span> <span class="n">textFile</span><span class="o">.</span><span class="n">filter</span><span class="o">(</span><span class="n">line</span> <span class="k">=&gt;</span> <span class="n">line</span><span class="o">.</span><span class="n">contains</span><span class="o">(</span><span class="s">&quot;Spark&quot;</span><span class="o">))</span>
-<span class="n">linesWithSpark</span><span class="k">:</span> <span class="kt">org.apache.spark.rdd.RDD</span><span class="o">[</span><span class="kt">String</span><span class="o">]</span> <span class="k">=</span> <span class="nc">MapPartitionsRDD</span><span class="o">[</span><span class="err">2</span><span class="o">]</span> <span class="n">at</span> <span class="n">filter</span> <span class="n">at</span> <span class="o">&lt;</span><span class="n">console</span><span class="k">&gt;:</span><span class="mi">27</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="n">scala</span><span class="o">&gt;</span> <span class="k">val</span> <span class="n">linesWithSpark</span> <span class="k">=</span> <span class="n">textFile</span><span class="o">.</span><span class="n">filter</span><span class="o">(</span><span class="n">line</span> <span class="k">=&gt;</span> <span class="n">line</span><span class="o">.</span><span class="n">contains</span><span class="o">(</span><span class="s">&quot;Spark&quot;</span><span class="o">))</span>
+<span class="n">linesWithSpark</span><span class="k">:</span> <span class="kt">org.apache.spark.rdd.RDD</span><span class="o">[</span><span class="kt">String</span><span class="o">]</span> <span class="k">=</span> <span class="nc">MapPartitionsRDD</span><span class="o">[</span><span class="err">2</span><span class="o">]</span> <span class="n">at</span> <span class="n">filter</span> <span class="n">at</span> <span class="o">&lt;</span><span class="n">console</span><span class="k">&gt;:</span><span class="mi">27</span></code></pre></figure>
 
     <p>We can chain together transformations and actions:</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">scala</span><span class="o">&gt;</span> <span class="n">textFile</span><span class="o">.</span><span class="n">filter</span><span class="o">(</span><span class="n">line</span> <span class="k">=&gt;</span> <span class="n">line</span><span class="o">.</span><span class="n">contains</span><span class="o">(</span><span class="s">&quot;Spark&quot;</span><span class="o">)).</span><span class="n">count</span><span class="o">()</span> <span class="c1">// How many lines contain &quot;Spark&quot;?</span>
-<span class="n">res3</span><span class="k">:</span> <span class="kt">Long</span> <span class="o">=</span> <span class="mi">15</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="n">scala</span><span class="o">&gt;</span> <span class="n">textFile</span><span class="o">.</span><span class="n">filter</span><span class="o">(</span><span class="n">line</span> <span class="k">=&gt;</span> <span class="n">line</span><span class="o">.</span><span class="n">contains</span><span class="o">(</span><span class="s">&quot;Spark&quot;</span><span class="o">)).</span><span class="n">count</span><span class="o">()</span> <span class="c1">// How many lines contain &quot;Spark&quot;?</span>
+<span class="n">res3</span><span class="k">:</span> <span class="kt">Long</span> <span class="o">=</span> <span class="mi">15</span></code></pre></figure>
 
   </div>
 <div data-lang="python">
@@ -193,24 +193,24 @@ or Python. Start it by running the following in the Spark directory:</p>
 
     <p>Spark&#8217;s primary abstraction is a distributed collection of items called a Resilient Distributed Dataset (RDD). RDDs can be created from Hadoop InputFormats (such as HDFS files) or by transforming other RDDs. Let&#8217;s make a new RDD from the text of the README file in the Spark source directory:</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="o">&gt;&gt;&gt;</span> <span class="n">textFile</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">&quot;README.md&quot;</span><span class="p">)</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="o">&gt;&gt;&gt;</span> <span class="n">textFile</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s2">&quot;README.md&quot;</span><span class="p">)</span></code></pre></figure>
 
     <p>RDDs have <em><a href="programming-guide.html#actions">actions</a></em>, which return values, and <em><a href="programming-guide.html#transformations">transformations</a></em>, which return pointers to new RDDs. Let&#8217;s start with a few actions:</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="o">&gt;&gt;&gt;</span> <span class="n">textFile</span><span class="o">.</span><span class="n">count</span><span class="p">()</span>  <span class="c"># Number of items in this RDD</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="o">&gt;&gt;&gt;</span> <span class="n">textFile</span><span class="o">.</span><span class="n">count</span><span class="p">()</span>  <span class="c1"># Number of items in this RDD</span>
 <span class="mi">126</span>
 
-<span class="o">&gt;&gt;&gt;</span> <span class="n">textFile</span><span class="o">.</span><span class="n">first</span><span class="p">()</span>  <span class="c"># First item in this RDD</span>
-<span class="s">u&#39;# Apache Spark&#39;</span></code></pre></div>
+<span class="o">&gt;&gt;&gt;</span> <span class="n">textFile</span><span class="o">.</span><span class="n">first</span><span class="p">()</span>  <span class="c1"># First item in this RDD</span>
+<span class="sa">u</span><span class="s1">&#39;# Apache Spark&#39;</span></code></pre></figure>
 
     <p>Now let&#8217;s use a transformation. We will use the <a href="programming-guide.html#transformations"><code>filter</code></a> transformation to return a new RDD with a subset of the items in the file.</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="o">&gt;&gt;&gt;</span> <span class="n">linesWithSpark</span> <span class="o">=</span> <span class="n">textFile</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="s">&quot;Spark&quot;</span> <span class="ow">in</span> <span class="n">line</span><span class="p">)</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="o">&gt;&gt;&gt;</span> <span class="n">linesWithSpark</span> <span class="o">=</span> <span class="n">textFile</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="s2">&quot;Spark&quot;</span> <span class="ow">in</span> <span class="n">line</span><span class="p">)</span></code></pre></figure>
 
     <p>We can chain together transformations and actions:</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="o">&gt;&gt;&gt;</span> <span class="n">textFile</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="s">&quot;Spark&quot;</span> <span class="ow">in</span> <span class="n">line</span><span class="p">)</span><span class="o">.</span><span class="n">count</span><span class="p">()</span>  <span class="c"># How many lines contain &quot;Spark&quot;?</span>
-<span class="mi">15</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="o">&gt;&gt;&gt;</span> <span class="n">textFile</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="s2">&quot;Spark&quot;</span> <span class="ow">in</span> <span class="n">line</span><span class="p">)</span><span class="o">.</span><span class="n">count</span><span class="p">()</span>  <span class="c1"># How many lines contain &quot;Spark&quot;?</span>
+<span class="mi">15</span></code></pre></figure>
 
   </div>
 </div>
@@ -221,38 +221,38 @@ or Python. Start it by running the following in the Spark directory:</p>
 <div class="codetabs">
 <div data-lang="scala">
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">scala</span><span class="o">&gt;</span> <span class="n">textFile</span><span class="o">.</span><span class="n">map</span><span class="o">(</span><span class="n">line</span> <span class="k">=&gt;</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="o">(</span><span class="s">&quot; &quot;</span><span class="o">).</span><span class="n">size</span><span class="o">).</span><span class="n">reduce</span><span class="o">((</span><span class="n">a</span><span class="o">,</span> <span class="n">b</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="k">if</span> <span class="o">(</span><span class="n">a</span> <span class="o">&gt;</span> <span class="n">b</span><span class="o">)</span> <span class="n">a</span> <span class="k">else</span> <span class="n">b</span><span class="o">)</span>
-<span class="n">res4</span><span class="k">:</span> <span class="kt">Long</span> <span class="o">=</span> <span class="mi">15</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="n">scala</span><span class="o">&gt;</span> <span class="n">textFile</span><span class="o">.</span><span class="n">map</span><span class="o">(</span><span class="n">line</span> <span class="k">=&gt;</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="o">(</span><span class="s">&quot; &quot;</span><span class="o">).</span><span class="n">size</span><span class="o">).</span><span class="n">reduce</span><span class="o">((</span><span class="n">a</span><span class="o">,</span> <span class="n">b</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="k">if</span> <span class="o">(</span><span class="n">a</span> <span class="o">&gt;</span> <span class="n">b</span><span class="o">)</span> <span class="n">a</span> <span class="k">else</span> <span class="n">b</span><span class="o">)</span>
+<span class="n">res4</span><span class="k">:</span> <span class="kt">Long</span> <span class="o">=</span> <span class="mi">15</span></code></pre></figure>
 
     <p>This first maps a line to an integer value, creating a new RDD. <code>reduce</code> is called on that RDD to find the largest line count. The arguments to <code>map</code> and <code>reduce</code> are Scala function literals (closures), and can use any language feature or Scala/Java library. For example, we can easily call functions declared elsewhere. We&#8217;ll use the <code>Math.max()</code> function to make this code easier to understand:</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">scala</span><span class="o">&gt;</span> <span class="k">import</span> <span class="nn">java.lang.Math</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="n">scala</span><span class="o">&gt;</span> <span class="k">import</span> <span class="nn">java.lang.Math</span>
 <span class="k">import</span> <span class="nn">java.lang.Math</span>
 
 <span class="n">scala</span><span class="o">&gt;</span> <span class="n">textFile</span><span class="o">.</span><span class="n">map</span><span class="o">(</span><span class="n">line</span> <span class="k">=&gt;</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="o">(</span><span class="s">&quot; &quot;</span><span class="o">).</span><span class="n">size</span><span class="o">).</span><span class="n">reduce</span><span class="o">((</span><span class="n">a</span><span class="o">,</span> <span class="n">b</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="nc">Math</span><span class="o">.</span><span class="n">max</span><span class="o">(</span><span class="n">a</span><span class="o">,</span> <span class="n">b</span><span class="o">))</span>
-<span class="n">res5</span><span class="k">:</span> <span class="kt">Int</span> <span class="o">=</span> <span class="mi">15</span></code></pre></div>
+<span class="n">res5</span><span class="k">:</span> <span class="kt">Int</span> <span class="o">=</span> <span class="mi">15</span></code></pre></figure>
 
     <p>One common data flow pattern is MapReduce, as popularized by Hadoop. Spark can implement MapReduce flows easily:</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">scala</span><span class="o">&gt;</span> <span class="k">val</span> <span class="n">wordCounts</span> <span class="k">=</span> <span class="n">textFile</span><span class="o">.</span><span class="n">flatMap</span><span class="o">(</span><span class="n">line</span> <span class="k">=&gt;</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="o">(</span><span class="s">&quot; &quot;</span><span class="o">)).</span><span class="n">map</span><span class="o">(</span><span class="n">word</span> <span class="k">=&gt;</span> <span class="o">(</span><span class="n">word</span><span class="o">,</span> <span class="mi">1</span><span class="o">)).</span><span class="n">reduceByKey</span><span class="o">((</span><span class="n">a</span><span class="o">,</span> <span class="n">b</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="n">a</span> 
 <span class="o">+</span> <span class="n">b</span><span class="o">)</span>
-<span class="n">wordCounts</span><span class="k">:</span> <span class="kt">org.apache.spark.rdd.RDD</span><span class="o">[(</span><span class="kt">String</span>, <span class="kt">Int</span><span class="o">)]</span> <span class="k">=</span> <span class="nc">ShuffledRDD</span><span class="o">[</span><span class="err">8</span><span class="o">]</span> <span class="n">at</span> <span class="n">reduceByKey</span> <span class="n">at</span> <span class="o">&lt;</span><span class="n">console</span><span class="k">&gt;:</span><span class="mi">28</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="n">scala</span><span class="o">&gt;</span> <span class="k">val</span> <span class="n">wordCounts</span> <span class="k">=</span> <span class="n">textFile</span><span class="o">.</span><span class="n">flatMap</span><span class="o">(</span><span class="n">line</span> <span class="k">=&gt;</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="o">(</span><span class="s">&quot; &quot;</span><span class="o">)).</span><span class="n">map</span><span class="o">(</span><span class="n">word</span> <span class="k">=&gt;</span> <span class="o">(</span><span class="n">word</span><span class="o">,</span> <span class="mi">1</span><span class="o">)).</span><span class="n">reduceByKey</span><span class="o">((</span><span class="n">a</span><span class="o">,</span> <span class="n">b</span><span class="o">)</span> <span class="k">=&gt;</span> <span cla
 ss="n">a</span> <span class="o">+</span> <span class="n">b</span><span class="o">)</span>
+<span class="n">wordCounts</span><span class="k">:</span> <span class="kt">org.apache.spark.rdd.RDD</span><span class="o">[(</span><span class="kt">String</span>, <span class="kt">Int</span><span class="o">)]</span> <span class="k">=</span> <span class="nc">ShuffledRDD</span><span class="o">[</span><span class="err">8</span><span class="o">]</span> <span class="n">at</span> <span class="n">reduceByKey</span> <span class="n">at</span> <span class="o">&lt;</span><span class="n">console</span><span class="k">&gt;:</span><span class="mi">28</span></code></pre></figure>
 
     <p>Here, we combined the <a href="programming-guide.html#transformations"><code>flatMap</code></a>, <a href="programming-guide.html#transformations"><code>map</code></a>, and <a href="programming-guide.html#transformations"><code>reduceByKey</code></a> transformations to compute the per-word counts in the file as an RDD of (String, Int) pairs. To collect the word counts in our shell, we can use the <a href="programming-guide.html#actions"><code>collect</code></a> action:</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">scala</span><span class="o">&gt;</span> <span class="n">wordCounts</span><span class="o">.</span><span class="n">collect</span><span class="o">()</span>
-<span class="n">res6</span><span class="k">:</span> <span class="kt">Array</span><span class="o">[(</span><span class="kt">String</span>, <span class="kt">Int</span><span class="o">)]</span> <span class="k">=</span> <span class="nc">Array</span><span class="o">((</span><span class="n">means</span><span class="o">,</span><span class="mi">1</span><span class="o">),</span> <span class="o">(</span><span class="n">under</span><span class="o">,</span><span class="mi">2</span><span class="o">),</span> <span class="o">(</span><span class="k">this</span><span class="o">,</span><span class="mi">3</span><span class="o">),</span> <span class="o">(</span><span class="nc">Because</span><span class="o">,</span><span class="mi">1</span><span class="o">),</span> <span class="o">(</span><span class="nc">Python</span><span class="o">,</span><span class="mi">2</span><span class="o">),</span> <span class="o">(</span><span class="n">agree</span><span class="o">,</span><span class="mi">1</span><span class
 ="o">),</span> <span class="o">(</span><span class="n">cluster</span><span class="o">.,</span><span class="mi">1</span><span class="o">),</span> <span class="o">...)</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="n">scala</span><span class="o">&gt;</span> <span class="n">wordCounts</span><span class="o">.</span><span class="n">collect</span><span class="o">()</span>
+<span class="n">res6</span><span class="k">:</span> <span class="kt">Array</span><span class="o">[(</span><span class="kt">String</span>, <span class="kt">Int</span><span class="o">)]</span> <span class="k">=</span> <span class="nc">Array</span><span class="o">((</span><span class="n">means</span><span class="o">,</span><span class="mi">1</span><span class="o">),</span> <span class="o">(</span><span class="n">under</span><span class="o">,</span><span class="mi">2</span><span class="o">),</span> <span class="o">(</span><span class="k">this</span><span class="o">,</span><span class="mi">3</span><span class="o">),</span> <span class="o">(</span><span class="nc">Because</span><span class="o">,</span><span class="mi">1</span><span class="o">),</span> <span class="o">(</span><span class="nc">Python</span><span class="o">,</span><span class="mi">2</span><span class="o">),</span> <span class="o">(</span><span class="n">agree</span><span class="o">,</span><span class="mi">1</span><span class
 ="o">),</span> <span class="o">(</span><span class="n">cluster</span><span class="o">.,</span><span class="mi">1</span><span class="o">),</span> <span class="o">...)</span></code></pre></figure>
 
   </div>
 <div data-lang="python">
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="o">&gt;&gt;&gt;</span> <span class="n">textFile</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="nb">len</span><span class="p">(</span><span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">()))</span><span class="o">.</span><span class="n">reduce</span><span class="p">(</span><span class="k">lambda</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">:</span> <span class="n">a</span> <span class="k">if</span> <span class="p">(</span><span class="n">a</span> <span class="o">&gt;</span> <span class="n">b</span><span class="p">)</span> <span class="k">else</span> <span class="n">b</span><span class="p">)</span>
-<span class="mi">15</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="o">&gt;&gt;&gt;</span> <span class="n">textFile</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="nb">len</span><span class="p">(</span><span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">()))</span><span class="o">.</span><span class="n">reduce</span><span class="p">(</span><span class="k">lambda</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">:</span> <span class="n">a</span> <span class="k">if</span> <span class="p">(</span><span class="n">a</span> <span class="o">&gt;</span> <span class="n">b</span><span class="p">)</span> <span class="k">else</span> <span class="n">b</span><span class="p">)</span>
+<span class="mi">15</span></code></pre></figure>
 
     <p>This first maps a line to an integer value, creating a new RDD. <code>reduce</code> is called on that RDD to find the largest line count. The arguments to <code>map</code> and <code>reduce</code> are Python <a href="https://docs.python.org/2/reference/expressions.html#lambda">anonymous functions (lambdas)</a>,
 but we can also pass any top-level Python function we want.
 For example, we&#8217;ll define a <code>max</code> function to make this code easier to understand:</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="o">&gt;&gt;&gt;</span> <span class="k">def</span> <span class="nf">max</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="o">&gt;&gt;&gt;</span> <span class="k">def</span> <span class="nf">max</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">):</span>
 <span class="o">...</span>     <span class="k">if</span> <span class="n">a</span> <span class="o">&gt;</span> <span class="n">b</span><span class="p">:</span>
 <span class="o">...</span>         <span class="k">return</span> <span class="n">a</span>
 <span class="o">...</span>     <span class="k">else</span><span class="p">:</span>
@@ -260,16 +260,16 @@ For example, we&#8217;ll define a <code>max</code> function to make this code ea
 <span class="o">...</span>
 
 <span class="o">&gt;&gt;&gt;</span> <span class="n">textFile</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="nb">len</span><span class="p">(</span><span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">()))</span><span class="o">.</span><span class="n">reduce</span><span class="p">(</span><span class="nb">max</span><span class="p">)</span>
-<span class="mi">15</span></code></pre></div>
+<span class="mi">15</span></code></pre></figure>
 
     <p>One common data flow pattern is MapReduce, as popularized by Hadoop. Spark can implement MapReduce flows easily:</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="o">&gt;&gt;&gt;</span> <span class="n">wordCounts</span> <span class="o">=</span> <span class="n">textFile</span><span class="o">.</span><span class="n">flatMap</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">())</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">word</span><span class="p">:</span> <span class="p">(</span><span class="n">word</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span><span class="o">.</span><span class="n">reduceByKey</span><span class="p">(</span><span class="k">lambda</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">:</span> <span class="n">a</span><span cla
 ss="o">+</span><span class="n">b</span><span class="p">)</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="o">&gt;&gt;&gt;</span> <span class="n">wordCounts</span> <span class="o">=</span> <span class="n">textFile</span><span class="o">.</span><span class="n">flatMap</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">())</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">word</span><span class="p">:</span> <span class="p">(</span><span class="n">word</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span><span class="o">.</span><span class="n">reduceByKey</span><span class="p">(</span><span class="k">lambda</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">:</span> <span class="n">a
 </span><span class="o">+</span><span class="n">b</span><span class="p">)</span></code></pre></figure>
 
     <p>Here, we combined the <a href="programming-guide.html#transformations"><code>flatMap</code></a>, <a href="programming-guide.html#transformations"><code>map</code></a>, and <a href="programming-guide.html#transformations"><code>reduceByKey</code></a> transformations to compute the per-word counts in the file as an RDD of (string, int) pairs. To collect the word counts in our shell, we can use the <a href="programming-guide.html#actions"><code>collect</code></a> action:</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="o">&gt;&gt;&gt;</span> <span class="n">wordCounts</span><span class="o">.</span><span class="n">collect</span><span class="p">()</span>
-<span class="p">[(</span><span class="s">u&#39;and&#39;</span><span class="p">,</span> <span class="mi">9</span><span class="p">),</span> <span class="p">(</span><span class="s">u&#39;A&#39;</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="p">(</span><span class="s">u&#39;webpage&#39;</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="p">(</span><span class="s">u&#39;README&#39;</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="p">(</span><span class="s">u&#39;Note&#39;</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="p">(</span><span class="s">u&#39;&quot;local&quot;&#39;</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="p">(</span><span class="s">u&#39;variable&#39;</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class=
 "o">...</span><span class="p">]</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="o">&gt;&gt;&gt;</span> <span class="n">wordCounts</span><span class="o">.</span><span class="n">collect</span><span class="p">()</span>
+<span class="p">[(</span><span class="sa">u</span><span class="s1">&#39;and&#39;</span><span class="p">,</span> <span class="mi">9</span><span class="p">),</span> <span class="p">(</span><span class="sa">u</span><span class="s1">&#39;A&#39;</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="p">(</span><span class="sa">u</span><span class="s1">&#39;webpage&#39;</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="p">(</span><span class="sa">u</span><span class="s1">&#39;README&#39;</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="p">(</span><span class="sa">u</span><span class="s1">&#39;Note&#39;</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="p">(</span><span class="sa">u</span><span class="s1">&#39;&quot;local&quot;&#39;</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <spa
 n class="p">(</span><span class="sa">u</span><span class="s1">&#39;variable&#39;</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="o">...</span><span class="p">]</span></code></pre></figure>
 
   </div>
 </div>
@@ -280,14 +280,14 @@ For example, we&#8217;ll define a <code>max</code> function to make this code ea
 <div class="codetabs">
 <div data-lang="scala">
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">scala</span><span class="o">&gt;</span> <span class="n">linesWithSpark</span><span class="o">.</span><span class="n">cache</span><span class="o">()</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="n">scala</span><span class="o">&gt;</span> <span class="n">linesWithSpark</span><span class="o">.</span><span class="n">cache</span><span class="o">()</span>
 <span class="n">res7</span><span class="k">:</span> <span class="kt">linesWithSpark.</span><span class="k">type</span> <span class="o">=</span> <span class="nc">MapPartitionsRDD</span><span class="o">[</span><span class="err">2</span><span class="o">]</span> <span class="n">at</span> <span class="n">filter</span> <span class="n">at</span> <span class="o">&lt;</span><span class="n">console</span><span class="k">&gt;:</span><span class="mi">27</span>
 
 <span class="n">scala</span><span class="o">&gt;</span> <span class="n">linesWithSpark</span><span class="o">.</span><span class="n">count</span><span class="o">()</span>
 <span class="n">res8</span><span class="k">:</span> <span class="kt">Long</span> <span class="o">=</span> <span class="mi">15</span>
 
 <span class="n">scala</span><span class="o">&gt;</span> <span class="n">linesWithSpark</span><span class="o">.</span><span class="n">count</span><span class="o">()</span>
-<span class="n">res9</span><span class="k">:</span> <span class="kt">Long</span> <span class="o">=</span> <span class="mi">15</span></code></pre></div>
+<span class="n">res9</span><span class="k">:</span> <span class="kt">Long</span> <span class="o">=</span> <span class="mi">15</span></code></pre></figure>
 
     <p>It may seem silly to use Spark to explore and cache a 100-line text file. The interesting part is
 that these same functions can be used on very large data sets, even when they are striped across
@@ -297,13 +297,13 @@ a cluster, as described in the <a href="programming-guide.html#initializing-spar
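To try the same session interactively against a real cluster, a minimal sketch (the standalone master URL is a placeholder):

    # Attach the shell to a running cluster instead of local mode
    ./bin/spark-shell --master spark://HOST:7077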
   </div>
 <div data-lang="python">
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="o">&gt;&gt;&gt;</span> <span class="n">linesWithSpark</span><span class="o">.</span><span class="n">cache</span><span class="p">()</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="o">&gt;&gt;&gt;</span> <span class="n">linesWithSpark</span><span class="o">.</span><span class="n">cache</span><span class="p">()</span>
 
 <span class="o">&gt;&gt;&gt;</span> <span class="n">linesWithSpark</span><span class="o">.</span><span class="n">count</span><span class="p">()</span>
 <span class="mi">15</span>
 
 <span class="o">&gt;&gt;&gt;</span> <span class="n">linesWithSpark</span><span class="o">.</span><span class="n">count</span><span class="p">()</span>
-<span class="mi">15</span></code></pre></div>
+<span class="mi">15</span></code></pre></figure>
 
     <p>It may seem silly to use Spark to explore and cache a 100-line text file. The interesting part is
 that these same functions can be used on very large data sets, even when they are striped across
@@ -323,7 +323,7 @@ simple application in Scala (with sbt), Java (with Maven), and Python.</p>
     <p>We&#8217;ll create a very simple Spark application in Scala&#8211;so simple, in fact, that it&#8217;s
 named <code>SimpleApp.scala</code>:</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="cm">/* SimpleApp.scala */</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="cm">/* SimpleApp.scala */</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.SparkContext</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.SparkContext._</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.SparkConf</span>
@@ -336,10 +336,10 @@ named <code>SimpleApp.scala</code>:</p>
     <span class="k">val</span> <span class="n">logData</span> <span class="k">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="o">(</span><span class="n">logFile</span><span class="o">,</span> <span class="mi">2</span><span class="o">).</span><span class="n">cache</span><span class="o">()</span>
     <span class="k">val</span> <span class="n">numAs</span> <span class="k">=</span> <span class="n">logData</span><span class="o">.</span><span class="n">filter</span><span class="o">(</span><span class="n">line</span> <span class="k">=&gt;</span> <span class="n">line</span><span class="o">.</span><span class="n">contains</span><span class="o">(</span><span class="s">&quot;a&quot;</span><span class="o">)).</span><span class="n">count</span><span class="o">()</span>
     <span class="k">val</span> <span class="n">numBs</span> <span class="k">=</span> <span class="n">logData</span><span class="o">.</span><span class="n">filter</span><span class="o">(</span><span class="n">line</span> <span class="k">=&gt;</span> <span class="n">line</span><span class="o">.</span><span class="n">contains</span><span class="o">(</span><span class="s">&quot;b&quot;</span><span class="o">)).</span><span class="n">count</span><span class="o">()</span>
-    <span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Lines with a: $numAs, Lines with b: $numBs&quot;</span><span class="o">)</span>
+    <span class="n">println</span><span class="o">(</span><span class="s">s&quot;Lines with a: </span><span class="si">$numAs</span><span class="s">, Lines with b: </span><span class="si">$numBs</span><span class="s">&quot;</span><span class="o">)</span>
     <span class="n">sc</span><span class="o">.</span><span class="n">stop</span><span class="o">()</span>
   <span class="o">}</span>
-<span class="o">}</span></code></pre></div>
+<span class="o">}</span></code></pre></figure>
 
     <p>Note that applications should define a <code>main()</code> method instead of extending <code>scala.App</code>.
 Subclasses of <code>scala.App</code> may not work correctly.</p>
@@ -352,26 +352,26 @@ we initialize a SparkContext as part of the program.</p>
     <p>We pass the SparkContext constructor a 
 <a href="api/scala/index.html#org.apache.spark.SparkConf">SparkConf</a>
 object which contains information about our
-application.</p>
+application. </p>
 
     <p>Our application depends on the Spark API, so we&#8217;ll also include an sbt configuration file, 
 <code>simple.sbt</code>, which explains that Spark is a dependency. This file also adds a repository that 
 Spark depends on:</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">name</span> <span class="o">:=</span> <span class="s">&quot;Simple Project&quot;</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="n">name</span> <span class="o">:=</span> <span class="s">&quot;Simple Project&quot;</span>
 
 <span class="n">version</span> <span class="o">:=</span> <span class="s">&quot;1.0&quot;</span>
 
 <span class="n">scalaVersion</span> <span class="o">:=</span> <span class="s">&quot;2.11.7&quot;</span>
 
-<span class="n">libraryDependencies</span> <span class="o">+=</span> <span class="s">&quot;org.apache.spark&quot;</span> <span class="o">%%</span> <span class="s">&quot;spark-core&quot;</span> <span class="o">%</span> <span class="s">&quot;2.1.0&quot;</span></code></pre></div>
+<span class="n">libraryDependencies</span> <span class="o">+=</span> <span class="s">&quot;org.apache.spark&quot;</span> <span class="o">%%</span> <span class="s">&quot;spark-core&quot;</span> <span class="o">%</span> <span class="s">&quot;2.1.0&quot;</span></code></pre></figure>
 
     <p>For sbt to work correctly, we&#8217;ll need to lay out <code>SimpleApp.scala</code> and <code>simple.sbt</code>
 according to the typical directory structure. Once that is in place, we can create a JAR package
 containing the application&#8217;s code, then use the <code>spark-submit</code> script to run our program.</p>
 
-    <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># Your directory layout should look like this</span>
-<span class="nv">$ </span>find .
+    <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span><span class="c1"># Your directory layout should look like this</span>
+$ find .
 .
 ./simple.sbt
 ./src
@@ -379,18 +379,18 @@ containing the application&#8217;s code, then use the <code>spark-submit</code>
 ./src/main/scala
 ./src/main/scala/SimpleApp.scala
 
-<span class="c"># Package a jar containing your application</span>
-<span class="nv">$ </span>sbt package
+<span class="c1"># Package a jar containing your application</span>
+$ sbt package
 ...
 <span class="o">[</span>info<span class="o">]</span> Packaging <span class="o">{</span>..<span class="o">}</span>/<span class="o">{</span>..<span class="o">}</span>/target/scala-2.11/simple-project_2.11-1.0.jar
 
-<span class="c"># Use spark-submit to run your application</span>
-<span class="nv">$ </span>YOUR_SPARK_HOME/bin/spark-submit <span class="se">\</span>
+<span class="c1"># Use spark-submit to run your application</span>
+$ YOUR_SPARK_HOME/bin/spark-submit <span class="se">\</span>
   --class <span class="s2">&quot;SimpleApp&quot;</span> <span class="se">\</span>
-  --master <span class="nb">local</span><span class="o">[</span>4<span class="o">]</span> <span class="se">\</span>
+  --master local<span class="o">[</span><span class="m">4</span><span class="o">]</span> <span class="se">\</span>
   target/scala-2.11/simple-project_2.11-1.0.jar
 ...
-Lines with a: 46, Lines with b: 23</code></pre></div>
+Lines with a: <span class="m">46</span>, Lines with b: <span class="m">23</span></code></pre></figure>
 
   </div>
 <div data-lang="java">
@@ -398,7 +398,7 @@ Lines with a: 46, Lines with b: 23</code></pre></div>
 
     <p>We&#8217;ll create a very simple Spark application, <code>SimpleApp.java</code>:</p>
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="cm">/* SimpleApp.java */</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="cm">/* SimpleApp.java */</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.*</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.SparkConf</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.function.Function</span><span class="o">;</span>
@@ -406,8 +406,8 @@ Lines with a: 46, Lines with b: 23</code></pre></div>
 <span class="kd">public</span> <span class="kd">class</span> <span class="nc">SimpleApp</span> <span class="o">{</span>
   <span class="kd">public</span> <span class="kd">static</span> <span class="kt">void</span> <span class="nf">main</span><span class="o">(</span><span class="n">String</span><span class="o">[]</span> <span class="n">args</span><span class="o">)</span> <span class="o">{</span>
     <span class="n">String</span> <span class="n">logFile</span> <span class="o">=</span> <span class="s">&quot;YOUR_SPARK_HOME/README.md&quot;</span><span class="o">;</span> <span class="c1">// Should be some file on your system</span>
-    <span class="n">SparkConf</span> <span class="n">conf</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">SparkConf</span><span class="o">().</span><span class="na">setAppName</span><span class="o">(</span><span class="s">&quot;Simple Application&quot;</span><span class="o">);</span>
-    <span class="n">JavaSparkContext</span> <span class="n">sc</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">JavaSparkContext</span><span class="o">(</span><span class="n">conf</span><span class="o">);</span>
+    <span class="n">SparkConf</span> <span class="n">conf</span> <span class="o">=</span> <span class="k">new</span> <span class="n">SparkConf</span><span class="o">().</span><span class="na">setAppName</span><span class="o">(</span><span class="s">&quot;Simple Application&quot;</span><span class="o">);</span>
+    <span class="n">JavaSparkContext</span> <span class="n">sc</span> <span class="o">=</span> <span class="k">new</span> <span class="n">JavaSparkContext</span><span class="o">(</span><span class="n">conf</span><span class="o">);</span>
     <span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">logData</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="na">textFile</span><span class="o">(</span><span class="n">logFile</span><span class="o">).</span><span class="na">cache</span><span class="o">();</span>
 
     <span class="kt">long</span> <span class="n">numAs</span> <span class="o">=</span> <span class="n">logData</span><span class="o">.</span><span class="na">filter</span><span class="o">(</span><span class="k">new</span> <span class="n">Function</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Boolean</span><span class="o">&gt;()</span> <span class="o">{</span>
@@ -422,7 +422,7 @@ Lines with a: 46, Lines with b: 23</code></pre></div>
     
     <span class="n">sc</span><span class="o">.</span><span class="na">stop</span><span class="o">();</span>
   <span class="o">}</span>
-<span class="o">}</span></code></pre></div>
+<span class="o">}</span></code></pre></figure>
 
     <p>This program just counts the number of lines containing &#8216;a&#8217; and the number containing &#8216;b&#8217; in a text
 file. Note that you&#8217;ll need to replace YOUR_SPARK_HOME with the location where Spark is installed.
@@ -435,7 +435,7 @@ that extend <code>spark.api.java.function.Function</code>. The
     <p>To build the program, we also write a Maven <code>pom.xml</code> file that lists Spark as a dependency.
 Note that Spark artifacts are tagged with a Scala version.</p>
 
-    <div class="highlight"><pre><code class="language-xml" data-lang="xml"><span class="nt">&lt;project&gt;</span>
+    <figure class="highlight"><pre><code class="language-xml" data-lang="xml"><span></span><span class="nt">&lt;project&gt;</span>
   <span class="nt">&lt;groupId&gt;</span>edu.berkeley<span class="nt">&lt;/groupId&gt;</span>
   <span class="nt">&lt;artifactId&gt;</span>simple-project<span class="nt">&lt;/artifactId&gt;</span>
   <span class="nt">&lt;modelVersion&gt;</span>4.0.0<span class="nt">&lt;/modelVersion&gt;</span>
@@ -449,31 +449,31 @@ Note that Spark artifacts are tagged with a Scala version.</p>
       <span class="nt">&lt;version&gt;</span>2.1.0<span class="nt">&lt;/version&gt;</span>
     <span class="nt">&lt;/dependency&gt;</span>
   <span class="nt">&lt;/dependencies&gt;</span>
-<span class="nt">&lt;/project&gt;</span></code></pre></div>
+<span class="nt">&lt;/project&gt;</span></code></pre></figure>
 
     <p>We lay out these files according to the canonical Maven directory structure:</p>
 
-    <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>find .
+    <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>$ find .
 ./pom.xml
 ./src
 ./src/main
 ./src/main/java
-./src/main/java/SimpleApp.java</code></pre></div>
+./src/main/java/SimpleApp.java</code></pre></figure>
 
     <p>Now, we can package the application using Maven and execute it with <code>./bin/spark-submit</code>.</p>
 
-    <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># Package a JAR containing your application</span>
-<span class="nv">$ </span>mvn package
+    <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span><span class="c1"># Package a JAR containing your application</span>
+$ mvn package
 ...
 <span class="o">[</span>INFO<span class="o">]</span> Building jar: <span class="o">{</span>..<span class="o">}</span>/<span class="o">{</span>..<span class="o">}</span>/target/simple-project-1.0.jar
 
-<span class="c"># Use spark-submit to run your application</span>
-<span class="nv">$ </span>YOUR_SPARK_HOME/bin/spark-submit <span class="se">\</span>
+<span class="c1"># Use spark-submit to run your application</span>
+$ YOUR_SPARK_HOME/bin/spark-submit <span class="se">\</span>
   --class <span class="s2">&quot;SimpleApp&quot;</span> <span class="se">\</span>
-  --master <span class="nb">local</span><span class="o">[</span>4<span class="o">]</span> <span class="se">\</span>
+  --master local<span class="o">[</span><span class="m">4</span><span class="o">]</span> <span class="se">\</span>
   target/simple-project-1.0.jar
 ...
-Lines with a: 46, Lines with b: 23</code></pre></div>
+Lines with a: <span class="m">46</span>, Lines with b: <span class="m">23</span></code></pre></figure>
 
   </div>
 <div data-lang="python">
@@ -482,19 +482,19 @@ Lines with a: 46, Lines with b: 23</code></pre></div>
 
     <p>As an example, we&#8217;ll create a simple Spark application, <code>SimpleApp.py</code>:</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="sd">&quot;&quot;&quot;SimpleApp.py&quot;&quot;&quot;</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="sd">&quot;&quot;&quot;SimpleApp.py&quot;&quot;&quot;</span>
 <span class="kn">from</span> <span class="nn">pyspark</span> <span class="kn">import</span> <span class="n">SparkContext</span>
 
-<span class="n">logFile</span> <span class="o">=</span> <span class="s">&quot;YOUR_SPARK_HOME/README.md&quot;</span>  <span class="c"># Should be some file on your system</span>
-<span class="n">sc</span> <span class="o">=</span> <span class="n">SparkContext</span><span class="p">(</span><span class="s">&quot;local&quot;</span><span class="p">,</span> <span class="s">&quot;Simple App&quot;</span><span class="p">)</span>
+<span class="n">logFile</span> <span class="o">=</span> <span class="s2">&quot;YOUR_SPARK_HOME/README.md&quot;</span>  <span class="c1"># Should be some file on your system</span>
+<span class="n">sc</span> <span class="o">=</span> <span class="n">SparkContext</span><span class="p">(</span><span class="s2">&quot;local&quot;</span><span class="p">,</span> <span class="s2">&quot;Simple App&quot;</span><span class="p">)</span>
 <span class="n">logData</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="n">logFile</span><span class="p">)</span><span class="o">.</span><span class="n">cache</span><span class="p">()</span>
 
-<span class="n">numAs</span> <span class="o">=</span> <span class="n">logData</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="k">lambda</span> <span class="n">s</span><span class="p">:</span> <span class="s">&#39;a&#39;</span> <span class="ow">in</span> <span class="n">s</span><span class="p">)</span><span class="o">.</span><span class="n">count</span><span class="p">()</span>
-<span class="n">numBs</span> <span class="o">=</span> <span class="n">logData</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="k">lambda</span> <span class="n">s</span><span class="p">:</span> <span class="s">&#39;b&#39;</span> <span class="ow">in</span> <span class="n">s</span><span class="p">)</span><span class="o">.</span><span class="n">count</span><span class="p">()</span>
+<span class="n">numAs</span> <span class="o">=</span> <span class="n">logData</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="k">lambda</span> <span class="n">s</span><span class="p">:</span> <span class="s1">&#39;a&#39;</span> <span class="ow">in</span> <span class="n">s</span><span class="p">)</span><span class="o">.</span><span class="n">count</span><span class="p">()</span>
+<span class="n">numBs</span> <span class="o">=</span> <span class="n">logData</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="k">lambda</span> <span class="n">s</span><span class="p">:</span> <span class="s1">&#39;b&#39;</span> <span class="ow">in</span> <span class="n">s</span><span class="p">)</span><span class="o">.</span><span class="n">count</span><span class="p">()</span>
 
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Lines with a: </span><span class="si">%i</span><span class="s">, lines with b: </span><span class="si">%i</span><span class="s">&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="n">numAs</span><span class="p">,</span> <span class="n">numBs</span><span class="p">))</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Lines with a: </span><span class="si">%i</span><span class="s2">, lines with b: </span><span class="si">%i</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="n">numAs</span><span class="p">,</span> <span class="n">numBs</span><span class="p">))</span>
 
-<span class="n">sc</span><span class="o">.</span><span class="n">stop</span><span class="p">()</span></code></pre></div>
+<span class="n">sc</span><span class="o">.</span><span class="n">stop</span><span class="p">()</span></code></pre></figure>
 
     <p>This program just counts the number of lines containing &#8216;a&#8217; and the number containing &#8216;b&#8217; in a
 text file.
@@ -509,12 +509,12 @@ dependencies to <code>spark-submit</code> through its <code>--py-files</code> ar
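Where the application has extra Python dependencies of its own, a hedged sketch of shipping them with the job (deps.zip is a hypothetical archive of those modules):

    # Distribute extra Python code to the executors along with the application
    ./bin/spark-submit \
      --master local[4] \
      --py-files deps.zip \
      SimpleApp.py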
 
     <p>We can run this application using the <code>bin/spark-submit</code> script:</p>
 
-    <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># Use spark-submit to run your application</span>
-<span class="nv">$ </span>YOUR_SPARK_HOME/bin/spark-submit <span class="se">\</span>
-  --master <span class="nb">local</span><span class="o">[</span>4<span class="o">]</span> <span class="se">\</span>
+    <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span><span class="c1"># Use spark-submit to run your application</span>
+$ YOUR_SPARK_HOME/bin/spark-submit <span class="se">\</span>
+  --master local<span class="o">[</span><span class="m">4</span><span class="o">]</span> <span class="se">\</span>
   SimpleApp.py
 ...
-Lines with a: 46, Lines with b: 23</code></pre></div>
+Lines with a: <span class="m">46</span>, Lines with b: <span class="m">23</span></code></pre></figure>
 
   </div>
 </div>
@@ -534,14 +534,14 @@ or see &#8220;Programming Guides&#8221; menu for other components.</li>
 You can run them as follows:</li>
 </ul>
 
-<div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># For Scala and Java, use run-example:</span>
+<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span><span class="c1"># For Scala and Java, use run-example:</span>
 ./bin/run-example SparkPi
 
-<span class="c"># For Python examples, use spark-submit directly:</span>
+<span class="c1"># For Python examples, use spark-submit directly:</span>
 ./bin/spark-submit examples/src/main/python/pi.py
 
-<span class="c"># For R examples, use spark-submit directly:</span>
-./bin/spark-submit examples/src/main/r/dataframe.R</code></pre></div>
+<span class="c1"># For R examples, use spark-submit directly:</span>
+./bin/spark-submit examples/src/main/r/dataframe.R</code></pre></figure>
 
 
 

http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/running-on-mesos.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/running-on-mesos.html b/site/docs/2.1.0/running-on-mesos.html
index 198f53c..aec6fe8 100644
--- a/site/docs/2.1.0/running-on-mesos.html
+++ b/site/docs/2.1.0/running-on-mesos.html
@@ -127,33 +127,33 @@
                     
 
                     <ul id="markdown-toc">
-  <li><a href="#how-it-works" id="markdown-toc-how-it-works">How it Works</a></li>
-  <li><a href="#installing-mesos" id="markdown-toc-installing-mesos">Installing Mesos</a>    <ul>
-      <li><a href="#from-source" id="markdown-toc-from-source">From Source</a></li>
-      <li><a href="#third-party-packages" id="markdown-toc-third-party-packages">Third-Party Packages</a></li>
-      <li><a href="#verification" id="markdown-toc-verification">Verification</a></li>
+  <li><a href="#how-it-works">How it Works</a></li>
+  <li><a href="#installing-mesos">Installing Mesos</a>    <ul>
+      <li><a href="#from-source">From Source</a></li>
+      <li><a href="#third-party-packages">Third-Party Packages</a></li>
+      <li><a href="#verification">Verification</a></li>
     </ul>
   </li>
-  <li><a href="#connecting-spark-to-mesos" id="markdown-toc-connecting-spark-to-mesos">Connecting Spark to Mesos</a>    <ul>
-      <li><a href="#uploading-spark-package" id="markdown-toc-uploading-spark-package">Uploading Spark Package</a></li>
-      <li><a href="#using-a-mesos-master-url" id="markdown-toc-using-a-mesos-master-url">Using a Mesos Master URL</a></li>
-      <li><a href="#client-mode" id="markdown-toc-client-mode">Client Mode</a></li>
-      <li><a href="#cluster-mode" id="markdown-toc-cluster-mode">Cluster mode</a></li>
+  <li><a href="#connecting-spark-to-mesos">Connecting Spark to Mesos</a>    <ul>
+      <li><a href="#uploading-spark-package">Uploading Spark Package</a></li>
+      <li><a href="#using-a-mesos-master-url">Using a Mesos Master URL</a></li>
+      <li><a href="#client-mode">Client Mode</a></li>
+      <li><a href="#cluster-mode">Cluster mode</a></li>
     </ul>
   </li>
-  <li><a href="#mesos-run-modes" id="markdown-toc-mesos-run-modes">Mesos Run Modes</a>    <ul>
-      <li><a href="#coarse-grained" id="markdown-toc-coarse-grained">Coarse-Grained</a></li>
-      <li><a href="#fine-grained-deprecated" id="markdown-toc-fine-grained-deprecated">Fine-Grained (deprecated)</a></li>
+  <li><a href="#mesos-run-modes">Mesos Run Modes</a>    <ul>
+      <li><a href="#coarse-grained">Coarse-Grained</a></li>
+      <li><a href="#fine-grained-deprecated">Fine-Grained (deprecated)</a></li>
     </ul>
   </li>
-  <li><a href="#mesos-docker-support" id="markdown-toc-mesos-docker-support">Mesos Docker Support</a></li>
-  <li><a href="#running-alongside-hadoop" id="markdown-toc-running-alongside-hadoop">Running Alongside Hadoop</a></li>
-  <li><a href="#dynamic-resource-allocation-with-mesos" id="markdown-toc-dynamic-resource-allocation-with-mesos">Dynamic Resource Allocation with Mesos</a></li>
-  <li><a href="#configuration" id="markdown-toc-configuration">Configuration</a>    <ul>
-      <li><a href="#spark-properties" id="markdown-toc-spark-properties">Spark Properties</a></li>
+  <li><a href="#mesos-docker-support">Mesos Docker Support</a></li>
+  <li><a href="#running-alongside-hadoop">Running Alongside Hadoop</a></li>
+  <li><a href="#dynamic-resource-allocation-with-mesos">Dynamic Resource Allocation with Mesos</a></li>
+  <li><a href="#configuration">Configuration</a>    <ul>
+      <li><a href="#spark-properties">Spark Properties</a></li>
     </ul>
   </li>
-  <li><a href="#troubleshooting-and-debugging" id="markdown-toc-troubleshooting-and-debugging">Troubleshooting and Debugging</a></li>
+  <li><a href="#troubleshooting-and-debugging">Troubleshooting and Debugging</a></li>
 </ul>
 
 <p>Spark can run on hardware clusters managed by <a href="http://mesos.apache.org/">Apache Mesos</a>.</p>
@@ -289,11 +289,11 @@ instructions above. On Mac OS X, the library is called <code>libmesos.dylib</cod
 <p>Now when starting a Spark application against the cluster, pass a <code>mesos://</code>
 URL as the master when creating a <code>SparkContext</code>. For example:</p>
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">conf</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkConf</span><span class="o">()</span>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">val</span> <span class="n">conf</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkConf</span><span class="o">()</span>
   <span class="o">.</span><span class="n">setMaster</span><span class="o">(</span><span class="s">&quot;mesos://HOST:5050&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="n">setAppName</span><span class="o">(</span><span class="s">&quot;My app&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="n">set</span><span class="o">(</span><span class="s">&quot;spark.executor.uri&quot;</span><span class="o">,</span> <span class="s">&quot;&lt;path to spark-2.1.0.tar.gz uploaded above&gt;&quot;</span><span class="o">)</span>
-<span class="k">val</span> <span class="n">sc</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkContext</span><span class="o">(</span><span class="n">conf</span><span class="o">)</span></code></pre></div>
+<span class="k">val</span> <span class="n">sc</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkContext</span><span class="o">(</span><span class="n">conf</span><span class="o">)</span></code></pre></figure>
 
 <p>(You can also use <a href="submitting-applications.html"><code>spark-submit</code></a> and configure <code>spark.executor.uri</code>
 in the <a href="configuration.html#loading-default-configurations">conf/spark-defaults.conf</a> file.)</p>
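
(For illustration, a hedged sketch of that conf/spark-defaults.conf route; the HDFS host and tarball path below are placeholders, not values from this commit. Entries are whitespace-separated key/value pairs:

    spark.master        mesos://HOST:5050
    spark.executor.uri  hdfs://namenode:8020/dist/spark-2.1.0.tar.gz

With these in place, ./bin/spark-submit picks up the master URL and executor URI without repeating them on the command line.)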
@@ -301,7 +301,7 @@ in the <a href="configuration.html#loading-default-configurations">conf/spark-de
 <p>When running a shell, the <code>spark.executor.uri</code> parameter is inherited from <code>SPARK_EXECUTOR_URI</code>, so
 it does not need to be redundantly passed in as a system property.</p>
 
-<div class="highlight"><pre><code class="language-bash" data-lang="bash">./bin/spark-shell --master mesos://host:5050</code></pre></div>
+<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>./bin/spark-shell --master mesos://host:5050</code></pre></figure>
 
 <h2 id="cluster-mode">Cluster mode</h2>
 
@@ -322,7 +322,7 @@ Spark cluster Web UI.</p>
 
 <p>For example:</p>
 
-<div class="highlight"><pre><code class="language-bash" data-lang="bash">./bin/spark-submit <span class="se">\</span>
+<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>./bin/spark-submit <span class="se">\</span>
   --class org.apache.spark.examples.SparkPi <span class="se">\</span>
   --master mesos://207.184.161.138:7077 <span class="se">\</span>
   --deploy-mode cluster <span class="se">\</span>
@@ -330,7 +330,7 @@ Spark cluster Web UI.</p>
   --executor-memory 20G <span class="se">\</span>
   --total-executor-cores <span class="m">100</span> <span class="se">\</span>
   http://path/to/examples.jar <span class="se">\</span>
-  1000</code></pre></div>
+  <span class="m">1000</span></code></pre></figure>
 
 <p>Note that jars or Python files that are passed to <code>spark-submit</code> should be URIs reachable by Mesos slaves, as the Spark driver doesn&#8217;t automatically upload local jars.</p>
 
@@ -404,13 +404,13 @@ terminate when they&#8217;re idle.</p>
 <p>To run in fine-grained mode, set the <code>spark.mesos.coarse</code> property to false in your
 <a href="configuration.html#spark-properties">SparkConf</a>:</p>
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">conf</span><span class="o">.</span><span class="n">set</span><span class="o">(</span><span class="s">&quot;spark.mesos.coarse&quot;</span><span class="o">,</span> <span class="s">&quot;false&quot;</span><span class="o">)</span></code></pre></div>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="n">conf</span><span class="o">.</span><span class="n">set</span><span class="o">(</span><span class="s">&quot;spark.mesos.coarse&quot;</span><span class="o">,</span> <span class="s">&quot;false&quot;</span><span class="o">)</span></code></pre></figure>
 
 <p>You may also make use of <code>spark.mesos.constraints</code> to set
 attribute-based constraints on Mesos resource offers. By default, all
 resource offers will be accepted.</p>
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">conf</span><span class="o">.</span><span class="n">set</span><span class="o">(</span><span class="s">&quot;spark.mesos.constraints&quot;</span><span class="o">,</span> <span class="s">&quot;os:centos7;us-east-1:false&quot;</span><span class="o">)</span></code></pre></div>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="n">conf</span><span class="o">.</span><span class="n">set</span><span class="o">(</span><span class="s">&quot;spark.mesos.constraints&quot;</span><span class="o">,</span> <span class="s">&quot;os:centos7;us-east-1:false&quot;</span><span class="o">)</span></code></pre></figure>
 
 <p>For example, let&#8217;s say <code>spark.mesos.constraints</code> is set to <code>os:centos7;us-east-1:false</code>; the resource offers will then be checked to see whether they meet both of these constraints, and only then will they be accepted to start new executors.</p>
 

http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/running-on-yarn.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/running-on-yarn.html b/site/docs/2.1.0/running-on-yarn.html
index 1235965..5a3a8f5 100644
--- a/site/docs/2.1.0/running-on-yarn.html
+++ b/site/docs/2.1.0/running-on-yarn.html
@@ -629,9 +629,8 @@ includes a URI of the metadata store in <code>"hive.metastore.uris</code>, and
 the tokens needed to access these clusters must be explicitly requested at
 launch time. This is done by listing them in the <code>spark.yarn.access.namenodes</code> property.</p>
 
-<p><code>
-spark.yarn.access.namenodes hdfs://ireland.example.org:8020/,hdfs://frankfurt.example.org:8020/
-</code></p>
+<pre><code>spark.yarn.access.namenodes hdfs://ireland.example.org:8020/,hdfs://frankfurt.example.org:8020/
+</code></pre>
 
 <p>Spark supports integrating with other security-aware services through the Java Services mechanism (see
 <code>java.util.ServiceLoader</code>). To do that, implementations of <code>org.apache.spark.deploy.yarn.security.ServiceCredentialProvider</code>
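
For context, a minimal sketch of such a provider. The class name and token-fetching logic are hypothetical, and the method signatures reflect the Spark 2.1 trait as best recollected, so verify them against the ServiceCredentialProvider API docs:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.security.Credentials
    import org.apache.spark.SparkConf
    import org.apache.spark.deploy.yarn.security.ServiceCredentialProvider

    // Hypothetical provider for a custom security-aware service.
    class MyServiceCredentialProvider extends ServiceCredentialProvider {

      // This name keys the spark.yarn.security.tokens.myservice.enabled flag.
      override def serviceName: String = "myservice"

      // Only request tokens when the service actually needs them,
      // e.g. when security is enabled on the cluster.
      override def credentialsRequired(hadoopConf: Configuration): Boolean = true

      // Fetch a delegation token, add it to `creds`, and optionally return
      // the time in milliseconds at which the token should be renewed.
      override def obtainCredentials(
          hadoopConf: Configuration,
          sparkConf: SparkConf,
          creds: Credentials): Option[Long] = {
        // A real implementation would call creds.addToken(alias, token) here.
        None
      }
    }

As with any java.util.ServiceLoader plugin, the implementation's fully qualified class name is listed in a resource file named META-INF/services/org.apache.spark.deploy.yarn.security.ServiceCredentialProvider inside the jar.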
@@ -656,7 +655,7 @@ pre-packaged distribution.</li>
 then set <code>yarn.nodemanager.aux-services.spark_shuffle.class</code> to
 <code>org.apache.spark.network.yarn.YarnShuffleService</code>.</li>
   <li>Increase the <code>NodeManager</code>&#8217;s heap size by setting <code>YARN_HEAPSIZE</code> (1000 by default) in <code>etc/hadoop/yarn-env.sh</code>
-to avoid garbage collection issues during shuffle.</li>
+to avoid garbage collection issues during shuffle. </li>
   <li>Restart all <code>NodeManager</code>s in your cluster.</li>
 </ol>
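
As a hedged summary of the NodeManager settings from the steps above, shown as plain key/value pairs rather than yarn-site.xml markup for brevity:

    yarn.nodemanager.aux-services                      spark_shuffle
    yarn.nodemanager.aux-services.spark_shuffle.class  org.apache.spark.network.yarn.YarnShuffleService

If MapReduce jobs also run on the cluster, mapreduce_shuffle normally needs to remain listed in yarn.nodemanager.aux-services alongside spark_shuffle.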
 
@@ -704,10 +703,9 @@ the Spark configuration must be set to disable token collection for the services
 
 <p>The Spark configuration must include the lines:</p>
 
-<p><code>
-spark.yarn.security.tokens.hive.enabled   false
+<pre><code>spark.yarn.security.tokens.hive.enabled   false
 spark.yarn.security.tokens.hbase.enabled  false
-</code></p>
+</code></pre>
 
 <p>The configuration option <code>spark.yarn.access.namenodes</code> must be unset.</p>
 
@@ -717,24 +715,21 @@ spark.yarn.security.tokens.hbase.enabled  false
 enable extra logging of Kerberos operations in Hadoop by setting the <code>HADOOP_JAAS_DEBUG</code>
 environment variable.</p>
 
-<p><code>bash
-export HADOOP_JAAS_DEBUG=true
-</code></p>
+<pre><code class="language-bash">export HADOOP_JAAS_DEBUG=true
+</code></pre>
 
 <p>The JDK classes can be configured to enable extra logging of their Kerberos and
 SPNEGO/REST authentication via the system properties <code>sun.security.krb5.debug</code>
 and <code>sun.security.spnego.debug=true</code>.</p>
 
-<p><code>
--Dsun.security.krb5.debug=true -Dsun.security.spnego.debug=true
-</code></p>
+<pre><code>-Dsun.security.krb5.debug=true -Dsun.security.spnego.debug=true
+</code></pre>
 
 <p>All these options can be enabled in the Application Master:</p>
 
-<p><code>
-spark.yarn.appMasterEnv.HADOOP_JAAS_DEBUG true
+<pre><code>spark.yarn.appMasterEnv.HADOOP_JAAS_DEBUG true
 spark.yarn.am.extraJavaOptions -Dsun.security.krb5.debug=true -Dsun.security.spnego.debug=true
-</code></p>
+</code></pre>
 
 <p>Finally, if the log level for <code>org.apache.spark.deploy.yarn.Client</code> is set to <code>DEBUG</code>, the log
 will include a list of all tokens obtained and their expiry details.</p>

http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/spark-standalone.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/spark-standalone.html b/site/docs/2.1.0/spark-standalone.html
index 1caf5e4..4900198 100644
--- a/site/docs/2.1.0/spark-standalone.html
+++ b/site/docs/2.1.0/spark-standalone.html
@@ -127,18 +127,18 @@
                     
 
                     <ul id="markdown-toc">
-  <li><a href="#installing-spark-standalone-to-a-cluster" id="markdown-toc-installing-spark-standalone-to-a-cluster">Installing Spark Standalone to a Cluster</a></li>
-  <li><a href="#starting-a-cluster-manually" id="markdown-toc-starting-a-cluster-manually">Starting a Cluster Manually</a></li>
-  <li><a href="#cluster-launch-scripts" id="markdown-toc-cluster-launch-scripts">Cluster Launch Scripts</a></li>
-  <li><a href="#connecting-an-application-to-the-cluster" id="markdown-toc-connecting-an-application-to-the-cluster">Connecting an Application to the Cluster</a></li>
-  <li><a href="#launching-spark-applications" id="markdown-toc-launching-spark-applications">Launching Spark Applications</a></li>
-  <li><a href="#resource-scheduling" id="markdown-toc-resource-scheduling">Resource Scheduling</a></li>
-  <li><a href="#monitoring-and-logging" id="markdown-toc-monitoring-and-logging">Monitoring and Logging</a></li>
-  <li><a href="#running-alongside-hadoop" id="markdown-toc-running-alongside-hadoop">Running Alongside Hadoop</a></li>
-  <li><a href="#configuring-ports-for-network-security" id="markdown-toc-configuring-ports-for-network-security">Configuring Ports for Network Security</a></li>
-  <li><a href="#high-availability" id="markdown-toc-high-availability">High Availability</a>    <ul>
-      <li><a href="#standby-masters-with-zookeeper" id="markdown-toc-standby-masters-with-zookeeper">Standby Masters with ZooKeeper</a></li>
-      <li><a href="#single-node-recovery-with-local-file-system" id="markdown-toc-single-node-recovery-with-local-file-system">Single-Node Recovery with Local File System</a></li>
+  <li><a href="#installing-spark-standalone-to-a-cluster">Installing Spark Standalone to a Cluster</a></li>
+  <li><a href="#starting-a-cluster-manually">Starting a Cluster Manually</a></li>
+  <li><a href="#cluster-launch-scripts">Cluster Launch Scripts</a></li>
+  <li><a href="#connecting-an-application-to-the-cluster">Connecting an Application to the Cluster</a></li>
+  <li><a href="#launching-spark-applications">Launching Spark Applications</a></li>
+  <li><a href="#resource-scheduling">Resource Scheduling</a></li>
+  <li><a href="#monitoring-and-logging">Monitoring and Logging</a></li>
+  <li><a href="#running-alongside-hadoop">Running Alongside Hadoop</a></li>
+  <li><a href="#configuring-ports-for-network-security">Configuring Ports for Network Security</a></li>
+  <li><a href="#high-availability">High Availability</a>    <ul>
+      <li><a href="#standby-masters-with-zookeeper">Standby Masters with ZooKeeper</a></li>
+      <li><a href="#single-node-recovery-with-local-file-system">Single-Node Recovery with Local File System</a></li>
     </ul>
   </li>
 </ul>
@@ -446,17 +446,17 @@ By default, it will acquire <em>all</em> cores in the cluster, which only makes
 application at a time. You can cap the number of cores by setting <code>spark.cores.max</code> in your
 <a href="configuration.html#spark-properties">SparkConf</a>. For example:</p>
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">conf</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkConf</span><span class="o">()</span>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">val</span> <span class="n">conf</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkConf</span><span class="o">()</span>
   <span class="o">.</span><span class="n">setMaster</span><span class="o">(...)</span>
   <span class="o">.</span><span class="n">setAppName</span><span class="o">(...)</span>
   <span class="o">.</span><span class="n">set</span><span class="o">(</span><span class="s">&quot;spark.cores.max&quot;</span><span class="o">,</span> <span class="s">&quot;10&quot;</span><span class="o">)</span>
-<span class="k">val</span> <span class="n">sc</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkContext</span><span class="o">(</span><span class="n">conf</span><span class="o">)</span></code></pre></div>
+<span class="k">val</span> <span class="n">sc</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkContext</span><span class="o">(</span><span class="n">conf</span><span class="o">)</span></code></pre></figure>
 
 <p>In addition, you can configure <code>spark.deploy.defaultCores</code> on the cluster master process to change the
 default for applications that don&#8217;t set <code>spark.cores.max</code> to something less than infinite.
 Do this by adding the following to <code>conf/spark-env.sh</code>:</p>
 
-<div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nb">export </span><span class="nv">SPARK_MASTER_OPTS</span><span class="o">=</span><span class="s2">&quot;-Dspark.deploy.defaultCores=&lt;value&gt;&quot;</span></code></pre></div>
+<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span><span class="nb">export</span> <span class="nv">SPARK_MASTER_OPTS</span><span class="o">=</span><span class="s2">&quot;-Dspark.deploy.defaultCores=&lt;value&gt;&quot;</span></code></pre></figure>
 
 <p>This is useful on shared clusters where users might not have configured a maximum number of cores
 individually.</p>




[03/25] spark-website git commit: Update 2.1.0 docs to include https://github.com/apache/spark/pull/16294

Posted by yh...@apache.org.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/structured-streaming-kafka-integration.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/structured-streaming-kafka-integration.html b/site/docs/2.1.0/structured-streaming-kafka-integration.html
index 5ca9259..7d2254f 100644
--- a/site/docs/2.1.0/structured-streaming-kafka-integration.html
+++ b/site/docs/2.1.0/structured-streaming-kafka-integration.html
@@ -144,7 +144,7 @@ application. See the <a href="#deploying">Deploying</a> subsection below.</p>
 <div class="codetabs">
 <div data-lang="scala">
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="c1">// Subscribe to 1 topic</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="c1">// Subscribe to 1 topic</span>
 <span class="k">val</span> <span class="n">ds1</span> <span class="k">=</span> <span class="n">spark</span>
   <span class="o">.</span><span class="n">readStream</span>
   <span class="o">.</span><span class="n">format</span><span class="o">(</span><span class="s">&quot;kafka&quot;</span><span class="o">)</span>
@@ -172,12 +172,12 @@ application. See the <a href="#deploying">Deploying</a> subsection below.</p>
   <span class="o">.</span><span class="n">option</span><span class="o">(</span><span class="s">&quot;subscribePattern&quot;</span><span class="o">,</span> <span class="s">&quot;topic.*&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="n">load</span><span class="o">()</span>
 <span class="n">ds3</span><span class="o">.</span><span class="n">selectExpr</span><span class="o">(</span><span class="s">&quot;CAST(key AS STRING)&quot;</span><span class="o">,</span> <span class="s">&quot;CAST(value AS STRING)&quot;</span><span class="o">)</span>
-  <span class="o">.</span><span class="n">as</span><span class="o">[(</span><span class="kt">String</span>, <span class="kt">String</span><span class="o">)]</span></code></pre></div>
+  <span class="o">.</span><span class="n">as</span><span class="o">[(</span><span class="kt">String</span>, <span class="kt">String</span><span class="o">)]</span></code></pre></figure>
 
   </div>
 <div data-lang="java">
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="c1">// Subscribe to 1 topic</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="c1">// Subscribe to 1 topic</span>
 <span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">ds1</span> <span class="o">=</span> <span class="n">spark</span>
   <span class="o">.</span><span class="na">readStream</span><span class="o">()</span>
   <span class="o">.</span><span class="na">format</span><span class="o">(</span><span class="s">&quot;kafka&quot;</span><span class="o">)</span>
@@ -202,43 +202,43 @@ application. See the <a href="#deploying">Deploying</a> subsection below.</p>
   <span class="o">.</span><span class="na">option</span><span class="o">(</span><span class="s">&quot;kafka.bootstrap.servers&quot;</span><span class="o">,</span> <span class="s">&quot;host1:port1,host2:port2&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">option</span><span class="o">(</span><span class="s">&quot;subscribePattern&quot;</span><span class="o">,</span> <span class="s">&quot;topic.*&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">load</span><span class="o">()</span>
-<span class="n">ds3</span><span class="o">.</span><span class="na">selectExpr</span><span class="o">(</span><span class="s">&quot;CAST(key AS STRING)&quot;</span><span class="o">,</span> <span class="s">&quot;CAST(value AS STRING)&quot;</span><span class="o">)</span></code></pre></div>
+<span class="n">ds3</span><span class="o">.</span><span class="na">selectExpr</span><span class="o">(</span><span class="s">&quot;CAST(key AS STRING)&quot;</span><span class="o">,</span> <span class="s">&quot;CAST(value AS STRING)&quot;</span><span class="o">)</span></code></pre></figure>
 
   </div>
 <div data-lang="python">
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="c"># Subscribe to 1 topic</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="c1"># Subscribe to 1 topic</span>
 <span class="n">ds1</span> <span class="o">=</span> <span class="n">spark</span>
   <span class="o">.</span><span class="n">readStream</span><span class="p">()</span>
-  <span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s">&quot;kafka&quot;</span><span class="p">)</span>
-  <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s">&quot;kafka.bootstrap.servers&quot;</span><span class="p">,</span> <span class="s">&quot;host1:port1,host2:port2&quot;</span><span class="p">)</span>
-  <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s">&quot;subscribe&quot;</span><span class="p">,</span> <span class="s">&quot;topic1&quot;</span><span class="p">)</span>
+  <span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s2">&quot;kafka&quot;</span><span class="p">)</span>
+  <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;kafka.bootstrap.servers&quot;</span><span class="p">,</span> <span class="s2">&quot;host1:port1,host2:port2&quot;</span><span class="p">)</span>
+  <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;subscribe&quot;</span><span class="p">,</span> <span class="s2">&quot;topic1&quot;</span><span class="p">)</span>
   <span class="o">.</span><span class="n">load</span><span class="p">()</span>
-<span class="n">ds1</span><span class="o">.</span><span class="n">selectExpr</span><span class="p">(</span><span class="s">&quot;CAST(key AS STRING)&quot;</span><span class="p">,</span> <span class="s">&quot;CAST(value AS STRING)&quot;</span><span class="p">)</span>
+<span class="n">ds1</span><span class="o">.</span><span class="n">selectExpr</span><span class="p">(</span><span class="s2">&quot;CAST(key AS STRING)&quot;</span><span class="p">,</span> <span class="s2">&quot;CAST(value AS STRING)&quot;</span><span class="p">)</span>
 
-<span class="c"># Subscribe to multiple topics</span>
+<span class="c1"># Subscribe to multiple topics</span>
 <span class="n">ds2</span> <span class="o">=</span> <span class="n">spark</span>
   <span class="o">.</span><span class="n">readStream</span>
-  <span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s">&quot;kafka&quot;</span><span class="p">)</span>
-  <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s">&quot;kafka.bootstrap.servers&quot;</span><span class="p">,</span> <span class="s">&quot;host1:port1,host2:port2&quot;</span><span class="p">)</span>
-  <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s">&quot;subscribe&quot;</span><span class="p">,</span> <span class="s">&quot;topic1,topic2&quot;</span><span class="p">)</span>
+  <span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s2">&quot;kafka&quot;</span><span class="p">)</span>
+  <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;kafka.bootstrap.servers&quot;</span><span class="p">,</span> <span class="s2">&quot;host1:port1,host2:port2&quot;</span><span class="p">)</span>
+  <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;subscribe&quot;</span><span class="p">,</span> <span class="s2">&quot;topic1,topic2&quot;</span><span class="p">)</span>
   <span class="o">.</span><span class="n">load</span><span class="p">()</span>
-<span class="n">ds2</span><span class="o">.</span><span class="n">selectExpr</span><span class="p">(</span><span class="s">&quot;CAST(key AS STRING)&quot;</span><span class="p">,</span> <span class="s">&quot;CAST(value AS STRING)&quot;</span><span class="p">)</span>
+<span class="n">ds2</span><span class="o">.</span><span class="n">selectExpr</span><span class="p">(</span><span class="s2">&quot;CAST(key AS STRING)&quot;</span><span class="p">,</span> <span class="s2">&quot;CAST(value AS STRING)&quot;</span><span class="p">)</span>
 
-<span class="c"># Subscribe to a pattern</span>
+<span class="c1"># Subscribe to a pattern</span>
 <span class="n">ds3</span> <span class="o">=</span> <span class="n">spark</span>
   <span class="o">.</span><span class="n">readStream</span><span class="p">()</span>
-  <span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s">&quot;kafka&quot;</span><span class="p">)</span>
-  <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s">&quot;kafka.bootstrap.servers&quot;</span><span class="p">,</span> <span class="s">&quot;host1:port1,host2:port2&quot;</span><span class="p">)</span>
-  <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s">&quot;subscribePattern&quot;</span><span class="p">,</span> <span class="s">&quot;topic.*&quot;</span><span class="p">)</span>
+  <span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s2">&quot;kafka&quot;</span><span class="p">)</span>
+  <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;kafka.bootstrap.servers&quot;</span><span class="p">,</span> <span class="s2">&quot;host1:port1,host2:port2&quot;</span><span class="p">)</span>
+  <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;subscribePattern&quot;</span><span class="p">,</span> <span class="s2">&quot;topic.*&quot;</span><span class="p">)</span>
   <span class="o">.</span><span class="n">load</span><span class="p">()</span>
-<span class="n">ds3</span><span class="o">.</span><span class="n">selectExpr</span><span class="p">(</span><span class="s">&quot;CAST(key AS STRING)&quot;</span><span class="p">,</span> <span class="s">&quot;CAST(value AS STRING)&quot;</span><span class="p">)</span></code></pre></div>
+<span class="n">ds3</span><span class="o">.</span><span class="n">selectExpr</span><span class="p">(</span><span class="s2">&quot;CAST(key AS STRING)&quot;</span><span class="p">,</span> <span class="s2">&quot;CAST(value AS STRING)&quot;</span><span class="p">)</span></code></pre></figure>
 
   </div>
 </div>
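
To round out the snippets above, a minimal hedged Scala continuation; the console sink, append output mode, and awaitTermination call are illustrative choices, not part of this commit:

    // Continuing from ds1 above: cast the binary key and value columns to
    // strings, then start a streaming query that prints each batch to the console.
    val query = ds1
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .writeStream
      .format("console")
      .outputMode("append")
      .start()

    query.awaitTermination()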
 
-<p>Each row in the source has the following schema:</p>
-<table class="table">
+<p>Each row in the source has the following schema:</p>
+<table class="table">
 <tr><th>Column</th><th>Type</th></tr>
 <tr>
   <td>key</td>
@@ -268,7 +268,7 @@ application. See the <a href="#deploying">Deploying</a> subsection below.</p>
   <td>timestampType</td>
   <td>int</td>
 </tr>
-</table>
+</table>
 
 <p>The following options must be set for the Kafka source.</p>
 




[19/25] spark-website git commit: Update 2.1.0 docs to include https://github.com/apache/spark/pull/16294

Posted by yh...@apache.org.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/ml-migration-guides.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/ml-migration-guides.html b/site/docs/2.1.0/ml-migration-guides.html
index 5e8a913..24dfc31 100644
--- a/site/docs/2.1.0/ml-migration-guides.html
+++ b/site/docs/2.1.0/ml-migration-guides.html
@@ -344,21 +344,21 @@ for converting to <code>mllib.linalg</code> types.</p>
 <div class="codetabs">
 <div data-lang="scala">
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">import</span> <span class="nn">org.apache.spark.mllib.util.MLUtils</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.util.MLUtils</span>
 
 <span class="c1">// convert DataFrame columns</span>
 <span class="k">val</span> <span class="n">convertedVecDF</span> <span class="k">=</span> <span class="nc">MLUtils</span><span class="o">.</span><span class="n">convertVectorColumnsToML</span><span class="o">(</span><span class="n">vecDF</span><span class="o">)</span>
 <span class="k">val</span> <span class="n">convertedMatrixDF</span> <span class="k">=</span> <span class="nc">MLUtils</span><span class="o">.</span><span class="n">convertMatrixColumnsToML</span><span class="o">(</span><span class="n">matrixDF</span><span class="o">)</span>
 <span class="c1">// convert a single vector or matrix</span>
 <span class="k">val</span> <span class="n">mlVec</span><span class="k">:</span> <span class="kt">org.apache.spark.ml.linalg.Vector</span> <span class="o">=</span> <span class="n">mllibVec</span><span class="o">.</span><span class="n">asML</span>
-<span class="k">val</span> <span class="n">mlMat</span><span class="k">:</span> <span class="kt">org.apache.spark.ml.linalg.Matrix</span> <span class="o">=</span> <span class="n">mllibMat</span><span class="o">.</span><span class="n">asML</span></code></pre></div>
+<span class="k">val</span> <span class="n">mlMat</span><span class="k">:</span> <span class="kt">org.apache.spark.ml.linalg.Matrix</span> <span class="o">=</span> <span class="n">mllibMat</span><span class="o">.</span><span class="n">asML</span></code></pre></figure>
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.util.MLUtils$"><code>MLUtils</code> Scala docs</a> for further detail.</p>
   </div>
 
 <div data-lang="java">
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kn">import</span> <span class="nn">org.apache.spark.mllib.util.MLUtils</span><span class="o">;</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.mllib.util.MLUtils</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.sql.Dataset</span><span class="o">;</span>
 
 <span class="c1">// convert DataFrame columns</span>
@@ -366,21 +366,21 @@ for converting to <code>mllib.linalg</code> types.</p>
 <span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">convertedMatrixDF</span> <span class="o">=</span> <span class="n">MLUtils</span><span class="o">.</span><span class="na">convertMatrixColumnsToML</span><span class="o">(</span><span class="n">matrixDF</span><span class="o">);</span>
 <span class="c1">// convert a single vector or matrix</span>
 <span class="n">org</span><span class="o">.</span><span class="na">apache</span><span class="o">.</span><span class="na">spark</span><span class="o">.</span><span class="na">ml</span><span class="o">.</span><span class="na">linalg</span><span class="o">.</span><span class="na">Vector</span> <span class="n">mlVec</span> <span class="o">=</span> <span class="n">mllibVec</span><span class="o">.</span><span class="na">asML</span><span class="o">();</span>
-<span class="n">org</span><span class="o">.</span><span class="na">apache</span><span class="o">.</span><span class="na">spark</span><span class="o">.</span><span class="na">ml</span><span class="o">.</span><span class="na">linalg</span><span class="o">.</span><span class="na">Matrix</span> <span class="n">mlMat</span> <span class="o">=</span> <span class="n">mllibMat</span><span class="o">.</span><span class="na">asML</span><span class="o">();</span></code></pre></div>
+<span class="n">org</span><span class="o">.</span><span class="na">apache</span><span class="o">.</span><span class="na">spark</span><span class="o">.</span><span class="na">ml</span><span class="o">.</span><span class="na">linalg</span><span class="o">.</span><span class="na">Matrix</span> <span class="n">mlMat</span> <span class="o">=</span> <span class="n">mllibMat</span><span class="o">.</span><span class="na">asML</span><span class="o">();</span></code></pre></figure>
 
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/util/MLUtils.html"><code>MLUtils</code> Java docs</a> for further detail.</p>
   </div>
 
 <div data-lang="python">
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">pyspark.mllib.util</span> <span class="kn">import</span> <span class="n">MLUtils</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.util</span> <span class="kn">import</span> <span class="n">MLUtils</span>
 
-<span class="c"># convert DataFrame columns</span>
+<span class="c1"># convert DataFrame columns</span>
 <span class="n">convertedVecDF</span> <span class="o">=</span> <span class="n">MLUtils</span><span class="o">.</span><span class="n">convertVectorColumnsToML</span><span class="p">(</span><span class="n">vecDF</span><span class="p">)</span>
 <span class="n">convertedMatrixDF</span> <span class="o">=</span> <span class="n">MLUtils</span><span class="o">.</span><span class="n">convertMatrixColumnsToML</span><span class="p">(</span><span class="n">matrixDF</span><span class="p">)</span>
-<span class="c"># convert a single vector or matrix</span>
+<span class="c1"># convert a single vector or matrix</span>
 <span class="n">mlVec</span> <span class="o">=</span> <span class="n">mllibVec</span><span class="o">.</span><span class="n">asML</span><span class="p">()</span>
-<span class="n">mlMat</span> <span class="o">=</span> <span class="n">mllibMat</span><span class="o">.</span><span class="n">asML</span><span class="p">()</span></code></pre></div>
+<span class="n">mlMat</span> <span class="o">=</span> <span class="n">mllibMat</span><span class="o">.</span><span class="n">asML</span><span class="p">()</span></code></pre></figure>
 
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.util.MLUtils"><code>MLUtils</code> Python docs</a> for further detail.</p>
   </div>

http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/ml-pipeline.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/ml-pipeline.html b/site/docs/2.1.0/ml-pipeline.html
index fe17564..b57afde 100644
--- a/site/docs/2.1.0/ml-pipeline.html
+++ b/site/docs/2.1.0/ml-pipeline.html
@@ -331,27 +331,27 @@ machine learning pipelines.</p>
 <p><strong>Table of Contents</strong></p>
 
 <ul id="markdown-toc">
-  <li><a href="#main-concepts-in-pipelines" id="markdown-toc-main-concepts-in-pipelines">Main concepts in Pipelines</a>    <ul>
-      <li><a href="#dataframe" id="markdown-toc-dataframe">DataFrame</a></li>
-      <li><a href="#pipeline-components" id="markdown-toc-pipeline-components">Pipeline components</a>        <ul>
-          <li><a href="#transformers" id="markdown-toc-transformers">Transformers</a></li>
-          <li><a href="#estimators" id="markdown-toc-estimators">Estimators</a></li>
-          <li><a href="#properties-of-pipeline-components" id="markdown-toc-properties-of-pipeline-components">Properties of pipeline components</a></li>
+  <li><a href="#main-concepts-in-pipelines">Main concepts in Pipelines</a>    <ul>
+      <li><a href="#dataframe">DataFrame</a></li>
+      <li><a href="#pipeline-components">Pipeline components</a>        <ul>
+          <li><a href="#transformers">Transformers</a></li>
+          <li><a href="#estimators">Estimators</a></li>
+          <li><a href="#properties-of-pipeline-components">Properties of pipeline components</a></li>
         </ul>
       </li>
-      <li><a href="#pipeline" id="markdown-toc-pipeline">Pipeline</a>        <ul>
-          <li><a href="#how-it-works" id="markdown-toc-how-it-works">How it works</a></li>
-          <li><a href="#details" id="markdown-toc-details">Details</a></li>
+      <li><a href="#pipeline">Pipeline</a>        <ul>
+          <li><a href="#how-it-works">How it works</a></li>
+          <li><a href="#details">Details</a></li>
         </ul>
       </li>
-      <li><a href="#parameters" id="markdown-toc-parameters">Parameters</a></li>
-      <li><a href="#saving-and-loading-pipelines" id="markdown-toc-saving-and-loading-pipelines">Saving and Loading Pipelines</a></li>
+      <li><a href="#parameters">Parameters</a></li>
+      <li><a href="#saving-and-loading-pipelines">Saving and Loading Pipelines</a></li>
     </ul>
   </li>
-  <li><a href="#code-examples" id="markdown-toc-code-examples">Code examples</a>    <ul>
-      <li><a href="#example-estimator-transformer-and-param" id="markdown-toc-example-estimator-transformer-and-param">Example: Estimator, Transformer, and Param</a></li>
-      <li><a href="#example-pipeline" id="markdown-toc-example-pipeline">Example: Pipeline</a></li>
-      <li><a href="#model-selection-hyperparameter-tuning" id="markdown-toc-model-selection-hyperparameter-tuning">Model selection (hyperparameter tuning)</a></li>
+  <li><a href="#code-examples">Code examples</a>    <ul>
+      <li><a href="#example-estimator-transformer-and-param">Example: Estimator, Transformer, and Param</a></li>
+      <li><a href="#example-pipeline">Example: Pipeline</a></li>
+      <li><a href="#model-selection-hyperparameter-tuning">Model selection (hyperparameter tuning)</a></li>
     </ul>
   </li>
 </ul>
@@ -541,7 +541,7 @@ Refer to the [`Estimator` Scala docs](api/scala/index.html#org.apache.spark.ml.E
 the [`Transformer` Scala docs](api/scala/index.html#org.apache.spark.ml.Transformer) and
 the [`Params` Scala docs](api/scala/index.html#org.apache.spark.ml.param.Params) for details on the API.
 
-<div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.ml.classification.LogisticRegression</span>
+<div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.ml.classification.LogisticRegression</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.ml.linalg.</span><span class="o">{</span><span class="nc">Vector</span><span class="o">,</span> <span class="nc">Vectors</span><span class="o">}</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.ml.param.ParamMap</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.sql.Row</span>
@@ -601,7 +601,7 @@ the [`Params` Scala docs](api/scala/index.html#org.apache.spark.ml.param.Params)
   <span class="o">.</span><span class="n">select</span><span class="o">(</span><span class="s">&quot;features&quot;</span><span class="o">,</span> <span class="s">&quot;label&quot;</span><span class="o">,</span> <span class="s">&quot;myProbability&quot;</span><span class="o">,</span> <span class="s">&quot;prediction&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="n">collect</span><span class="o">()</span>
   <span class="o">.</span><span class="n">foreach</span> <span class="o">{</span> <span class="k">case</span> <span class="nc">Row</span><span class="o">(</span><span class="n">features</span><span class="k">:</span> <span class="kt">Vector</span><span class="o">,</span> <span class="n">label</span><span class="k">:</span> <span class="kt">Double</span><span class="o">,</span> <span class="n">prob</span><span class="k">:</span> <span class="kt">Vector</span><span class="o">,</span> <span class="n">prediction</span><span class="k">:</span> <span class="kt">Double</span><span class="o">)</span> <span class="k">=&gt;</span>
-    <span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;($features, $label) -&gt; prob=$prob, prediction=$prediction&quot;</span><span class="o">)</span>
+    <span class="n">println</span><span class="o">(</span><span class="s">s&quot;(</span><span class="si">$features</span><span class="s">, </span><span class="si">$label</span><span class="s">) -&gt; prob=</span><span class="si">$prob</span><span class="s">, prediction=</span><span class="si">$prediction</span><span class="s">&quot;</span><span class="o">)</span>
   <span class="o">}</span>
 </pre></div><div><small>Find full example code at "examples/src/main/scala/org/apache/spark/examples/ml/EstimatorTransformerParamExample.scala" in the Spark repo.</small></div>
 </div>
@@ -612,7 +612,7 @@ Refer to the [`Estimator` Java docs](api/java/org/apache/spark/ml/Estimator.html
 the [`Transformer` Java docs](api/java/org/apache/spark/ml/Transformer.html) and
 the [`Params` Java docs](api/java/org/apache/spark/ml/param/Params.html) for details on the API.
 
-<div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
+<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">java.util.List</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">org.apache.spark.ml.classification.LogisticRegression</span><span class="o">;</span>
@@ -635,14 +635,14 @@ the [`Params` Java docs](api/java/org/apache/spark/ml/param/Params.html) for det
     <span class="n">RowFactory</span><span class="o">.</span><span class="na">create</span><span class="o">(</span><span class="mf">0.0</span><span class="o">,</span> <span class="n">Vectors</span><span class="o">.</span><span class="na">dense</span><span class="o">(</span><span class="mf">2.0</span><span class="o">,</span> <span class="mf">1.3</span><span class="o">,</span> <span class="mf">1.0</span><span class="o">)),</span>
     <span class="n">RowFactory</span><span class="o">.</span><span class="na">create</span><span class="o">(</span><span class="mf">1.0</span><span class="o">,</span> <span class="n">Vectors</span><span class="o">.</span><span class="na">dense</span><span class="o">(</span><span class="mf">0.0</span><span class="o">,</span> <span class="mf">1.2</span><span class="o">,</span> <span class="o">-</span><span class="mf">0.5</span><span class="o">))</span>
 <span class="o">);</span>
-<span class="n">StructType</span> <span class="n">schema</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">StructType</span><span class="o">(</span><span class="k">new</span> <span class="n">StructField</span><span class="o">[]{</span>
-    <span class="k">new</span> <span class="nf">StructField</span><span class="o">(</span><span class="s">&quot;label&quot;</span><span class="o">,</span> <span class="n">DataTypes</span><span class="o">.</span><span class="na">DoubleType</span><span class="o">,</span> <span class="kc">false</span><span class="o">,</span> <span class="n">Metadata</span><span class="o">.</span><span class="na">empty</span><span class="o">()),</span>
-    <span class="k">new</span> <span class="nf">StructField</span><span class="o">(</span><span class="s">&quot;features&quot;</span><span class="o">,</span> <span class="k">new</span> <span class="nf">VectorUDT</span><span class="o">(),</span> <span class="kc">false</span><span class="o">,</span> <span class="n">Metadata</span><span class="o">.</span><span class="na">empty</span><span class="o">())</span>
+<span class="n">StructType</span> <span class="n">schema</span> <span class="o">=</span> <span class="k">new</span> <span class="n">StructType</span><span class="o">(</span><span class="k">new</span> <span class="n">StructField</span><span class="o">[]{</span>
+    <span class="k">new</span> <span class="n">StructField</span><span class="o">(</span><span class="s">&quot;label&quot;</span><span class="o">,</span> <span class="n">DataTypes</span><span class="o">.</span><span class="na">DoubleType</span><span class="o">,</span> <span class="kc">false</span><span class="o">,</span> <span class="n">Metadata</span><span class="o">.</span><span class="na">empty</span><span class="o">()),</span>
+    <span class="k">new</span> <span class="n">StructField</span><span class="o">(</span><span class="s">&quot;features&quot;</span><span class="o">,</span> <span class="k">new</span> <span class="n">VectorUDT</span><span class="o">(),</span> <span class="kc">false</span><span class="o">,</span> <span class="n">Metadata</span><span class="o">.</span><span class="na">empty</span><span class="o">())</span>
 <span class="o">});</span>
 <span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">training</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="na">createDataFrame</span><span class="o">(</span><span class="n">dataTraining</span><span class="o">,</span> <span class="n">schema</span><span class="o">);</span>
 
 <span class="c1">// Create a LogisticRegression instance. This instance is an Estimator.</span>
-<span class="n">LogisticRegression</span> <span class="n">lr</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">LogisticRegression</span><span class="o">();</span>
+<span class="n">LogisticRegression</span> <span class="n">lr</span> <span class="o">=</span> <span class="k">new</span> <span class="n">LogisticRegression</span><span class="o">();</span>
 <span class="c1">// Print out the parameters, documentation, and any default values.</span>
 <span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">&quot;LogisticRegression parameters:\n&quot;</span> <span class="o">+</span> <span class="n">lr</span><span class="o">.</span><span class="na">explainParams</span><span class="o">()</span> <span class="o">+</span> <span class="s">&quot;\n&quot;</span><span class="o">);</span>
 
@@ -658,13 +658,13 @@ the [`Params` Java docs](api/java/org/apache/spark/ml/param/Params.html) for det
 <span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">&quot;Model 1 was fit using parameters: &quot;</span> <span class="o">+</span> <span class="n">model1</span><span class="o">.</span><span class="na">parent</span><span class="o">().</span><span class="na">extractParamMap</span><span class="o">());</span>
 
 <span class="c1">// We may alternatively specify parameters using a ParamMap.</span>
-<span class="n">ParamMap</span> <span class="n">paramMap</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">ParamMap</span><span class="o">()</span>
+<span class="n">ParamMap</span> <span class="n">paramMap</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ParamMap</span><span class="o">()</span>
   <span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">lr</span><span class="o">.</span><span class="na">maxIter</span><span class="o">().</span><span class="na">w</span><span class="o">(</span><span class="mi">20</span><span class="o">))</span>  <span class="c1">// Specify 1 Param.</span>
   <span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">lr</span><span class="o">.</span><span class="na">maxIter</span><span class="o">(),</span> <span class="mi">30</span><span class="o">)</span>  <span class="c1">// This overwrites the original maxIter.</span>
   <span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">lr</span><span class="o">.</span><span class="na">regParam</span><span class="o">().</span><span class="na">w</span><span class="o">(</span><span class="mf">0.1</span><span class="o">),</span> <span class="n">lr</span><span class="o">.</span><span class="na">threshold</span><span class="o">().</span><span class="na">w</span><span class="o">(</span><span class="mf">0.55</span><span class="o">));</span>  <span class="c1">// Specify multiple Params.</span>
 
 <span class="c1">// One can also combine ParamMaps.</span>
-<span class="n">ParamMap</span> <span class="n">paramMap2</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">ParamMap</span><span class="o">()</span>
+<span class="n">ParamMap</span> <span class="n">paramMap2</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ParamMap</span><span class="o">()</span>
   <span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="n">lr</span><span class="o">.</span><span class="na">probabilityCol</span><span class="o">().</span><span class="na">w</span><span class="o">(</span><span class="s">&quot;myProbability&quot;</span><span class="o">));</span>  <span class="c1">// Change output column name</span>
 <span class="n">ParamMap</span> <span class="n">paramMapCombined</span> <span class="o">=</span> <span class="n">paramMap</span><span class="o">.</span><span class="na">$plus$plus</span><span class="o">(</span><span class="n">paramMap2</span><span class="o">);</span>
 
@@ -687,7 +687,7 @@ the [`Params` Java docs](api/java/org/apache/spark/ml/param/Params.html) for det
 <span class="c1">// &#39;probability&#39; column since we renamed the lr.probabilityCol parameter previously.</span>
 <span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">results</span> <span class="o">=</span> <span class="n">model2</span><span class="o">.</span><span class="na">transform</span><span class="o">(</span><span class="n">test</span><span class="o">);</span>
 <span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">rows</span> <span class="o">=</span> <span class="n">results</span><span class="o">.</span><span class="na">select</span><span class="o">(</span><span class="s">&quot;features&quot;</span><span class="o">,</span> <span class="s">&quot;label&quot;</span><span class="o">,</span> <span class="s">&quot;myProbability&quot;</span><span class="o">,</span> <span class="s">&quot;prediction&quot;</span><span class="o">);</span>
-<span class="k">for</span> <span class="o">(</span><span class="n">Row</span> <span class="nl">r:</span> <span class="n">rows</span><span class="o">.</span><span class="na">collectAsList</span><span class="o">())</span> <span class="o">{</span>
+<span class="k">for</span> <span class="o">(</span><span class="n">Row</span> <span class="n">r</span><span class="o">:</span> <span class="n">rows</span><span class="o">.</span><span class="na">collectAsList</span><span class="o">())</span> <span class="o">{</span>
   <span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">&quot;(&quot;</span> <span class="o">+</span> <span class="n">r</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="mi">0</span><span class="o">)</span> <span class="o">+</span> <span class="s">&quot;, &quot;</span> <span class="o">+</span> <span class="n">r</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="mi">1</span><span class="o">)</span> <span class="o">+</span> <span class="s">&quot;) -&gt; prob=&quot;</span> <span class="o">+</span> <span class="n">r</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="mi">2</span><span class="o">)</span>
     <span class="o">+</span> <span class="s">&quot;, prediction=&quot;</span> <span class="o">+</span> <span class="n">r</span><span class="o">.</span><span class="na">get</span><span class="o">(</span><span class="mi">3</span><span class="o">));</span>
 <span class="o">}</span>
@@ -700,63 +700,63 @@ Refer to the [`Estimator` Python docs](api/python/pyspark.ml.html#pyspark.ml.Est
 the [`Transformer` Python docs](api/python/pyspark.ml.html#pyspark.ml.Transformer) and
 the [`Params` Python docs](api/python/pyspark.ml.html#pyspark.ml.param.Params) for more details on the API.
 
-<div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.ml.linalg</span> <span class="kn">import</span> <span class="n">Vectors</span>
+<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.ml.linalg</span> <span class="kn">import</span> <span class="n">Vectors</span>
 <span class="kn">from</span> <span class="nn">pyspark.ml.classification</span> <span class="kn">import</span> <span class="n">LogisticRegression</span>
 
-<span class="c"># Prepare training data from a list of (label, features) tuples.</span>
+<span class="c1"># Prepare training data from a list of (label, features) tuples.</span>
 <span class="n">training</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">createDataFrame</span><span class="p">([</span>
     <span class="p">(</span><span class="mf">1.0</span><span class="p">,</span> <span class="n">Vectors</span><span class="o">.</span><span class="n">dense</span><span class="p">([</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">1.1</span><span class="p">,</span> <span class="mf">0.1</span><span class="p">])),</span>
     <span class="p">(</span><span class="mf">0.0</span><span class="p">,</span> <span class="n">Vectors</span><span class="o">.</span><span class="n">dense</span><span class="p">([</span><span class="mf">2.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">,</span> <span class="o">-</span><span class="mf">1.0</span><span class="p">])),</span>
     <span class="p">(</span><span class="mf">0.0</span><span class="p">,</span> <span class="n">Vectors</span><span class="o">.</span><span class="n">dense</span><span class="p">([</span><span class="mf">2.0</span><span class="p">,</span> <span class="mf">1.3</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">])),</span>
-    <span class="p">(</span><span class="mf">1.0</span><span class="p">,</span> <span class="n">Vectors</span><span class="o">.</span><span class="n">dense</span><span class="p">([</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">1.2</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.5</span><span class="p">]))],</span> <span class="p">[</span><span class="s">&quot;label&quot;</span><span class="p">,</span> <span class="s">&quot;features&quot;</span><span class="p">])</span>
+    <span class="p">(</span><span class="mf">1.0</span><span class="p">,</span> <span class="n">Vectors</span><span class="o">.</span><span class="n">dense</span><span class="p">([</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">1.2</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.5</span><span class="p">]))],</span> <span class="p">[</span><span class="s2">&quot;label&quot;</span><span class="p">,</span> <span class="s2">&quot;features&quot;</span><span class="p">])</span>
 
-<span class="c"># Create a LogisticRegression instance. This instance is an Estimator.</span>
+<span class="c1"># Create a LogisticRegression instance. This instance is an Estimator.</span>
 <span class="n">lr</span> <span class="o">=</span> <span class="n">LogisticRegression</span><span class="p">(</span><span class="n">maxIter</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">regParam</span><span class="o">=</span><span class="mf">0.01</span><span class="p">)</span>
-<span class="c"># Print out the parameters, documentation, and any default values.</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;LogisticRegression parameters:</span><span class="se">\n</span><span class="s">&quot;</span> <span class="o">+</span> <span class="n">lr</span><span class="o">.</span><span class="n">explainParams</span><span class="p">()</span> <span class="o">+</span> <span class="s">&quot;</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">)</span>
+<span class="c1"># Print out the parameters, documentation, and any default values.</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;LogisticRegression parameters:</span><span class="se">\n</span><span class="s2">&quot;</span> <span class="o">+</span> <span class="n">lr</span><span class="o">.</span><span class="n">explainParams</span><span class="p">()</span> <span class="o">+</span> <span class="s2">&quot;</span><span class="se">\n</span><span class="s2">&quot;</span><span class="p">)</span>
 
-<span class="c"># Learn a LogisticRegression model. This uses the parameters stored in lr.</span>
+<span class="c1"># Learn a LogisticRegression model. This uses the parameters stored in lr.</span>
 <span class="n">model1</span> <span class="o">=</span> <span class="n">lr</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">training</span><span class="p">)</span>
 
-<span class="c"># Since model1 is a Model (i.e., a transformer produced by an Estimator),</span>
-<span class="c"># we can view the parameters it used during fit().</span>
-<span class="c"># This prints the parameter (name: value) pairs, where names are unique IDs for this</span>
-<span class="c"># LogisticRegression instance.</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Model 1 was fit using parameters: &quot;</span><span class="p">)</span>
+<span class="c1"># Since model1 is a Model (i.e., a transformer produced by an Estimator),</span>
+<span class="c1"># we can view the parameters it used during fit().</span>
+<span class="c1"># This prints the parameter (name: value) pairs, where names are unique IDs for this</span>
+<span class="c1"># LogisticRegression instance.</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Model 1 was fit using parameters: &quot;</span><span class="p">)</span>
 <span class="k">print</span><span class="p">(</span><span class="n">model1</span><span class="o">.</span><span class="n">extractParamMap</span><span class="p">())</span>
 
-<span class="c"># We may alternatively specify parameters using a Python dictionary as a paramMap</span>
+<span class="c1"># We may alternatively specify parameters using a Python dictionary as a paramMap</span>
 <span class="n">paramMap</span> <span class="o">=</span> <span class="p">{</span><span class="n">lr</span><span class="o">.</span><span class="n">maxIter</span><span class="p">:</span> <span class="mi">20</span><span class="p">}</span>
-<span class="n">paramMap</span><span class="p">[</span><span class="n">lr</span><span class="o">.</span><span class="n">maxIter</span><span class="p">]</span> <span class="o">=</span> <span class="mi">30</span>  <span class="c"># Specify 1 Param, overwriting the original maxIter.</span>
-<span class="n">paramMap</span><span class="o">.</span><span class="n">update</span><span class="p">({</span><span class="n">lr</span><span class="o">.</span><span class="n">regParam</span><span class="p">:</span> <span class="mf">0.1</span><span class="p">,</span> <span class="n">lr</span><span class="o">.</span><span class="n">threshold</span><span class="p">:</span> <span class="mf">0.55</span><span class="p">})</span>  <span class="c"># Specify multiple Params.</span>
+<span class="n">paramMap</span><span class="p">[</span><span class="n">lr</span><span class="o">.</span><span class="n">maxIter</span><span class="p">]</span> <span class="o">=</span> <span class="mi">30</span>  <span class="c1"># Specify 1 Param, overwriting the original maxIter.</span>
+<span class="n">paramMap</span><span class="o">.</span><span class="n">update</span><span class="p">({</span><span class="n">lr</span><span class="o">.</span><span class="n">regParam</span><span class="p">:</span> <span class="mf">0.1</span><span class="p">,</span> <span class="n">lr</span><span class="o">.</span><span class="n">threshold</span><span class="p">:</span> <span class="mf">0.55</span><span class="p">})</span>  <span class="c1"># Specify multiple Params.</span>
 
-<span class="c"># You can combine paramMaps, which are python dictionaries.</span>
-<span class="n">paramMap2</span> <span class="o">=</span> <span class="p">{</span><span class="n">lr</span><span class="o">.</span><span class="n">probabilityCol</span><span class="p">:</span> <span class="s">&quot;myProbability&quot;</span><span class="p">}</span>  <span class="c"># Change output column name</span>
+<span class="c1"># You can combine paramMaps, which are python dictionaries.</span>
+<span class="n">paramMap2</span> <span class="o">=</span> <span class="p">{</span><span class="n">lr</span><span class="o">.</span><span class="n">probabilityCol</span><span class="p">:</span> <span class="s2">&quot;myProbability&quot;</span><span class="p">}</span>  <span class="c1"># Change output column name</span>
 <span class="n">paramMapCombined</span> <span class="o">=</span> <span class="n">paramMap</span><span class="o">.</span><span class="n">copy</span><span class="p">()</span>
 <span class="n">paramMapCombined</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">paramMap2</span><span class="p">)</span>
 
-<span class="c"># Now learn a new model using the paramMapCombined parameters.</span>
-<span class="c"># paramMapCombined overrides all parameters set earlier via lr.set* methods.</span>
+<span class="c1"># Now learn a new model using the paramMapCombined parameters.</span>
+<span class="c1"># paramMapCombined overrides all parameters set earlier via lr.set* methods.</span>
 <span class="n">model2</span> <span class="o">=</span> <span class="n">lr</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">training</span><span class="p">,</span> <span class="n">paramMapCombined</span><span class="p">)</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Model 2 was fit using parameters: &quot;</span><span class="p">)</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Model 2 was fit using parameters: &quot;</span><span class="p">)</span>
 <span class="k">print</span><span class="p">(</span><span class="n">model2</span><span class="o">.</span><span class="n">extractParamMap</span><span class="p">())</span>
 
-<span class="c"># Prepare test data</span>
+<span class="c1"># Prepare test data</span>
 <span class="n">test</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">createDataFrame</span><span class="p">([</span>
     <span class="p">(</span><span class="mf">1.0</span><span class="p">,</span> <span class="n">Vectors</span><span class="o">.</span><span class="n">dense</span><span class="p">([</span><span class="o">-</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">1.5</span><span class="p">,</span> <span class="mf">1.3</span><span class="p">])),</span>
     <span class="p">(</span><span class="mf">0.0</span><span class="p">,</span> <span class="n">Vectors</span><span class="o">.</span><span class="n">dense</span><span class="p">([</span><span class="mf">3.0</span><span class="p">,</span> <span class="mf">2.0</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.1</span><span class="p">])),</span>
-    <span class="p">(</span><span class="mf">1.0</span><span class="p">,</span> <span class="n">Vectors</span><span class="o">.</span><span class="n">dense</span><span class="p">([</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">2.2</span><span class="p">,</span> <span class="o">-</span><span class="mf">1.5</span><span class="p">]))],</span> <span class="p">[</span><span class="s">&quot;label&quot;</span><span class="p">,</span> <span class="s">&quot;features&quot;</span><span class="p">])</span>
+    <span class="p">(</span><span class="mf">1.0</span><span class="p">,</span> <span class="n">Vectors</span><span class="o">.</span><span class="n">dense</span><span class="p">([</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">2.2</span><span class="p">,</span> <span class="o">-</span><span class="mf">1.5</span><span class="p">]))],</span> <span class="p">[</span><span class="s2">&quot;label&quot;</span><span class="p">,</span> <span class="s2">&quot;features&quot;</span><span class="p">])</span>
 
-<span class="c"># Make predictions on test data using the Transformer.transform() method.</span>
-<span class="c"># LogisticRegression.transform will only use the &#39;features&#39; column.</span>
-<span class="c"># Note that model2.transform() outputs a &quot;myProbability&quot; column instead of the usual</span>
-<span class="c"># &#39;probability&#39; column since we renamed the lr.probabilityCol parameter previously.</span>
+<span class="c1"># Make predictions on test data using the Transformer.transform() method.</span>
+<span class="c1"># LogisticRegression.transform will only use the &#39;features&#39; column.</span>
+<span class="c1"># Note that model2.transform() outputs a &quot;myProbability&quot; column instead of the usual</span>
+<span class="c1"># &#39;probability&#39; column since we renamed the lr.probabilityCol parameter previously.</span>
 <span class="n">prediction</span> <span class="o">=</span> <span class="n">model2</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">test</span><span class="p">)</span>
-<span class="n">result</span> <span class="o">=</span> <span class="n">prediction</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s">&quot;features&quot;</span><span class="p">,</span> <span class="s">&quot;label&quot;</span><span class="p">,</span> <span class="s">&quot;myProbability&quot;</span><span class="p">,</span> <span class="s">&quot;prediction&quot;</span><span class="p">)</span> \
+<span class="n">result</span> <span class="o">=</span> <span class="n">prediction</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s2">&quot;features&quot;</span><span class="p">,</span> <span class="s2">&quot;label&quot;</span><span class="p">,</span> <span class="s2">&quot;myProbability&quot;</span><span class="p">,</span> <span class="s2">&quot;prediction&quot;</span><span class="p">)</span> \
     <span class="o">.</span><span class="n">collect</span><span class="p">()</span>
 
 <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">result</span><span class="p">:</span>
-    <span class="k">print</span><span class="p">(</span><span class="s">&quot;features=</span><span class="si">%s</span><span class="s">, label=</span><span class="si">%s</span><span class="s"> -&gt; prob=</span><span class="si">%s</span><span class="s">, prediction=</span><span class="si">%s</span><span class="s">&quot;</span>
+    <span class="k">print</span><span class="p">(</span><span class="s2">&quot;features=</span><span class="si">%s</span><span class="s2">, label=</span><span class="si">%s</span><span class="s2"> -&gt; prob=</span><span class="si">%s</span><span class="s2">, prediction=</span><span class="si">%s</span><span class="s2">&quot;</span>
           <span class="o">%</span> <span class="p">(</span><span class="n">row</span><span class="o">.</span><span class="n">features</span><span class="p">,</span> <span class="n">row</span><span class="o">.</span><span class="n">label</span><span class="p">,</span> <span class="n">row</span><span class="o">.</span><span class="n">myProbability</span><span class="p">,</span> <span class="n">row</span><span class="o">.</span><span class="n">prediction</span><span class="p">))</span>
 </pre></div><div><small>Find full example code at "examples/src/main/python/ml/estimator_transformer_param_example.py" in the Spark repo.</small></div>
 </div>
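
A note on setup: the Python snippet above assumes a SparkSession named "spark" is already in scope (as it is in the pyspark shell), together with imports for Vectors and LogisticRegression. A minimal sketch of that setup, with an illustrative app name:

    from pyspark.sql import SparkSession
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.linalg import Vectors

    # Create the SparkSession that the example assumes is in scope.
    # The app name is illustrative only.
    spark = SparkSession.builder \
        .appName("EstimatorTransformerParamExample") \
        .getOrCreate()

    # ... the Estimator/Transformer/Param example above runs here ...

    spark.stop()
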
@@ -773,7 +773,7 @@ the [`Params` Python docs](api/python/pyspark.ml.html#pyspark.ml.param.Params) f
 
 Refer to the [`Pipeline` Scala docs](api/scala/index.html#org.apache.spark.ml.Pipeline) for details on the API.
 
-<div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.ml.</span><span class="o">{</span><span class="nc">Pipeline</span><span class="o">,</span> <span class="nc">PipelineModel</span><span class="o">}</span>
+<div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.ml.</span><span class="o">{</span><span class="nc">Pipeline</span><span class="o">,</span> <span class="nc">PipelineModel</span><span class="o">}</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.ml.classification.LogisticRegression</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.ml.feature.</span><span class="o">{</span><span class="nc">HashingTF</span><span class="o">,</span> <span class="nc">Tokenizer</span><span class="o">}</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.ml.linalg.Vector</span>
@@ -826,7 +826,7 @@ Refer to the [`Pipeline` Scala docs](api/scala/index.html#org.apache.spark.ml.Pi
   <span class="o">.</span><span class="n">select</span><span class="o">(</span><span class="s">&quot;id&quot;</span><span class="o">,</span> <span class="s">&quot;text&quot;</span><span class="o">,</span> <span class="s">&quot;probability&quot;</span><span class="o">,</span> <span class="s">&quot;prediction&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="n">collect</span><span class="o">()</span>
   <span class="o">.</span><span class="n">foreach</span> <span class="o">{</span> <span class="k">case</span> <span class="nc">Row</span><span class="o">(</span><span class="n">id</span><span class="k">:</span> <span class="kt">Long</span><span class="o">,</span> <span class="n">text</span><span class="k">:</span> <span class="kt">String</span><span class="o">,</span> <span class="n">prob</span><span class="k">:</span> <span class="kt">Vector</span><span class="o">,</span> <span class="n">prediction</span><span class="k">:</span> <span class="kt">Double</span><span class="o">)</span> <span class="k">=&gt;</span>
-    <span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;($id, $text) --&gt; prob=$prob, prediction=$prediction&quot;</span><span class="o">)</span>
+    <span class="n">println</span><span class="o">(</span><span class="s">s&quot;(</span><span class="si">$id</span><span class="s">, </span><span class="si">$text</span><span class="s">) --&gt; prob=</span><span class="si">$prob</span><span class="s">, prediction=</span><span class="si">$prediction</span><span class="s">&quot;</span><span class="o">)</span>
   <span class="o">}</span>
 </pre></div><div><small>Find full example code at "examples/src/main/scala/org/apache/spark/examples/ml/PipelineExample.scala" in the Spark repo.</small></div>
 </div>
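
Once the pipeline has been fit, the per-stage models can be pulled back out of the PipelineModel, for example to inspect the learned coefficients. A short PySpark sketch, assuming model is the fitted three-stage pipeline (tokenizer, hashingTF, lr) from these examples:

    # Sketch: inspect the fitted stages of the pipeline.
    # `model` is assumed to be the fitted PipelineModel from the example.
    lr_model = model.stages[2]    # stages are [tokenizer, hashingTF, lr]
    print(lr_model.coefficients)  # learned logistic regression weights
    print(lr_model.intercept)
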
@@ -836,7 +836,7 @@ Refer to the [`Pipeline` Scala docs](api/scala/index.html#org.apache.spark.ml.Pi
 
 Refer to the [`Pipeline` Java docs](api/java/org/apache/spark/ml/Pipeline.html) for details on the API.
 
-<div class="highlight"><pre><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
+<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">java.util.Arrays</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">org.apache.spark.ml.Pipeline</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.ml.PipelineModel</span><span class="o">;</span>
@@ -849,24 +849,24 @@ Refer to the [`Pipeline` Java docs](api/java/org/apache/spark/ml/Pipeline.html)
 
 <span class="c1">// Prepare training documents, which are labeled.</span>
 <span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">training</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="na">createDataFrame</span><span class="o">(</span><span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span>
-  <span class="k">new</span> <span class="nf">JavaLabeledDocument</span><span class="o">(</span><span class="mi">0L</span><span class="o">,</span> <span class="s">&quot;a b c d e spark&quot;</span><span class="o">,</span> <span class="mf">1.0</span><span class="o">),</span>
-  <span class="k">new</span> <span class="nf">JavaLabeledDocument</span><span class="o">(</span><span class="mi">1L</span><span class="o">,</span> <span class="s">&quot;b d&quot;</span><span class="o">,</span> <span class="mf">0.0</span><span class="o">),</span>
-  <span class="k">new</span> <span class="nf">JavaLabeledDocument</span><span class="o">(</span><span class="mi">2L</span><span class="o">,</span> <span class="s">&quot;spark f g h&quot;</span><span class="o">,</span> <span class="mf">1.0</span><span class="o">),</span>
-  <span class="k">new</span> <span class="nf">JavaLabeledDocument</span><span class="o">(</span><span class="mi">3L</span><span class="o">,</span> <span class="s">&quot;hadoop mapreduce&quot;</span><span class="o">,</span> <span class="mf">0.0</span><span class="o">)</span>
+  <span class="k">new</span> <span class="n">JavaLabeledDocument</span><span class="o">(</span><span class="mi">0</span><span class="n">L</span><span class="o">,</span> <span class="s">&quot;a b c d e spark&quot;</span><span class="o">,</span> <span class="mf">1.0</span><span class="o">),</span>
+  <span class="k">new</span> <span class="n">JavaLabeledDocument</span><span class="o">(</span><span class="mi">1L</span><span class="o">,</span> <span class="s">&quot;b d&quot;</span><span class="o">,</span> <span class="mf">0.0</span><span class="o">),</span>
+  <span class="k">new</span> <span class="n">JavaLabeledDocument</span><span class="o">(</span><span class="mi">2L</span><span class="o">,</span> <span class="s">&quot;spark f g h&quot;</span><span class="o">,</span> <span class="mf">1.0</span><span class="o">),</span>
+  <span class="k">new</span> <span class="n">JavaLabeledDocument</span><span class="o">(</span><span class="mi">3L</span><span class="o">,</span> <span class="s">&quot;hadoop mapreduce&quot;</span><span class="o">,</span> <span class="mf">0.0</span><span class="o">)</span>
 <span class="o">),</span> <span class="n">JavaLabeledDocument</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>
 
 <span class="c1">// Configure an ML pipeline, which consists of three stages: tokenizer, hashingTF, and lr.</span>
-<span class="n">Tokenizer</span> <span class="n">tokenizer</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">Tokenizer</span><span class="o">()</span>
+<span class="n">Tokenizer</span> <span class="n">tokenizer</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Tokenizer</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setInputCol</span><span class="o">(</span><span class="s">&quot;text&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setOutputCol</span><span class="o">(</span><span class="s">&quot;words&quot;</span><span class="o">);</span>
-<span class="n">HashingTF</span> <span class="n">hashingTF</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">HashingTF</span><span class="o">()</span>
+<span class="n">HashingTF</span> <span class="n">hashingTF</span> <span class="o">=</span> <span class="k">new</span> <span class="n">HashingTF</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setNumFeatures</span><span class="o">(</span><span class="mi">1000</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setInputCol</span><span class="o">(</span><span class="n">tokenizer</span><span class="o">.</span><span class="na">getOutputCol</span><span class="o">())</span>
   <span class="o">.</span><span class="na">setOutputCol</span><span class="o">(</span><span class="s">&quot;features&quot;</span><span class="o">);</span>
-<span class="n">LogisticRegression</span> <span class="n">lr</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">LogisticRegression</span><span class="o">()</span>
+<span class="n">LogisticRegression</span> <span class="n">lr</span> <span class="o">=</span> <span class="k">new</span> <span class="n">LogisticRegression</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setMaxIter</span><span class="o">(</span><span class="mi">10</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setRegParam</span><span class="o">(</span><span class="mf">0.001</span><span class="o">);</span>
-<span class="n">Pipeline</span> <span class="n">pipeline</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">Pipeline</span><span class="o">()</span>
+<span class="n">Pipeline</span> <span class="n">pipeline</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Pipeline</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setStages</span><span class="o">(</span><span class="k">new</span> <span class="n">PipelineStage</span><span class="o">[]</span> <span class="o">{</span><span class="n">tokenizer</span><span class="o">,</span> <span class="n">hashingTF</span><span class="o">,</span> <span class="n">lr</span><span class="o">});</span>
 
 <span class="c1">// Fit the pipeline to training documents.</span>
@@ -874,10 +874,10 @@ Refer to the [`Pipeline` Java docs](api/java/org/apache/spark/ml/Pipeline.html)
 
 <span class="c1">// Prepare test documents, which are unlabeled.</span>
 <span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">test</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="na">createDataFrame</span><span class="o">(</span><span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span>
-  <span class="k">new</span> <span class="nf">JavaDocument</span><span class="o">(</span><span class="mi">4L</span><span class="o">,</span> <span class="s">&quot;spark i j k&quot;</span><span class="o">),</span>
-  <span class="k">new</span> <span class="nf">JavaDocument</span><span class="o">(</span><span class="mi">5L</span><span class="o">,</span> <span class="s">&quot;l m n&quot;</span><span class="o">),</span>
-  <span class="k">new</span> <span class="nf">JavaDocument</span><span class="o">(</span><span class="mi">6L</span><span class="o">,</span> <span class="s">&quot;spark hadoop spark&quot;</span><span class="o">),</span>
-  <span class="k">new</span> <span class="nf">JavaDocument</span><span class="o">(</span><span class="mi">7L</span><span class="o">,</span> <span class="s">&quot;apache hadoop&quot;</span><span class="o">)</span>
+  <span class="k">new</span> <span class="n">JavaDocument</span><span class="o">(</span><span class="mi">4L</span><span class="o">,</span> <span class="s">&quot;spark i j k&quot;</span><span class="o">),</span>
+  <span class="k">new</span> <span class="n">JavaDocument</span><span class="o">(</span><span class="mi">5L</span><span class="o">,</span> <span class="s">&quot;l m n&quot;</span><span class="o">),</span>
+  <span class="k">new</span> <span class="n">JavaDocument</span><span class="o">(</span><span class="mi">6L</span><span class="o">,</span> <span class="s">&quot;spark hadoop spark&quot;</span><span class="o">),</span>
+  <span class="k">new</span> <span class="n">JavaDocument</span><span class="o">(</span><span class="mi">7L</span><span class="o">,</span> <span class="s">&quot;apache hadoop&quot;</span><span class="o">)</span>
 <span class="o">),</span> <span class="n">JavaDocument</span><span class="o">.</span><span class="na">class</span><span class="o">);</span>
 
 <span class="c1">// Make predictions on test documents.</span>
@@ -893,41 +893,41 @@ Refer to the [`Pipeline` Java docs](api/java/org/apache/spark/ml/Pipeline.html)
 
 Refer to the [`Pipeline` Python docs](api/python/pyspark.ml.html#pyspark.ml.Pipeline) for more details on the API.
 
-<div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.ml</span> <span class="kn">import</span> <span class="n">Pipeline</span>
+<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.ml</span> <span class="kn">import</span> <span class="n">Pipeline</span>
 <span class="kn">from</span> <span class="nn">pyspark.ml.classification</span> <span class="kn">import</span> <span class="n">LogisticRegression</span>
 <span class="kn">from</span> <span class="nn">pyspark.ml.feature</span> <span class="kn">import</span> <span class="n">HashingTF</span><span class="p">,</span> <span class="n">Tokenizer</span>
 
-<span class="c"># Prepare training documents from a list of (id, text, label) tuples.</span>
+<span class="c1"># Prepare training documents from a list of (id, text, label) tuples.</span>
 <span class="n">training</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">createDataFrame</span><span class="p">([</span>
-    <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="s">&quot;a b c d e spark&quot;</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">),</span>
-    <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s">&quot;b d&quot;</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">),</span>
-    <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s">&quot;spark f g h&quot;</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">),</span>
-    <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="s">&quot;hadoop mapreduce&quot;</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">)</span>
-<span class="p">],</span> <span class="p">[</span><span class="s">&quot;id&quot;</span><span class="p">,</span> <span class="s">&quot;text&quot;</span><span class="p">,</span> <span class="s">&quot;label&quot;</span><span class="p">])</span>
-
-<span class="c"># Configure an ML pipeline, which consists of three stages: tokenizer, hashingTF, and lr.</span>
-<span class="n">tokenizer</span> <span class="o">=</span> <span class="n">Tokenizer</span><span class="p">(</span><span class="n">inputCol</span><span class="o">=</span><span class="s">&quot;text&quot;</span><span class="p">,</span> <span class="n">outputCol</span><span class="o">=</span><span class="s">&quot;words&quot;</span><span class="p">)</span>
-<span class="n">hashingTF</span> <span class="o">=</span> <span class="n">HashingTF</span><span class="p">(</span><span class="n">inputCol</span><span class="o">=</span><span class="n">tokenizer</span><span class="o">.</span><span class="n">getOutputCol</span><span class="p">(),</span> <span class="n">outputCol</span><span class="o">=</span><span class="s">&quot;features&quot;</span><span class="p">)</span>
+    <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="s2">&quot;a b c d e spark&quot;</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">),</span>
+    <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s2">&quot;b d&quot;</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">),</span>
+    <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s2">&quot;spark f g h&quot;</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">),</span>
+    <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="s2">&quot;hadoop mapreduce&quot;</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">)</span>
+<span class="p">],</span> <span class="p">[</span><span class="s2">&quot;id&quot;</span><span class="p">,</span> <span class="s2">&quot;text&quot;</span><span class="p">,</span> <span class="s2">&quot;label&quot;</span><span class="p">])</span>
+
+<span class="c1"># Configure an ML pipeline, which consists of three stages: tokenizer, hashingTF, and lr.</span>
+<span class="n">tokenizer</span> <span class="o">=</span> <span class="n">Tokenizer</span><span class="p">(</span><span class="n">inputCol</span><span class="o">=</span><span class="s2">&quot;text&quot;</span><span class="p">,</span> <span class="n">outputCol</span><span class="o">=</span><span class="s2">&quot;words&quot;</span><span class="p">)</span>
+<span class="n">hashingTF</span> <span class="o">=</span> <span class="n">HashingTF</span><span class="p">(</span><span class="n">inputCol</span><span class="o">=</span><span class="n">tokenizer</span><span class="o">.</span><span class="n">getOutputCol</span><span class="p">(),</span> <span class="n">outputCol</span><span class="o">=</span><span class="s2">&quot;features&quot;</span><span class="p">)</span>
 <span class="n">lr</span> <span class="o">=</span> <span class="n">LogisticRegression</span><span class="p">(</span><span class="n">maxIter</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">regParam</span><span class="o">=</span><span class="mf">0.001</span><span class="p">)</span>
 <span class="n">pipeline</span> <span class="o">=</span> <span class="n">Pipeline</span><span class="p">(</span><span class="n">stages</span><span class="o">=</span><span class="p">[</span><span class="n">tokenizer</span><span class="p">,</span> <span class="n">hashingTF</span><span class="p">,</span> <span class="n">lr</span><span class="p">])</span>
 
-<span class="c"># Fit the pipeline to training documents.</span>
+<span class="c1"># Fit the pipeline to training documents.</span>
 <span class="n">model</span> <span class="o">=</span> <span class="n">pipeline</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">training</span><span class="p">)</span>
 
-<span class="c"># Prepare test documents, which are unlabeled (id, text) tuples.</span>
+<span class="c1"># Prepare test documents, which are unlabeled (id, text) tuples.</span>
 <span class="n">test</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">createDataFrame</span><span class="p">([</span>
-    <span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="s">&quot;spark i j k&quot;</span><span class="p">),</span>
-    <span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="s">&quot;l m n&quot;</span><span class="p">),</span>
-    <span class="p">(</span><span class="mi">6</span><span class="p">,</span> <span class="s">&quot;spark hadoop spark&quot;</span><span class="p">),</span>
-    <span class="p">(</span><span class="mi">7</span><span class="p">,</span> <span class="s">&quot;apache hadoop&quot;</span><span class="p">)</span>
-<span class="p">],</span> <span class="p">[</span><span class="s">&quot;id&quot;</span><span class="p">,</span> <span class="s">&quot;text&quot;</span><span class="p">])</span>
+    <span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="s2">&quot;spark i j k&quot;</span><span class="p">),</span>
+    <span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="s2">&quot;l m n&quot;</span><span class="p">),</span>
+    <span class="p">(</span><span class="mi">6</span><span class="p">,</span> <span class="s2">&quot;spark hadoop spark&quot;</span><span class="p">),</span>
+    <span class="p">(</span><span class="mi">7</span><span class="p">,</span> <span class="s2">&quot;apache hadoop&quot;</span><span class="p">)</span>
+<span class="p">],</span> <span class="p">[</span><span class="s2">&quot;id&quot;</span><span class="p">,</span> <span class="s2">&quot;text&quot;</span><span class="p">])</span>
 
-<span class="c"># Make predictions on test documents and print columns of interest.</span>
+<span class="c1"># Make predictions on test documents and print columns of interest.</span>
 <span class="n">prediction</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">test</span><span class="p">)</span>
-<span class="n">selected</span> <span class="o">=</span> <span class="n">prediction</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s">&quot;id&quot;</span><span class="p">,</span> <span class="s">&quot;text&quot;</span><span class="p">,</span> <span class="s">&quot;probability&quot;</span><span class="p">,</span> <span class="s">&quot;prediction&quot;</span><span class="p">)</span>
+<span class="n">selected</span> <span class="o">=</span> <span class="n">prediction</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s2">&quot;id&quot;</span><span class="p">,</span> <span class="s2">&quot;text&quot;</span><span class="p">,</span> <span class="s2">&quot;probability&quot;</span><span class="p">,</span> <span class="s2">&quot;prediction&quot;</span><span class="p">)</span>
 <span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">selected</span><span class="o">.</span><span class="n">collect</span><span class="p">():</span>
     <span class="n">rid</span><span class="p">,</span> <span class="n">text</span><span class="p">,</span> <span class="n">prob</span><span class="p">,</span> <span class="n">prediction</span> <span class="o">=</span> <span class="n">row</span>
-    <span class="k">print</span><span class="p">(</span><span class="s">&quot;(</span><span class="si">%d</span><span class="s">, </span><span class="si">%s</span><span class="s">) --&gt; prob=</span><span class="si">%s</span><span class="s">, prediction=</span><span class="si">%f</span><span class="s">&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="n">rid</span><span class="p">,</span> <span class="n">text</span><span class="p">,</span> <span class="nb">str</span><span class="p">(</span><span class="n">prob</span><span class="p">),</span> <span class="n">prediction</span><span class="p">))</span>
+    <span class="k">print</span><span class="p">(</span><span class="s2">&quot;(</span><span class="si">%d</span><span class="s2">, </span><span class="si">%s</span><span class="s2">) --&gt; prob=</span><span class="si">%s</span><span class="s2">, prediction=</span><span class="si">%f</span><span class="s2">&quot;</span> <span class="o">%</span> <span class="p">(</span><span class="n">rid</span><span class="p">,</span> <span class="n">text</span><span class="p">,</span> <span class="nb">str</span><span class="p">(</span><span class="n">prob</span><span class="p">),</span> <span class="n">prediction</span><span class="p">))</span>
 </pre></div><div><small>Find full example code at "examples/src/main/python/ml/pipeline_example.py" in the Spark repo.</small></div>
 </div>
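
A fitted PipelineModel can also be saved and reloaded through the ML persistence API. A short sketch, assuming the model from the example above (the path is illustrative):

    from pyspark.ml import PipelineModel

    # Sketch: persist the fitted pipeline to an illustrative path ...
    model.write().overwrite().save("/tmp/spark-pipeline-model")

    # ... and load it back later, e.g. in a different application.
    same_model = PipelineModel.load("/tmp/spark-pipeline-model")
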
 




[04/25] spark-website git commit: Update 2.1.0 docs to include https://github.com/apache/spark/pull/16294

Posted by yh...@apache.org.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/streaming-programming-guide.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/streaming-programming-guide.html b/site/docs/2.1.0/streaming-programming-guide.html
index 9a87d23..b1ce1e1 100644
--- a/site/docs/2.1.0/streaming-programming-guide.html
+++ b/site/docs/2.1.0/streaming-programming-guide.html
@@ -129,32 +129,32 @@
                     
 
                     <ul id="markdown-toc">
-  <li><a href="#overview" id="markdown-toc-overview">Overview</a></li>
-  <li><a href="#a-quick-example" id="markdown-toc-a-quick-example">A Quick Example</a></li>
-  <li><a href="#basic-concepts" id="markdown-toc-basic-concepts">Basic Concepts</a>    <ul>
-      <li><a href="#linking" id="markdown-toc-linking">Linking</a></li>
-      <li><a href="#initializing-streamingcontext" id="markdown-toc-initializing-streamingcontext">Initializing StreamingContext</a></li>
-      <li><a href="#discretized-streams-dstreams" id="markdown-toc-discretized-streams-dstreams">Discretized Streams (DStreams)</a></li>
-      <li><a href="#input-dstreams-and-receivers" id="markdown-toc-input-dstreams-and-receivers">Input DStreams and Receivers</a></li>
-      <li><a href="#transformations-on-dstreams" id="markdown-toc-transformations-on-dstreams">Transformations on DStreams</a></li>
-      <li><a href="#output-operations-on-dstreams" id="markdown-toc-output-operations-on-dstreams">Output Operations on DStreams</a></li>
-      <li><a href="#dataframe-and-sql-operations" id="markdown-toc-dataframe-and-sql-operations">DataFrame and SQL Operations</a></li>
-      <li><a href="#mllib-operations" id="markdown-toc-mllib-operations">MLlib Operations</a></li>
-      <li><a href="#caching--persistence" id="markdown-toc-caching--persistence">Caching / Persistence</a></li>
-      <li><a href="#checkpointing" id="markdown-toc-checkpointing">Checkpointing</a></li>
-      <li><a href="#accumulators-broadcast-variables-and-checkpoints" id="markdown-toc-accumulators-broadcast-variables-and-checkpoints">Accumulators, Broadcast Variables, and Checkpoints</a></li>
-      <li><a href="#deploying-applications" id="markdown-toc-deploying-applications">Deploying Applications</a></li>
-      <li><a href="#monitoring-applications" id="markdown-toc-monitoring-applications">Monitoring Applications</a></li>
+  <li><a href="#overview">Overview</a></li>
+  <li><a href="#a-quick-example">A Quick Example</a></li>
+  <li><a href="#basic-concepts">Basic Concepts</a>    <ul>
+      <li><a href="#linking">Linking</a></li>
+      <li><a href="#initializing-streamingcontext">Initializing StreamingContext</a></li>
+      <li><a href="#discretized-streams-dstreams">Discretized Streams (DStreams)</a></li>
+      <li><a href="#input-dstreams-and-receivers">Input DStreams and Receivers</a></li>
+      <li><a href="#transformations-on-dstreams">Transformations on DStreams</a></li>
+      <li><a href="#output-operations-on-dstreams">Output Operations on DStreams</a></li>
+      <li><a href="#dataframe-and-sql-operations">DataFrame and SQL Operations</a></li>
+      <li><a href="#mllib-operations">MLlib Operations</a></li>
+      <li><a href="#caching--persistence">Caching / Persistence</a></li>
+      <li><a href="#checkpointing">Checkpointing</a></li>
+      <li><a href="#accumulators-broadcast-variables-and-checkpoints">Accumulators, Broadcast Variables, and Checkpoints</a></li>
+      <li><a href="#deploying-applications">Deploying Applications</a></li>
+      <li><a href="#monitoring-applications">Monitoring Applications</a></li>
     </ul>
   </li>
-  <li><a href="#performance-tuning" id="markdown-toc-performance-tuning">Performance Tuning</a>    <ul>
-      <li><a href="#reducing-the-batch-processing-times" id="markdown-toc-reducing-the-batch-processing-times">Reducing the Batch Processing Times</a></li>
-      <li><a href="#setting-the-right-batch-interval" id="markdown-toc-setting-the-right-batch-interval">Setting the Right Batch Interval</a></li>
-      <li><a href="#memory-tuning" id="markdown-toc-memory-tuning">Memory Tuning</a></li>
+  <li><a href="#performance-tuning">Performance Tuning</a>    <ul>
+      <li><a href="#reducing-the-batch-processing-times">Reducing the Batch Processing Times</a></li>
+      <li><a href="#setting-the-right-batch-interval">Setting the Right Batch Interval</a></li>
+      <li><a href="#memory-tuning">Memory Tuning</a></li>
     </ul>
   </li>
-  <li><a href="#fault-tolerance-semantics" id="markdown-toc-fault-tolerance-semantics">Fault-tolerance Semantics</a></li>
-  <li><a href="#where-to-go-from-here" id="markdown-toc-where-to-go-from-here">Where to Go from Here</a></li>
+  <li><a href="#fault-tolerance-semantics">Fault-tolerance Semantics</a></li>
+  <li><a href="#where-to-go-from-here">Where to Go from Here</a></li>
 </ul>
 
 <h1 id="overview">Overview</h1>
@@ -209,7 +209,7 @@ conversions from StreamingContext into our environment in order to add useful me
 other classes we need (like DStream). <a href="api/scala/index.html#org.apache.spark.streaming.StreamingContext">StreamingContext</a> is the
 main entry point for all streaming functionality. We create a local StreamingContext with two execution threads and a batch interval of 1 second.</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">import</span> <span class="nn">org.apache.spark._</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">import</span> <span class="nn">org.apache.spark._</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.streaming._</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.streaming.StreamingContext._</span> <span class="c1">// not necessary since Spark 1.3</span>
 
@@ -217,33 +217,33 @@ main entry point for all streaming functionality. We create a local StreamingCon
 <span class="c1">// The master requires 2 cores to prevent from a starvation scenario.</span>
 
 <span class="k">val</span> <span class="n">conf</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkConf</span><span class="o">().</span><span class="n">setMaster</span><span class="o">(</span><span class="s">&quot;local[2]&quot;</span><span class="o">).</span><span class="n">setAppName</span><span class="o">(</span><span class="s">&quot;NetworkWordCount&quot;</span><span class="o">)</span>
-<span class="k">val</span> <span class="n">ssc</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">StreamingContext</span><span class="o">(</span><span class="n">conf</span><span class="o">,</span> <span class="nc">Seconds</span><span class="o">(</span><span class="mi">1</span><span class="o">))</span></code></pre></div>
+<span class="k">val</span> <span class="n">ssc</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">StreamingContext</span><span class="o">(</span><span class="n">conf</span><span class="o">,</span> <span class="nc">Seconds</span><span class="o">(</span><span class="mi">1</span><span class="o">))</span></code></pre></figure>
 
     <p>Using this context, we can create a DStream that represents streaming data from a TCP
 source, specified as a hostname (e.g. <code>localhost</code>) and a port (e.g. <code>9999</code>).</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="c1">// Create a DStream that will connect to hostname:port, like localhost:9999</span>
-<span class="k">val</span> <span class="n">lines</span> <span class="k">=</span> <span class="n">ssc</span><span class="o">.</span><span class="n">socketTextStream</span><span class="o">(</span><span class="s">&quot;localhost&quot;</span><span class="o">,</span> <span class="mi">9999</span><span class="o">)</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="c1">// Create a DStream that will connect to hostname:port, like localhost:9999</span>
+<span class="k">val</span> <span class="n">lines</span> <span class="k">=</span> <span class="n">ssc</span><span class="o">.</span><span class="n">socketTextStream</span><span class="o">(</span><span class="s">&quot;localhost&quot;</span><span class="o">,</span> <span class="mi">9999</span><span class="o">)</span></code></pre></figure>
 
     <p>This <code>lines</code> DStream represents the stream of data that will be received from the data
 server. Each record in this DStream is a line of text. Next, we want to split the lines by
 space characters into words.</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="c1">// Split each line into words</span>
-<span class="k">val</span> <span class="n">words</span> <span class="k">=</span> <span class="n">lines</span><span class="o">.</span><span class="n">flatMap</span><span class="o">(</span><span class="k">_</span><span class="o">.</span><span class="n">split</span><span class="o">(</span><span class="s">&quot; &quot;</span><span class="o">))</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="c1">// Split each line into words</span>
+<span class="k">val</span> <span class="n">words</span> <span class="k">=</span> <span class="n">lines</span><span class="o">.</span><span class="n">flatMap</span><span class="o">(</span><span class="k">_</span><span class="o">.</span><span class="n">split</span><span class="o">(</span><span class="s">&quot; &quot;</span><span class="o">))</span></code></pre></figure>
 
     <p><code>flatMap</code> is a one-to-many DStream operation that creates a new DStream by
 generating multiple new records from each record in the source DStream. In this case,
 each line will be split into multiple words and the stream of words is represented as the
 <code>words</code> DStream.  Next, we want to count these words.</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">import</span> <span class="nn">org.apache.spark.streaming.StreamingContext._</span> <span class="c1">// not necessary since Spark 1.3</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">import</span> <span class="nn">org.apache.spark.streaming.StreamingContext._</span> <span class="c1">// not necessary since Spark 1.3</span>
 <span class="c1">// Count each word in each batch</span>
 <span class="k">val</span> <span class="n">pairs</span> <span class="k">=</span> <span class="n">words</span><span class="o">.</span><span class="n">map</span><span class="o">(</span><span class="n">word</span> <span class="k">=&gt;</span> <span class="o">(</span><span class="n">word</span><span class="o">,</span> <span class="mi">1</span><span class="o">))</span>
 <span class="k">val</span> <span class="n">wordCounts</span> <span class="k">=</span> <span class="n">pairs</span><span class="o">.</span><span class="n">reduceByKey</span><span class="o">(</span><span class="k">_</span> <span class="o">+</span> <span class="k">_</span><span class="o">)</span>
 
 <span class="c1">// Print the first ten elements of each RDD generated in this DStream to the console</span>
-<span class="n">wordCounts</span><span class="o">.</span><span class="n">print</span><span class="o">()</span></code></pre></div>
+<span class="n">wordCounts</span><span class="o">.</span><span class="n">print</span><span class="o">()</span></code></pre></figure>
 
     <p>The <code>words</code> DStream is further mapped (one-to-one transformation) to a DStream of <code>(word,
 1)</code> pairs, which is then reduced to get the frequency of words in each batch of data.
@@ -253,8 +253,8 @@ Finally, <code>wordCounts.print()</code> will print a few of the counts generate
 will perform when it is started, and no real processing has started yet. To start the processing
 after all the transformations have been set up, we finally call</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">ssc</span><span class="o">.</span><span class="n">start</span><span class="o">()</span>             <span class="c1">// Start the computation</span>
-<span class="n">ssc</span><span class="o">.</span><span class="n">awaitTermination</span><span class="o">()</span>  <span class="c1">// Wait for the computation to terminate</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="n">ssc</span><span class="o">.</span><span class="n">start</span><span class="o">()</span>             <span class="c1">// Start the computation</span>
+<span class="n">ssc</span><span class="o">.</span><span class="n">awaitTermination</span><span class="o">()</span>  <span class="c1">// Wait for the computation to terminate</span></code></pre></figure>
 
     <p>The complete code can be found in the Spark Streaming example
 <a href="https://github.com/apache/spark/blob/v2.1.0/examples/src/main/scala/org/apache/spark/examples/streaming/NetworkWordCount.scala">NetworkWordCount</a>.
@@ -268,33 +268,33 @@ after all the transformations have been setup, we finally call</p>
 which is the main entry point for all streaming
 functionality. We create a local StreamingContext with two execution threads and a batch interval of 1 second.</p>
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kn">import</span> <span class="nn">org.apache.spark.*</span><span class="o">;</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.*</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.function.*</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.streaming.*</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.streaming.api.java.*</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span>
 
 <span class="c1">// Create a local StreamingContext with two working thread and batch interval of 1 second</span>
-<span class="n">SparkConf</span> <span class="n">conf</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">SparkConf</span><span class="o">().</span><span class="na">setMaster</span><span class="o">(</span><span class="s">&quot;local[2]&quot;</span><span class="o">).</span><span class="na">setAppName</span><span class="o">(</span><span class="s">&quot;NetworkWordCount&quot;</span><span class="o">);</span>
-<span class="n">JavaStreamingContext</span> <span class="n">jssc</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">JavaStreamingContext</span><span class="o">(</span><span class="n">conf</span><span class="o">,</span> <span class="n">Durations</span><span class="o">.</span><span class="na">seconds</span><span class="o">(</span><span class="mi">1</span><span class="o">));</span></code></pre></div>
+<span class="n">SparkConf</span> <span class="n">conf</span> <span class="o">=</span> <span class="k">new</span> <span class="n">SparkConf</span><span class="o">().</span><span class="na">setMaster</span><span class="o">(</span><span class="s">&quot;local[2]&quot;</span><span class="o">).</span><span class="na">setAppName</span><span class="o">(</span><span class="s">&quot;NetworkWordCount&quot;</span><span class="o">);</span>
+<span class="n">JavaStreamingContext</span> <span class="n">jssc</span> <span class="o">=</span> <span class="k">new</span> <span class="n">JavaStreamingContext</span><span class="o">(</span><span class="n">conf</span><span class="o">,</span> <span class="n">Durations</span><span class="o">.</span><span class="na">seconds</span><span class="o">(</span><span class="mi">1</span><span class="o">));</span></code></pre></figure>
 
     <p>Using this context, we can create a DStream that represents streaming data from a TCP
 source, specified as a hostname (e.g. <code>localhost</code>) and a port (e.g. <code>9999</code>).</p>
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="c1">// Create a DStream that will connect to hostname:port, like localhost:9999</span>
-<span class="n">JavaReceiverInputDStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">lines</span> <span class="o">=</span> <span class="n">jssc</span><span class="o">.</span><span class="na">socketTextStream</span><span class="o">(</span><span class="s">&quot;localhost&quot;</span><span class="o">,</span> <span class="mi">9999</span><span class="o">);</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="c1">// Create a DStream that will connect to hostname:port, like localhost:9999</span>
+<span class="n">JavaReceiverInputDStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">lines</span> <span class="o">=</span> <span class="n">jssc</span><span class="o">.</span><span class="na">socketTextStream</span><span class="o">(</span><span class="s">&quot;localhost&quot;</span><span class="o">,</span> <span class="mi">9999</span><span class="o">);</span></code></pre></figure>
 
     <p>This <code>lines</code> DStream represents the stream of data that will be received from the data
 server. Each record in this stream is a line of text. Then, we want to split the lines by
 space characters into words.</p>
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="c1">// Split each line into words</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="c1">// Split each line into words</span>
 <span class="n">JavaDStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="n">words</span> <span class="o">=</span> <span class="n">lines</span><span class="o">.</span><span class="na">flatMap</span><span class="o">(</span>
   <span class="k">new</span> <span class="n">FlatMapFunction</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;()</span> <span class="o">{</span>
     <span class="nd">@Override</span> <span class="kd">public</span> <span class="n">Iterator</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span> <span class="nf">call</span><span class="o">(</span><span class="n">String</span> <span class="n">x</span><span class="o">)</span> <span class="o">{</span>
       <span class="k">return</span> <span class="n">Arrays</span><span class="o">.</span><span class="na">asList</span><span class="o">(</span><span class="n">x</span><span class="o">.</span><span class="na">split</span><span class="o">(</span><span class="s">&quot; &quot;</span><span class="o">)).</span><span class="na">iterator</span><span class="o">();</span>
     <span class="o">}</span>
-  <span class="o">});</span></code></pre></div>
+  <span class="o">});</span></code></pre></figure>
 
     <p><code>flatMap</code> is a DStream operation that creates a new DStream by
 generating multiple new records from each record in the source DStream. In this case,
@@ -306,7 +306,7 @@ that help define DStream transformations.</p>
 
     <p>Next, we want to count these words.</p>
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="c1">// Count each word in each batch</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="c1">// Count each word in each batch</span>
 <span class="n">JavaPairDStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;</span> <span class="n">pairs</span> <span class="o">=</span> <span class="n">words</span><span class="o">.</span><span class="na">mapToPair</span><span class="o">(</span>
   <span class="k">new</span> <span class="n">PairFunction</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;()</span> <span class="o">{</span>
     <span class="nd">@Override</span> <span class="kd">public</span> <span class="n">Tuple2</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;</span> <span class="nf">call</span><span class="o">(</span><span class="n">String</span> <span class="n">s</span><span class="o">)</span> <span class="o">{</span>
@@ -321,7 +321,7 @@ that help define DStream transformations.</p>
   <span class="o">});</span>
 
 <span class="c1">// Print the first ten elements of each RDD generated in this DStream to the console</span>
-<span class="n">wordCounts</span><span class="o">.</span><span class="na">print</span><span class="o">();</span></code></pre></div>
+<span class="n">wordCounts</span><span class="o">.</span><span class="na">print</span><span class="o">();</span></code></pre></figure>
 
     <p>The <code>words</code> DStream is further mapped (one-to-one transformation) to a DStream of <code>(word,
 1)</code> pairs, using a <a href="api/scala/index.html#org.apache.spark.api.java.function.PairFunction">PairFunction</a>
@@ -333,8 +333,8 @@ Finally, <code>wordCounts.print()</code> will print a few of the counts generate
 will perform after it is started, and no real processing has started yet. To start the processing
 after all the transformations have been set up, we finally call the <code>start</code> method.</p>
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">jssc</span><span class="o">.</span><span class="na">start</span><span class="o">();</span>              <span class="c1">// Start the computation</span>
-<span class="n">jssc</span><span class="o">.</span><span class="na">awaitTermination</span><span class="o">();</span>   <span class="c1">// Wait for the computation to terminate</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="n">jssc</span><span class="o">.</span><span class="na">start</span><span class="o">();</span>              <span class="c1">// Start the computation</span>
+<span class="n">jssc</span><span class="o">.</span><span class="na">awaitTermination</span><span class="o">();</span>   <span class="c1">// Wait for the computation to terminate</span></code></pre></figure>
 
     <p>The complete code can be found in the Spark Streaming example
 <a href="https://github.com/apache/spark/blob/v2.1.0/examples/src/main/java/org/apache/spark/examples/streaming/JavaNetworkWordCount.java">JavaNetworkWordCount</a>.
@@ -344,37 +344,37 @@ after all the transformations have been setup, we finally call <code>start</code
 <div data-lang="python">
     <p>First, we import <a href="api/python/pyspark.streaming.html#pyspark.streaming.StreamingContext">StreamingContext</a>, which is the main entry point for all streaming functionality. We create a local StreamingContext with two execution threads and a batch interval of 1 second.</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">pyspark</span> <span class="kn">import</span> <span class="n">SparkContext</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="kn">from</span> <span class="nn">pyspark</span> <span class="kn">import</span> <span class="n">SparkContext</span>
 <span class="kn">from</span> <span class="nn">pyspark.streaming</span> <span class="kn">import</span> <span class="n">StreamingContext</span>
 
-<span class="c"># Create a local StreamingContext with two working thread and batch interval of 1 second</span>
-<span class="n">sc</span> <span class="o">=</span> <span class="n">SparkContext</span><span class="p">(</span><span class="s">&quot;local[2]&quot;</span><span class="p">,</span> <span class="s">&quot;NetworkWordCount&quot;</span><span class="p">)</span>
-<span class="n">ssc</span> <span class="o">=</span> <span class="n">StreamingContext</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span></code></pre></div>
+<span class="c1"># Create a local StreamingContext with two working thread and batch interval of 1 second</span>
+<span class="n">sc</span> <span class="o">=</span> <span class="n">SparkContext</span><span class="p">(</span><span class="s2">&quot;local[2]&quot;</span><span class="p">,</span> <span class="s2">&quot;NetworkWordCount&quot;</span><span class="p">)</span>
+<span class="n">ssc</span> <span class="o">=</span> <span class="n">StreamingContext</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span></code></pre></figure>
 
     <p>Using this context, we can create a DStream that represents streaming data from a TCP
 source, specified as a hostname (e.g. <code>localhost</code>) and a port (e.g. <code>9999</code>).</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="c"># Create a DStream that will connect to hostname:port, like localhost:9999</span>
-<span class="n">lines</span> <span class="o">=</span> <span class="n">ssc</span><span class="o">.</span><span class="n">socketTextStream</span><span class="p">(</span><span class="s">&quot;localhost&quot;</span><span class="p">,</span> <span class="mi">9999</span><span class="p">)</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="c1"># Create a DStream that will connect to hostname:port, like localhost:9999</span>
+<span class="n">lines</span> <span class="o">=</span> <span class="n">ssc</span><span class="o">.</span><span class="n">socketTextStream</span><span class="p">(</span><span class="s2">&quot;localhost&quot;</span><span class="p">,</span> <span class="mi">9999</span><span class="p">)</span></code></pre></figure>
 
     <p>This <code>lines</code> DStream represents the stream of data that will be received from the data
 server. Each record in this DStream is a line of text. Next, we want to split the lines by
 space into words.</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="c"># Split each line into words</span>
-<span class="n">words</span> <span class="o">=</span> <span class="n">lines</span><span class="o">.</span><span class="n">flatMap</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&quot; &quot;</span><span class="p">))</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="c1"># Split each line into words</span>
+<span class="n">words</span> <span class="o">=</span> <span class="n">lines</span><span class="o">.</span><span class="n">flatMap</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">&quot; &quot;</span><span class="p">))</span></code></pre></figure>
 
     <p><code>flatMap</code> is a one-to-many DStream operation that creates a new DStream by
 generating multiple new records from each record in the source DStream. In this case,
 each line will be split into multiple words and the stream of words is represented as the
 <code>words</code> DStream.  Next, we want to count these words.</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="c"># Count each word in each batch</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="c1"># Count each word in each batch</span>
 <span class="n">pairs</span> <span class="o">=</span> <span class="n">words</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">word</span><span class="p">:</span> <span class="p">(</span><span class="n">word</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
 <span class="n">wordCounts</span> <span class="o">=</span> <span class="n">pairs</span><span class="o">.</span><span class="n">reduceByKey</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">:</span> <span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="p">)</span>
 
-<span class="c"># Print the first ten elements of each RDD generated in this DStream to the console</span>
-<span class="n">wordCounts</span><span class="o">.</span><span class="n">pprint</span><span class="p">()</span></code></pre></div>
+<span class="c1"># Print the first ten elements of each RDD generated in this DStream to the console</span>
+<span class="n">wordCounts</span><span class="o">.</span><span class="n">pprint</span><span class="p">()</span></code></pre></figure>
 
     <p>The <code>words</code> DStream is further mapped (one-to-one transformation) to a DStream of <code>(word,
 1)</code> pairs, which is then reduced to get the frequency of words in each batch of data.
@@ -384,8 +384,8 @@ Finally, <code>wordCounts.pprint()</code> will print a few of the counts generat
 will perform when it is started, and no real processing has started yet. To start the processing
 after all the transformations have been set up, we finally call</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">ssc</span><span class="o">.</span><span class="n">start</span><span class="p">()</span>             <span class="c"># Start the computation</span>
-<span class="n">ssc</span><span class="o">.</span><span class="n">awaitTermination</span><span class="p">()</span>  <span class="c"># Wait for the computation to terminate</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="n">ssc</span><span class="o">.</span><span class="n">start</span><span class="p">()</span>             <span class="c1"># Start the computation</span>
+<span class="n">ssc</span><span class="o">.</span><span class="n">awaitTermination</span><span class="p">()</span>  <span class="c1"># Wait for the computation to terminate</span></code></pre></figure>
 
     <p>The complete code can be found in the Spark Streaming example
 <a href="https://github.com/apache/spark/blob/v2.1.0/examples/src/main/python/streaming/network_wordcount.py">NetworkWordCount</a>.
@@ -398,24 +398,24 @@ after all the transformations have been setup, we finally call</p>
 you can run this example as follows. You will first need to run Netcat
 (a small utility found in most Unix-like systems) as a data server by using</p>
 
-<div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>nc -lk 9999</code></pre></div>
+<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>$ nc -lk <span class="m">9999</span></code></pre></figure>
 
 <p>Then, in a different terminal, you can start the example by using</p>
 
 <div class="codetabs">
 <div data-lang="scala">
 
-    <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>./bin/run-example streaming.NetworkWordCount localhost 9999</code></pre></div>
+    <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>$ ./bin/run-example streaming.NetworkWordCount localhost <span class="m">9999</span></code></pre></figure>
 
   </div>
 <div data-lang="java">
 
-    <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>./bin/run-example streaming.JavaNetworkWordCount localhost 9999</code></pre></div>
+    <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>$ ./bin/run-example streaming.JavaNetworkWordCount localhost <span class="m">9999</span></code></pre></figure>
 
   </div>
 <div data-lang="python">
 
-    <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>./bin/spark-submit examples/src/main/python/streaming/network_wordcount.py localhost 9999</code></pre></div>
+    <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>$ ./bin/spark-submit examples/src/main/python/streaming/network_wordcount.py localhost <span class="m">9999</span></code></pre></figure>
 
   </div>
 </div>
@@ -426,16 +426,16 @@ screen every second. It will look something like the following.</p>
 <table width="100%">
     <td>
 
-<div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># TERMINAL 1:</span>
-<span class="c"># Running Netcat</span>
+<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span><span class="c1"># TERMINAL 1:</span>
+<span class="c1"># Running Netcat</span>
 
-<span class="nv">$ </span>nc -lk 9999
+$ nc -lk <span class="m">9999</span>
 
 hello world
 
 
 
-...</code></pre></div>
+...</code></pre></figure>
 
     </td>
     <td width="2%"></td>
@@ -444,45 +444,45 @@ hello world
 
 <div data-lang="scala">
 
-        <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># TERMINAL 2: RUNNING NetworkWordCount</span>
+        <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span><span class="c1"># TERMINAL 2: RUNNING NetworkWordCount</span>
 
-<span class="nv">$ </span>./bin/run-example streaming.NetworkWordCount localhost 9999
+$ ./bin/run-example streaming.NetworkWordCount localhost <span class="m">9999</span>
 ...
 -------------------------------------------
 Time: <span class="m">1357008430000</span> ms
 -------------------------------------------
 <span class="o">(</span>hello,1<span class="o">)</span>
 <span class="o">(</span>world,1<span class="o">)</span>
-...</code></pre></div>
+...</code></pre></figure>
 
       </div>
 
 <div data-lang="java">
 
-        <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># TERMINAL 2: RUNNING JavaNetworkWordCount</span>
+        <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span><span class="c1"># TERMINAL 2: RUNNING JavaNetworkWordCount</span>
 
-<span class="nv">$ </span>./bin/run-example streaming.JavaNetworkWordCount localhost 9999
+$ ./bin/run-example streaming.JavaNetworkWordCount localhost <span class="m">9999</span>
 ...
 -------------------------------------------
 Time: <span class="m">1357008430000</span> ms
 -------------------------------------------
 <span class="o">(</span>hello,1<span class="o">)</span>
 <span class="o">(</span>world,1<span class="o">)</span>
-...</code></pre></div>
+...</code></pre></figure>
 
       </div>
 <div data-lang="python">
 
-        <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># TERMINAL 2: RUNNING network_wordcount.py</span>
+        <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span><span class="c1"># TERMINAL 2: RUNNING network_wordcount.py</span>
 
-<span class="nv">$ </span>./bin/spark-submit examples/src/main/python/streaming/network_wordcount.py localhost 9999
+$ ./bin/spark-submit examples/src/main/python/streaming/network_wordcount.py localhost <span class="m">9999</span>
 ...
 -------------------------------------------
-Time: 2014-10-14 15:25:21
+Time: <span class="m">2014</span>-10-14 <span class="m">15</span>:25:21
 -------------------------------------------
 <span class="o">(</span>hello,1<span class="o">)</span>
 <span class="o">(</span>world,1<span class="o">)</span>
-...</code></pre></div>
+...</code></pre></figure>
 
       </div>
 </div>
@@ -546,11 +546,11 @@ for the full list of supported sources and artifacts.</p>
 
     <p>A <a href="api/scala/index.html#org.apache.spark.streaming.StreamingContext">StreamingContext</a> object can be created from a <a href="api/scala/index.html#org.apache.spark.SparkConf">SparkConf</a> object.</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">import</span> <span class="nn">org.apache.spark._</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">import</span> <span class="nn">org.apache.spark._</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.streaming._</span>
 
 <span class="k">val</span> <span class="n">conf</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkConf</span><span class="o">().</span><span class="n">setAppName</span><span class="o">(</span><span class="n">appName</span><span class="o">).</span><span class="n">setMaster</span><span class="o">(</span><span class="n">master</span><span class="o">)</span>
-<span class="k">val</span> <span class="n">ssc</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">StreamingContext</span><span class="o">(</span><span class="n">conf</span><span class="o">,</span> <span class="nc">Seconds</span><span class="o">(</span><span class="mi">1</span><span class="o">))</span></code></pre></div>
+<span class="k">val</span> <span class="n">ssc</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">StreamingContext</span><span class="o">(</span><span class="n">conf</span><span class="o">,</span> <span class="nc">Seconds</span><span class="o">(</span><span class="mi">1</span><span class="o">))</span></code></pre></figure>
 
     <p>The <code>appName</code> parameter is a name for your application to show on the cluster UI.
 <code>master</code> is a <a href="submitting-applications.html#master-urls">Spark, Mesos or YARN cluster URL</a>,
@@ -566,21 +566,21 @@ section for more details.</p>
 
     <p>A <code>StreamingContext</code> object can also be created from an existing <code>SparkContext</code> object.</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">import</span> <span class="nn">org.apache.spark.streaming._</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">import</span> <span class="nn">org.apache.spark.streaming._</span>
 
 <span class="k">val</span> <span class="n">sc</span> <span class="k">=</span> <span class="o">...</span>                <span class="c1">// existing SparkContext</span>
-<span class="k">val</span> <span class="n">ssc</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">StreamingContext</span><span class="o">(</span><span class="n">sc</span><span class="o">,</span> <span class="nc">Seconds</span><span class="o">(</span><span class="mi">1</span><span class="o">))</span></code></pre></div>
+<span class="k">val</span> <span class="n">ssc</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">StreamingContext</span><span class="o">(</span><span class="n">sc</span><span class="o">,</span> <span class="nc">Seconds</span><span class="o">(</span><span class="mi">1</span><span class="o">))</span></code></pre></figure>
 
   </div>
 <div data-lang="java">
 
     <p>A <a href="api/java/index.html?org/apache/spark/streaming/api/java/JavaStreamingContext.html">JavaStreamingContext</a> object can be created from a <a href="api/java/index.html?org/apache/spark/SparkConf.html">SparkConf</a> object.</p>
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kn">import</span> <span class="nn">org.apache.spark.*</span><span class="o">;</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.*</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.streaming.api.java.*</span><span class="o">;</span>
 
-<span class="n">SparkConf</span> <span class="n">conf</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">SparkConf</span><span class="o">().</span><span class="na">setAppName</span><span class="o">(</span><span class="n">appName</span><span class="o">).</span><span class="na">setMaster</span><span class="o">(</span><span class="n">master</span><span class="o">);</span>
-<span class="n">JavaStreamingContext</span> <span class="n">ssc</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">JavaStreamingContext</span><span class="o">(</span><span class="n">conf</span><span class="o">,</span> <span class="k">new</span> <span class="nf">Duration</span><span class="o">(</span><span class="mi">1000</span><span class="o">));</span></code></pre></div>
+<span class="n">SparkConf</span> <span class="n">conf</span> <span class="o">=</span> <span class="k">new</span> <span class="n">SparkConf</span><span class="o">().</span><span class="na">setAppName</span><span class="o">(</span><span class="n">appName</span><span class="o">).</span><span class="na">setMaster</span><span class="o">(</span><span class="n">master</span><span class="o">);</span>
+<span class="n">JavaStreamingContext</span> <span class="n">ssc</span> <span class="o">=</span> <span class="k">new</span> <span class="n">JavaStreamingContext</span><span class="o">(</span><span class="n">conf</span><span class="o">,</span> <span class="k">new</span> <span class="n">Duration</span><span class="o">(</span><span class="mi">1000</span><span class="o">));</span></code></pre></figure>
 
     <p>The <code>appName</code> parameter is a name for your application to show on the cluster UI.
 <code>master</code> is a <a href="submitting-applications.html#master-urls">Spark, Mesos or YARN cluster URL</a>,
@@ -596,21 +596,21 @@ section for more details.</p>
 
     <p>A <code>JavaStreamingContext</code> object can also be created from an existing <code>JavaSparkContext</code>.</p>
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kn">import</span> <span class="nn">org.apache.spark.streaming.api.java.*</span><span class="o">;</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.streaming.api.java.*</span><span class="o">;</span>
 
 <span class="n">JavaSparkContext</span> <span class="n">sc</span> <span class="o">=</span> <span class="o">...</span>   <span class="c1">//existing JavaSparkContext</span>
-<span class="n">JavaStreamingContext</span> <span class="n">ssc</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">JavaStreamingContext</span><span class="o">(</span><span class="n">sc</span><span class="o">,</span> <span class="n">Durations</span><span class="o">.</span><span class="na">seconds</span><span class="o">(</span><span class="mi">1</span><span class="o">));</span></code></pre></div>
+<span class="n">JavaStreamingContext</span> <span class="n">ssc</span> <span class="o">=</span> <span class="k">new</span> <span class="n">JavaStreamingContext</span><span class="o">(</span><span class="n">sc</span><span class="o">,</span> <span class="n">Durations</span><span class="o">.</span><span class="na">seconds</span><span class="o">(</span><span class="mi">1</span><span class="o">));</span></code></pre></figure>
 
   </div>
 <div data-lang="python">
 
     <p>A <a href="api/python/pyspark.streaming.html#pyspark.streaming.StreamingContext">StreamingContext</a> object can be created from a <a href="api/python/pyspark.html#pyspark.SparkContext">SparkContext</a> object.</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">pyspark</span> <span class="kn">import</span> <span class="n">SparkContext</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="kn">from</span> <span class="nn">pyspark</span> <span class="kn">import</span> <span class="n">SparkContext</span>
 <span class="kn">from</span> <span class="nn">pyspark.streaming</span> <span class="kn">import</span> <span class="n">StreamingContext</span>
 
 <span class="n">sc</span> <span class="o">=</span> <span class="n">SparkContext</span><span class="p">(</span><span class="n">master</span><span class="p">,</span> <span class="n">appName</span><span class="p">)</span>
-<span class="n">ssc</span> <span class="o">=</span> <span class="n">StreamingContext</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span></code></pre></div>
+<span class="n">ssc</span> <span class="o">=</span> <span class="n">StreamingContext</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span></code></pre></figure>
 
     <p>The <code>appName</code> parameter is a name for your application to show on the cluster UI.
 <code>master</code> is a <a href="submitting-applications.html#master-urls">Spark, Mesos or YARN cluster URL</a>,
@@ -931,15 +931,15 @@ define the update function as:</p>
 <div class="codetabs">
 <div data-lang="scala">
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">def</span> <span class="n">updateFunction</span><span class="o">(</span><span class="n">newValues</span><span class="k">:</span> <span class="kt">Seq</span><span class="o">[</span><span class="kt">Int</span><span class="o">],</span> <span class="n">runningCount</span><span class="k">:</span> <span class="kt">Option</span><span class="o">[</span><span class="kt">Int</span><span class="o">])</span><span class="k">:</span> <span class="kt">Option</span><span class="o">[</span><span class="kt">Int</span><span class="o">]</span> <span class="k">=</span> <span class="o">{</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">def</span> <span class="n">updateFunction</span><span class="o">(</span><span class="n">newValues</span><span class="k">:</span> <span class="kt">Seq</span><span class="o">[</span><span class="kt">Int</span><span class="o">],</span> <span class="n">runningCount</span><span class="k">:</span> <span class="kt">Option</span><span class="o">[</span><span class="kt">Int</span><span class="o">])</span><span class="k">:</span> <span class="kt">Option</span><span class="o">[</span><span class="kt">Int</span><span class="o">]</span> <span class="k">=</span> <span class="o">{</span>
     <span class="k">val</span> <span class="n">newCount</span> <span class="k">=</span> <span class="o">...</span>  <span class="c1">// add the new values with the previous running count to get the new count</span>
     <span class="nc">Some</span><span class="o">(</span><span class="n">newCount</span><span class="o">)</span>
-<span class="o">}</span></code></pre></div>
+<span class="o">}</span></code></pre></figure>
 
     <p>This is applied to a DStream containing words (say, the <code>pairs</code> DStream containing <code>(word,
 1)</code> pairs in the <a href="#a-quick-example">earlier example</a>).</p>
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">runningCounts</span> <span class="k">=</span> <span class="n">pairs</span><span class="o">.</span><span class="n">updateStateByKey</span><span class="o">[</span><span class="kt">Int</span><span class="o">](</span><span class="n">updateFunction</span> <span class="k">_</span><span class="o">)</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">val</span> <span class="n">runningCounts</span> <span class="k">=</span> <span class="n">pairs</span><span class="o">.</span><span class="n">updateStateByKey</span><span class="o">[</span><span class="kt">Int</span><span class="o">](</span><span class="n">updateFunction</span> <span class="k">_</span><span class="o">)</span></code></pre></figure>
 
     <p>The update function will be called for each word, with <code>newValues</code> having a sequence of 1&#8217;s (from
 the <code>(word, 1)</code> pairs) and the <code>runningCount</code> having the previous count.</p>
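 
     <p>To make these semantics concrete, consider a hypothetical invocation for a single key in one batch, assuming <code>newCount</code> adds the new values to the previous running count (treating a missing count as zero):</p>
 
     <pre><code class="language-scala">// Three (word, 1) pairs arrived for this key; the previous count was 2
 updateFunction(Seq(1, 1, 1), Some(2))  // =&gt; Some(5)
 
 // A key seen for the first time has no prior state
 updateFunction(Seq(1), None)           // =&gt; Some(1)</code></pre>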
@@ -947,18 +947,18 @@ the <code>(word, 1)</code> pairs) and the <code>runningCount</code> having the p
   </div>
 <div data-lang="java">
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">Function2</span><span class="o">&lt;</span><span class="n">List</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;,</span> <span class="n">Optional</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;,</span> <span class="n">Optional</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;&gt;</span> <span class="n">updateFunction</span> <span class="o">=</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="n">Function2</span><span class="o">&lt;</span><span class="n">List</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;,</span> <span class="n">Optional</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;,</span> <span class="n">Optional</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;&gt;</span> <span class="n">updateFunction</span> <span class="o">=</span>
   <span class="k">new</span> <span class="n">Function2</span><span class="o">&lt;</span><span class="n">List</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;,</span> <span class="n">Optional</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;,</span> <span class="n">Optional</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;&gt;()</span> <span class="o">{</span>
     <span class="nd">@Override</span> <span class="kd">public</span> <span class="n">Optional</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;</span> <span class="nf">call</span><span class="o">(</span><span class="n">List</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;</span> <span class="n">values</span><span class="o">,</span> <span class="n">Optional</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">&gt;</span> <span class="n">state</span><span class="o">)</span> <span class="o">{</span>
       <span class="n">Integer</span> <span class="n">newSum</span> <span class="o">=</span> <span class="o">...</span>  <span class="c1">// add the new values with the previous running count to get the new count</span>
       <span class="k">return</span> <span class="n">Optional</span><span class="o">.</span><span class="na">of</span><span class="o">(</span><span class="n">newSum</span><span class="o">);</span>
     <span class="o">}</span>
-  <span class="o">};</span></code></pre></div>
+  <span class="o">};</span></code></pre></figure>
 
     <p>This is applied to a DStream containing words (say, the <code>pairs</code> DStream containing <code>(word,
 1)</code> pairs in the <a href="#a-quick-example">quick example</a>).</p>
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">JavaPairDStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;</span> <span class="n">runningCounts</span> <span class="o">=</span> <span class="n">pairs</span><span class="o">.</span><span class="na">updateStateByKey</span><span class="o">(</span><span class="n">updateFunction</span><span class="o">);</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="n">JavaPairDStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;</span> <span class="n">runningCounts</span> <span class="o">=</span> <span class="n">pairs</span><span class="o">.</span><span class="na">updateStateByKey</span><span class="o">(</span><span class="n">updateFunction</span><span class="o">);</span></code></pre></figure>
 
     <p>The update function will be called for each word, with <code>newValues</code> having a sequence of 1&#8217;s (from
 the <code>(word, 1)</code> pairs) and the <code>runningCount</code> having the previous count. For the complete
@@ -969,15 +969,15 @@ Java code, take a look at the example
   </div>
 <div data-lang="python">
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">updateFunction</span><span class="p">(</span><span class="n">newValues</span><span class="p">,</span> <span class="n">runningCount</span><span class="p">):</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="k">def</span> <span class="nf">updateFunction</span><span class="p">(</span><span class="n">newValues</span><span class="p">,</span> <span class="n">runningCount</span><span class="p">):</span>
     <span class="k">if</span> <span class="n">runningCount</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span>
         <span class="n">runningCount</span> <span class="o">=</span> <span class="mi">0</span>
-    <span class="k">return</span> <span class="nb">sum</span><span class="p">(</span><span class="n">newValues</span><span class="p">,</span> <span class="n">runningCount</span><span class="p">)</span>  <span class="c"># add the new values with the previous running count to get the new count</span></code></pre></div>
+    <span class="k">return</span> <span class="nb">sum</span><span class="p">(</span><span class="n">newValues</span><span class="p">,</span> <span class="n">runningCount</span><span class="p">)</span>  <span class="c1"># add the new values with the previous running count to get the new count</span></code></pre></figure>
 
     <p>This is applied to a DStream containing words (say, the <code>pairs</code> DStream containing <code>(word,
 1)</code> pairs in the <a href="#a-quick-example">earlier example</a>).</p>
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">runningCounts</span> <span class="o">=</span> <span class="n">pairs</span><span class="o">.</span><span class="n">updateStateByKey</span><span class="p">(</span><span class="n">updateFunction</span><span class="p">)</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="n">runningCounts</span> <span class="o">=</span> <span class="n">pairs</span><span class="o">.</span><span class="n">updateStateByKey</span><span class="p">(</span><span class="n">updateFunction</span><span class="p">)</span></code></pre></figure>
 
     <p>The update function will be called for each word, with <code>newValues</code> having a sequence of 1&#8217;s (from
 the <code>(word, 1)</code> pairs) and the <code>runningCount</code> having the previous count. For the complete
@@ -1003,17 +1003,17 @@ spam information (maybe generated with Spark as well) and then filtering based o
 <div class="codetabs">
 <div data-lang="scala">
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">spamInfoRDD</span> <span class="k">=</span> <span class="n">ssc</span><span class="o">.</span><span class="n">sparkContext</span><span class="o">.</span><span class="n">newAPIHadoopRDD</span><span class="o">(...)</span> <span class="c1">// RDD containing spam information</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">val</span> <span class="n">spamInfoRDD</span> <span class="k">=</span> <span class="n">ssc</span><span class="o">.</span><span class="n">sparkContext</span><span class="o">.</span><span class="n">newAPIHadoopRDD</span><span class="o">(...)</span> <span class="c1">// RDD containing spam information</span>
 
 <span class="k">val</span> <span class="n">cleanedDStream</span> <span class="k">=</span> <span class="n">wordCounts</span><span class="o">.</span><span class="n">transform</span> <span class="o">{</span> <span class="n">rdd</span> <span class="k">=&gt;</span>
   <span class="n">rdd</span><span class="o">.</span><span class="n">join</span><span class="o">(</span><span class="n">spamInfoRDD</span><span class="o">).</span><span class="n">filter</span><span class="o">(...)</span> <span class="c1">// join data stream with spam information to do data cleaning</span>
   <span class="o">...</span>
-<span class="o">}</span></code></pre></div>
+<span class="o">}</span></code></pre></figure>
 
   </div>
 <div data-lang="java">
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kn">import</span> <span class="nn">org.apache.spark.streaming.api.java.*</span><span class="o">;</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.streaming.api.java.*</span><span class="o">;</span>
 <span class="c1">// RDD containing spam information</span>
 <span class="kd">final</span> <span class="n">JavaPairRDD</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Double</span><span class="o">&gt;</span> <span class="n">spamInfoRDD</span> <span class="o">=</span> <span class="n">jssc</span><span class="o">.</span><span class="na">sparkContext</span><span class="o">().</span><span class="na">newAPIHadoopRDD</span><span class="o">(...);</span>
 
@@ -1023,15 +1023,15 @@ spam information (maybe generated with Spark as well) and then filtering based o
       <span class="n">rdd</span><span class="o">.</span><span class="na">join</span><span class="o">(</span><span class="n">spamInfoRDD</span><span class="o">).</span><span class="na">filter</span><span class="o">(...);</span> <span class="c1">// join data stream with spam information to do data cleaning</span>
       <span class="o">...</span>
     <span class="o">}</span>
-  <span class="o">});</span></code></pre></div>
+  <span class="o">});</span></code></pre></figure>
 
   </div>
 <div data-lang="python">
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">spamInfoRDD</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">pickleFile</span><span class="p">(</span><span class="o">...</span><span class="p">)</span>  <span class="c"># RDD containing spam information</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="n">spamInfoRDD</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">pickleFile</span><span class="p">(</span><span class="o">...</span><span class="p">)</span>  <span class="c1"># RDD containing spam information</span>
 
-<span class="c"># join data stream with spam information to do data cleaning</span>
-<span class="n">cleanedDStream</span> <span class="o">=</span> <span class="n">wordCounts</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="k">lambda</span> <span class="n">rdd</span><span class="p">:</span> <span class="n">rdd</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">spamInfoRDD</span><span class="p">)</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="o">...</span><span class="p">))</span></code></pre></div>
+<span class="c1"># join data stream with spam information to do data cleaning</span>
+<span class="n">cleanedDStream</span> <span class="o">=</span> <span class="n">wordCounts</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="k">lambda</span> <span class="n">rdd</span><span class="p">:</span> <span class="n">rdd</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">spamInfoRDD</span><span class="p">)</span><span class="o">.</span><span class="n">filter</span><span class="p">(</span><span class="o">...</span><span class="p">))</span></code></pre></figure>
 
   </div>
 </div>
@@ -1073,13 +1073,13 @@ operation <code>reduceByKeyAndWindow</code>.</p>
 <div class="codetabs">
 <div data-lang="scala">
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="c1">// Reduce last 30 seconds of data, every 10 seconds</span>
-<span class="k">val</span> <span class="n">windowedWordCounts</span> <span class="k">=</span> <span class="n">pairs</span><span class="o">.</span><span class="n">reduceByKeyAndWindow</span><span class="o">((</span><span class="n">a</span><span class="k">:</span><span class="kt">Int</span><span class="o">,</span><span class="n">b</span><span class="k">:</span><span class="kt">Int</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="o">(</span><span class="n">a</span> <span class="o">+</span> <span class="n">b</span><span class="o">),</span> <span class="nc">Seconds</span><span class="o">(</span><span class="mi">30</span><span class="o">),</span> <span class="nc">Seconds</span><span class="o">(</span><span class="mi">10</span><span class="o">))</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="c1">// Reduce last 30 seconds of data, every 10 seconds</span>
+<span class="k">val</span> <span class="n">windowedWordCounts</span> <span class="k">=</span> <span class="n">pairs</span><span class="o">.</span><span class="n">reduceByKeyAndWindow</span><span class="o">((</span><span class="n">a</span><span class="k">:</span><span class="kt">Int</span><span class="o">,</span><span class="n">b</span><span class="k">:</span><span class="kt">Int</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="o">(</span><span class="n">a</span> <span class="o">+</span> <span class="n">b</span><span class="o">),</span> <span class="nc">Seconds</span><span class="o">(</span><span class="mi">30</span><span class="o">),</span> <span class="nc">Seconds</span><span class="o">(</span><span class="mi">10</span><span class="o">))</span></code></pre></figure>
 
   </div>
 <div data-lang="java">
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="c1">// Reduce function adding two integers, defined separately for clarity</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="c1">// Reduce function adding two integers, defined separately for clarity</span>
 <span class="n">Function2</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">,</span> <span class="n">Integer</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;</span> <span class="n">reduceFunc</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Function2</span><span class="o">&lt;</span><span class="n">Integer</span><span class="o">,</span> <span class="n">Integer</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;()</span> <span class="o">{</span>
   <span class="nd">@Override</span> <span class="kd">public</span> <span class="n">Integer</span> <span class="nf">call</span><span class="o">(</span><span class="n">Integer</span> <span class="n">i1</span><span class="o">,</span> <span class="n">Integer</span> <span class="n">i2</span><span class="o">)</span> <span class="o">{</span>
     <span class="k">return</span> <span class="n">i1</span> <span class="o">+</span> <span class="n">i2</span><span class="o">;</span>
@@ -1087,13 +1087,13 @@ operation <code>reduceByKeyAndWindow</code>.</p>
 <span class="o">};</span>
 
 <span class="c1">// Reduce last 30 seconds of data, every 10 seconds</span>
-<span class="n">JavaPairDStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;</span> <span class="n">windowedWordCounts</span> <span class="o">=</span> <span class="n">pairs</span><span class="o">.</span><span class="na">reduceByKeyAndWindow</span><span class="o">(</span><span class="n">reduceFunc</span><span class="o">,</span> <span class="n">Durations</span><span class="o">.</span><span class="na">seconds</span><span class="o">(</span><span class="mi">30</span><span class="o">),</span> <span class="n">Durations</span><span class="o">.</span><span class="na">seconds</span><span class="o">(</span><span class="mi">10</span><span class="o">));</span></code></pre></div>
+<span class="n">JavaPairDStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Integer</span><span class="o">&gt;</span> <span class="n">windowedWordCounts</span> <span class="o">=</span> <span class="n">pairs</span><span class="o">.</span><span class="na">reduceByKeyAndWindow</span><span class="o">(</span><span class="n">reduceFunc</span><span class="o">,</span> <span class="n">Durations</span><span class="o">.</span><span class="na">seconds</span><span class="o">(</span><span class="mi">30</span><span class="o">),</span> <span class="n">Durations</span><span class="o">.</span><span class="na">seconds</span><span class="o">(</span><span class="mi">10</span><span class="o">));</span></code></pre></figure>
 
   </div>
 <div data-lang="python">
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="c"># Reduce last 30 seconds of data, every 10 seconds</span>
-<span class="n">windowedWordCounts</span> <span class="o">=</span> <span class="n">pairs</span><span class="o">.</span><span class="n">reduceByKeyAndWindow</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">:</span> <span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">:</span> <span class="n">x</span> <span class="o">-</span> <span class="n">y</span><span class="p">,</span> <span class="mi">30</span><span class="p">,</span> <span class="mi">10</span><span class="p">)</span></code></pre></div>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="c1"># Reduce last 30 seconds of data, every 10 seconds</span>
+<span class="n">windowedWordCounts</span> <span class="o">=</span> <span class="n">pairs</span><span class="o">.</span><span class="n">reduceByKeyAndWindow</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">:</span> <span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="p">,</span> <span class="k">lambda</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">:</span> <span class="n">x</span> <span class="o">-</span> <span class="n">y</span><span class="p">,</span> <span class="mi">30</span><span class="p">,</span> <span class="mi">10</span><span class="p">)</span></code></pre></figure>
 
   </div>
 </div>
@@ -1167,48 +1167,48 @@ said two parameters - <i>windowLength</i> and <i>slideInterval</i>.</p>
 <div class="codetabs">
 <div data-lang="scala">
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">stream1</span><span class="k">:</span> <span class="kt">DStream</span><span class="o">[</span><span class="kt">String</span>, <span class="kt">String</span><span class="o">]</span> <span class="k">=</span> <span class="o">...</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">val</span> <span class="n">stream1</span><span class="k">:</span> <span class="kt">DStream</span><span class="o">[</span><span class="kt">String</span>, <span class="kt">String</span><span class="o">]</span> <span class="k">=</span> <span class="o">...</span>
 <span class="k">val</span> <span class="n">stream2</span><span class="k">:</span> <span class="kt">DStream</span><span class="o">[</span><span class="kt">String</span>, <span class="kt">String</span><span class="o">]</span> <span class="k">=</span> <span class="o">...</span>
-<span class="k">val</span> <span class="n">joinedStream</span> <span class="k">=</span> <span class="n">stream1</span><span class="o">.</span><span class="n">join</span><span class="o">(</span><span class="n">stream2</span><span class="o">)</span></code></pre></div>
+<span class="k">val</span> <span class="n">joinedStream</span> <span class="k">=</span> <span class="n">stream1</span><span class="o">.</span><span class="n">join</span><span class="o">(</span><span class="n">stream2</span><span class="o">)</span></code></pre></figure>
 
   </div>
 <div data-lang="java">
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">JavaPairDStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">stream1</span> <span class="o">=</span> <span class="o">...</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="n">JavaPairDStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">stream1</span> <span class="o">=</span> <span class="o">...</span>
 <span class="n">JavaPairDStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">stream2</span> <span class="o">=</span> <span class="o">...</span>
-<span class="n">JavaPairDStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Tuple2</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;&gt;</span> <span class="n">joinedStream</span> <span class="o">=</span> <span class="n">stream1</span><span class="o">.</span><span class="na">join</span><span class="o">(</span><span class="n">stream2</span><span class="o">);</span></code></pre></div>
+<span class="n">JavaPairDStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Tuple2</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;&gt;</span> <span class="n">joinedStream</span> <span class="o">=</span> <span class="n">stream1</span><span class="o">.</span><span class="na">join</span><span class="o">(</span><span class="n">stream2</span><span class="o">);</span></code></pre></figure>
 
   </div>
 <div data-lang="python">
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">stream1</span> <span class="o">=</span> <span class="o">...</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="n">stream1</span> <span class="o">=</span> <span class="o">...</span>
 <span class="n">stream2</span> <span class="o">=</span> <span class="o">...</span>
-<span class="n">joinedStream</span> <span class="o">=</span> <span class="n">stream1</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">stream2</span><span class="p">)</span></code></pre></div>
+<span class="n">joinedStream</span> <span class="o">=</span> <span class="n">stream1</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">stream2</span><span class="p">)</span></code></pre></figure>
 
   </div>
 </div>
-<p>Here, in each batch interval, the RDD generated by <code>stream1</code> will be joined with the RDD generated by <code>stream2</code>. You can also do <code>leftOuterJoin</code>, <code>rightOuterJoin</code>, <code>fullOuterJoin</code>. Furthermore, it is often very useful to do joins over windows of the streams. That is pretty easy as well.</p>
+<p>Here, in each batch interval, the RDD generated by <code>stream1</code> will be joined with the RDD generated by <code>stream2</code>. You can also do <code>leftOuterJoin</code>, <code>rightOuterJoin</code>, and <code>fullOuterJoin</code> (see the sketch after the windowed example below). Furthermore, it is often very useful to do joins over windows of the streams. That is pretty easy as well.</p>
 
 <div class="codetabs">
 <div data-lang="scala">
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">windowedStream1</span> <span class="k">=</span> <span class="n">stream1</span><span class="o">.</span><span class="n">window</span><span class="o">(</span><span class="nc">Seconds</span><span class="o">(</span><span class="mi">20</span><span class="o">))</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">val</span> <span class="n">windowedStream1</span> <span class="k">=</span> <span class="n">stream1</span><span class="o">.</span><span class="n">window</span><span class="o">(</span><span class="nc">Seconds</span><span class="o">(</span><span class="mi">20</span><span class="o">))</span>
 <span class="k">val</span> <span class="n">windowedStream2</span> <span class="k">=</span> <span class="n">stream2</span><span class="o">.</span><span class="n">window</span><span class="o">(</span><span class="nc">Minutes</span><span class="o">(</span><span class="mi">1</span><span class="o">))</span>
-<span class="k">val</span> <span class="n">joinedStream</span> <span class="k">=</span> <span class="n">windowedStream1</span><span class="o">.</span><span class="n">join</span><span class="o">(</span><span class="n">windowedStream2</span><span class="o">)</span></code></pre></div>
+<span class="k">val</span> <span class="n">joinedStream</span> <span class="k">=</span> <span class="n">windowedStream1</span><span class="o">.</span><span class="n">join</span><span class="o">(</span><span class="n">windowedStream2</span><span class="o">)</span></code></pre></figure>
 
   </div>
 <div data-lang="java">
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">JavaPairDStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">windowedStream1</span> <span class="o">=</span> <span class="n">stream1</span><span class="o">.</span><span class="na">window</span><span class="o">(</span><span class="n">Durations</span><span class="o">.</span><span class="na">seconds</span><span class="o">(</span><span class="mi">20</span><span class="o">));</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="n">JavaPairDStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">windowedStream1</span> <span class="o">=</span> <span class="n">stream1</span><span class="o">.</span><span class="na">window</span><span class="o">(</span><span class="n">Durations</span><span class="o">.</span><span class="na">seconds</span><span class="o">(</span><span class="mi">20</span><span class="o">));</span>
 <span class="n">JavaPairDStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">windowedStream2</span> <span class="o">=</span> <span class="n">stream2</span><span class="o">.</span><span class="na">window</span><span class="o">(</span><span class="n">Durations</span><span class="o">.</span><span class="na">minutes</span><span class="o">(</span><span class="mi">1</span><span class="o">));</span>
-<span class="n">JavaPairDStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Tuple2</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;&gt;</span> <span class="n">joinedStream</span> <span class="o">=</span> <span class="n">windowedStream1</span><span class="o">.</span><span class="na">join</span><span class="o">(</span><span class="n">windowedStream2</span><span class="o">);</span></code></pre></div>
+<span class="n">JavaPairDStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">Tuple2</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;&gt;</span> <span class="n">joinedStream</span> <span class="o">=</span> <span class="n">windowedStream1</span><span class="o">.</span><span class="na">join</span><span class="o">(</span><span class="n">windowedStream2</span><span class="o">);</span></code></pre></figure>
 
   </div>
 <div data-lang="python">
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">windowedStream1</span> <span class="o">=</span> <span class="n">stream1</span><span class="o">.</span><span class="n">window</span><span class="p">(</span><span class="mi">20</span><span class="p">)</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="n">windowedStream1</span> <span class="o">=</span> <span class="n">stream1</span><span class="o">.</span><span class="n">window</span><span class="p">(</span><span class="mi">20</span><span class="p">)</span>
 <span class="n">windowedStream2</span> <span class="o">=</span> <span class="n">stream2</span><span class="o">.</span><span class="n">window</span><span class="p">(</span><span class="mi">60</span><span class="p">)</span>
-<span class="n">joinedStream</span> <span class="o">=</span> <span class="n">windowedStream1</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">windowedStream2</span><span class="p">)</span></code></pre></div>
+<span class="n">joinedStream</span> <span class="o">=</span> <span class="n">windowedStream1</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">windowedStream2</span><span class="p">)</span></code></pre></figure>
 
   </div>
 </div>
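For completeness, a hedged sketch (Scala, assuming the windowed pair DStreams from the example above): join here is an inner join on keys, and pair DStreams also provide the outer variants.

    val leftJoined  = windowedStream1.leftOuterJoin(windowedStream2)   // keep all keys of stream1
    val rightJoined = windowedStream1.rightOuterJoin(windowedStream2)  // keep all keys of stream2
    val fullJoined  = windowedStream1.fullOuterJoin(windowedStream2)   // keep keys of either side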
@@ -1219,14 +1219,14 @@ said two parameters - <i>windowLength</i> and <i>slideInterval</i>.</p>
 <div class="codetabs">
 <div data-lang="scala">
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">dataset</span><span class="k">:</span> <span class="kt">RDD</span><span class="o">[</span><span class="kt">String</span>, <span class="kt">String</span><span class="o">]</span> <span class="k">=</span> <span class="o">...</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">val</span> <span class="n">dataset</span><span class="k">:</span> <span class="kt">RDD</span><span class="o">[</span><span class="kt">String</span>, <span class="kt">String</span><span class="o">]</span> <span class="k">=</span> <span class="o">...</span>
 <span class="k">val</span> <span class="n">windowedStream</span> <span class="k">=</span> <span class="n">stream</span><span class="o">.</span><span class="n">window</span><span class="o">(</span><span class="nc">Seconds</span><span class="o">(</span><span class="mi">20</span><span class="o">))...</span>
-<span class="k">val</span> <span class="n">joinedStream</span> <span class="k">=</span> <span class="n">windowedStream</span><span class="o">.</span><span class="n">transform</span> <span class="o">{</span> <span class="n">rdd</span> <span class="k">=&gt;</span> <span class="n">rdd</span><span class="o">.</span><span class="n">join</span><span class="o">(</span><span class="n">dataset</span><span class="o">)</span> <span class="o">}</span></code></pre></div>
+<span class="k">val</span> <span class="n">joinedStream</span> <span class="k">=</span> <span class="n">windowedStream</span><span class="o">.</span><span class="n">transform</span> <span class="o">{</span> <span class="n">rdd</span> <span class="k">=&gt;</span> <span class="n">rdd</span><span class="o">.</span><span class="n">join</span><span class="o">(</span><span class="n">dataset</span><span class="o">)</span> <span class="o">}</span></code></pre></figure>
 
   </div>
 <div data-lang="java">
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">JavaPairRDD</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">dataset</span> <span class="o">=</span> <span class="o">...</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="n">JavaPairRDD</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">dataset</span> <span class="o">=</span> <span class="o">...</span>
 <span class="n">JavaPairDStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">windowedStream</span> <span class="o">=</span> <span class="n">stream</span><span class="o">.</span><span class="na">window</span><span class="o">(</span><span class="n">Durations</span><span class="o">.</span><span class="na">seconds</span><span class="o">(</span><span class="mi">20</span><span class="o">));</span>
 <span class="n">JavaPairDStream</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;</span> <span class="n">joinedStream</span> <span class="o">=</span> <span class="n">windowedStream</span><span class="o">.</span><span class="na">transform</span><span class="o">(</span>
   <span class="k">new</span> <span class="n">Function</span><span class="o">&lt;</span><span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">Tuple2</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;&gt;,</span> <span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">Tuple2</span><span class="o">&lt;</span><span class="n">String</span><span class="o">,</span> <span class="n">String</span><span class="o">&gt;&gt;&gt;()</span> <span class="o">{</span>
@@ -1235,14 +1235,14 @@ said two parameters - <i>windowLength</i> and <i>slideInterval</i>.</p>
       <span class="k">return</span> <span class="n">rdd</span><span class="o">.</span><span class="na">join</span><span class="o">(</span><span class="n">dataset</span><span class="o">);</span>
     <span class="o">}</span>
   <span class="o">}</span>
-<span class="o">);</span></code></pre></div>
+<span class="o">);</span></code></pre></figure>
 
   </div>
 <div data-lang="python">
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">dataset</span> <span class="o">=</span> <span class="o">...</span> <span class="c"># some RDD</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="n">dataset</span> <span class="o">=</span> <span class="o">...</span> <span class="c1"># some RDD</span>
 <span class="n">windowedStream</span> <span class="o">=</span> <span class="n">stream</span><span class="o">.</span><span class="n">window</span><span class="p">(</span><span class="mi">20</span><span class="p">)</span>
-<span class="n">joinedStream</span> <span class="o">=</span> <span class="n">windowedStream</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="k">lambda</span> <span class="n">rdd</span><span class="p">:</span> <span class="n">rdd</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">dataset</span><span class="p">))</span></code></pre></div>
+<span class="n">joinedStream</span> <span class="o">=</span> <span class="n">windowedStream</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="k">lambda</span> <span class="n">rdd</span><span class="p">:</span> <span class="n">rdd</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">dataset</span><span class="p">))</span></code></pre></figure>
 
   </div>
 </div>
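A hedged usage note (Scala sketch; loadReferenceData is a hypothetical helper): because the function passed to transform is re-evaluated in every batch interval, the reference dataset being joined against can be changed between batches.

    import org.apache.spark.rdd.RDD
    // hypothetical: other driver-side code re-points refData to freshly loaded reference data
    @volatile var refData: RDD[(String, String)] = loadReferenceData()
    val joinedStream = stream.window(Seconds(20)).transform { rdd => rdd.join(refData) }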
@@ -1324,22 +1324,22 @@ For example (in Scala),</p>
 <div class="codetabs">
 <div data-lang="scala">
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="n">dstream</span><span class="o">.</span><span class="n">foreachRDD</span> <span class="o">{</span> <span class="n">rdd</span> <span class="k">=&gt;</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="n">dstream</span><span class="o">.</span><span class="n">foreachRDD</span> <span class="o">{</span> <span class="n">rdd</span> <span class="k">=&gt;</span>
   <span class="k">val</span> <span class="n">connection</span> <span class="k">=</span> <span class="n">createNewConnection</span><span class="o">()</span>  <span class="c1">// executed at the driver</span>
   <span class="n">rdd</span><span class="o">.</span><span class="n">foreach</span> <span class="o">{</span> <span class="n">record</span> <span class="k">=&gt;</span>
     <span class="n">connection</span><span class="o">.</span><span class="n">send</span><span class="o">(</span><span class="n">record</span><span class="o">)</span> <span class="c1">// executed at the worker</span>
   <span class="o">}</span>
-<span class="o">}</span></code></pre></div>
+<span class="o">}</span></code></pre></figure>
 
   </div>
 <div data-lang="python">
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">sendRecord</span><span class="p">(</span><span class="n">rdd</span><span class="p">):</span>
-    <span class="n">connection</span> <span class="o">=</span> <span class="n">createNewConnection</span><span class="p">()</span>  <span class="c"># executed at the driver</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="k">def</span> <span class="nf">sendRecord</span><span class="p">(</span><span class="n">rdd</span><span class="p">):</span>
+    <span class="n">connection</span> <span class="o">=</span> <span class="n">createNewConnection</span><span class="p">()</span>  <span class="c1"># executed at the driver</span>
     <span class="n">rdd</span><span class="o">.</span><span class="n">foreach</span><span class="p">(</span><span class="k">lambda</span> <span class="n">record</span><span class="p">:</span> <span class="n">connection</span><span class="o">.</span><span class="n">send</span><span class="p">(</span><span class="n">record</span><span class="p">))</span>
     <span class="n">connection</span><span class="o">.</span><span class="n">close</span><span class="p">()</span>
 
-<span class="n">dstream</span><span class="o">.</span><span class="n">foreachRDD</span><span class="p">(</span><span class="n">sendRecord</span><span class="p">)</span></code></pre></div>
+<span class="n">dstream</span><span class="

<TRUNCATED>



[02/25] spark-website git commit: Update 2.1.0 docs to include https://github.com/apache/spark/pull/16294

Posted by yh...@apache.org.
http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/structured-streaming-programming-guide.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/structured-streaming-programming-guide.html b/site/docs/2.1.0/structured-streaming-programming-guide.html
index e54c101..3a1ac5f 100644
--- a/site/docs/2.1.0/structured-streaming-programming-guide.html
+++ b/site/docs/2.1.0/structured-streaming-programming-guide.html
@@ -127,45 +127,50 @@
                     
 
                     <ul id="markdown-toc">
-  <li><a href="#overview" id="markdown-toc-overview">Overview</a></li>
-  <li><a href="#quick-example" id="markdown-toc-quick-example">Quick Example</a></li>
-  <li><a href="#programming-model" id="markdown-toc-programming-model">Programming Model</a>    <ul>
-      <li><a href="#basic-concepts" id="markdown-toc-basic-concepts">Basic Concepts</a></li>
-      <li><a href="#handling-event-time-and-late-data" id="markdown-toc-handling-event-time-and-late-data">Handling Event-time and Late Data</a></li>
-      <li><a href="#fault-tolerance-semantics" id="markdown-toc-fault-tolerance-semantics">Fault Tolerance Semantics</a></li>
+  <li><a href="#overview">Overview</a></li>
+  <li><a href="#quick-example">Quick Example</a></li>
+  <li><a href="#programming-model">Programming Model</a>    <ul>
+      <li><a href="#basic-concepts">Basic Concepts</a></li>
+      <li><a href="#handling-event-time-and-late-data">Handling Event-time and Late Data</a></li>
+      <li><a href="#fault-tolerance-semantics">Fault Tolerance Semantics</a></li>
     </ul>
   </li>
-  <li><a href="#api-using-datasets-and-dataframes" id="markdown-toc-api-using-datasets-and-dataframes">API using Datasets and DataFrames</a>    <ul>
-      <li><a href="#creating-streaming-dataframes-and-streaming-datasets" id="markdown-toc-creating-streaming-dataframes-and-streaming-datasets">Creating streaming DataFrames and streaming Datasets</a>        <ul>
-          <li><a href="#data-sources" id="markdown-toc-data-sources">Data Sources</a></li>
-          <li><a href="#schema-inference-and-partition-of-streaming-dataframesdatasets" id="markdown-toc-schema-inference-and-partition-of-streaming-dataframesdatasets">Schema inference and partition of streaming DataFrames/Datasets</a></li>
+  <li><a href="#api-using-datasets-and-dataframes">API using Datasets and DataFrames</a>    <ul>
+      <li><a href="#creating-streaming-dataframes-and-streaming-datasets">Creating streaming DataFrames and streaming Datasets</a>        <ul>
+          <li><a href="#data-sources">Data Sources</a></li>
+          <li><a href="#schema-inference-and-partition-of-streaming-dataframesdatasets">Schema inference and partition of streaming DataFrames/Datasets</a></li>
         </ul>
       </li>
-      <li><a href="#operations-on-streaming-dataframesdatasets" id="markdown-toc-operations-on-streaming-dataframesdatasets">Operations on streaming DataFrames/Datasets</a>        <ul>
-          <li><a href="#basic-operations---selection-projection-aggregation" id="markdown-toc-basic-operations---selection-projection-aggregation">Basic Operations - Selection, Projection, Aggregation</a></li>
-          <li><a href="#window-operations-on-event-time" id="markdown-toc-window-operations-on-event-time">Window Operations on Event Time</a></li>
-          <li><a href="#join-operations" id="markdown-toc-join-operations">Join Operations</a></li>
-          <li><a href="#unsupported-operations" id="markdown-toc-unsupported-operations">Unsupported Operations</a></li>
+      <li><a href="#operations-on-streaming-dataframesdatasets">Operations on streaming DataFrames/Datasets</a>        <ul>
+          <li><a href="#basic-operations---selection-projection-aggregation">Basic Operations - Selection, Projection, Aggregation</a></li>
+          <li><a href="#window-operations-on-event-time">Window Operations on Event Time</a></li>
+          <li><a href="#handling-late-data-and-watermarking">Handling Late Data and Watermarking</a></li>
+          <li><a href="#join-operations">Join Operations</a></li>
+          <li><a href="#unsupported-operations">Unsupported Operations</a></li>
         </ul>
       </li>
-      <li><a href="#starting-streaming-queries" id="markdown-toc-starting-streaming-queries">Starting Streaming Queries</a>        <ul>
-          <li><a href="#output-modes" id="markdown-toc-output-modes">Output Modes</a></li>
-          <li><a href="#output-sinks" id="markdown-toc-output-sinks">Output Sinks</a></li>
-          <li><a href="#using-foreach" id="markdown-toc-using-foreach">Using Foreach</a></li>
+      <li><a href="#starting-streaming-queries">Starting Streaming Queries</a>        <ul>
+          <li><a href="#output-modes">Output Modes</a></li>
+          <li><a href="#output-sinks">Output Sinks</a></li>
+          <li><a href="#using-foreach">Using Foreach</a></li>
         </ul>
       </li>
-      <li><a href="#managing-streaming-queries" id="markdown-toc-managing-streaming-queries">Managing Streaming Queries</a></li>
-      <li><a href="#monitoring-streaming-queries" id="markdown-toc-monitoring-streaming-queries">Monitoring Streaming Queries</a></li>
-      <li><a href="#recovering-from-failures-with-checkpointing" id="markdown-toc-recovering-from-failures-with-checkpointing">Recovering from Failures with Checkpointing</a></li>
+      <li><a href="#managing-streaming-queries">Managing Streaming Queries</a></li>
+      <li><a href="#monitoring-streaming-queries">Monitoring Streaming Queries</a>        <ul>
+          <li><a href="#interactive-apis">Interactive APIs</a></li>
+          <li><a href="#asynchronous-api">Asynchronous API</a></li>
+        </ul>
+      </li>
+      <li><a href="#recovering-from-failures-with-checkpointing">Recovering from Failures with Checkpointing</a></li>
     </ul>
   </li>
-  <li><a href="#where-to-go-from-here" id="markdown-toc-where-to-go-from-here">Where to go from here</a></li>
+  <li><a href="#where-to-go-from-here">Where to go from here</a></li>
 </ul>
 
 <h1 id="overview">Overview</h1>
 <p>Structured Streaming is a scalable and fault-tolerant stream processing engine built on the Spark SQL engine. You can express your streaming computation the same way you would express a batch computation on static data. The Spark SQL engine will take care of running it incrementally and continuously, updating the final result as streaming data continues to arrive. You can use the <a href="sql-programming-guide.html">Dataset/DataFrame API</a> in Scala, Java or Python to express streaming aggregations, event-time windows, stream-to-batch joins, etc. The computation is executed on the same optimized Spark SQL engine. Finally, the system ensures end-to-end exactly-once fault-tolerance guarantees through checkpointing and Write Ahead Logs. In short, <em>Structured Streaming provides fast, scalable, fault-tolerant, end-to-end exactly-once stream processing without the user having to reason about streaming.</em></p>
 
-<p><strong>Spark 2.0 is the ALPHA RELEASE of Structured Streaming</strong> and the APIs are still experimental. In this guide, we are going to walk you through the programming model and the APIs. First, let&#8217;s start with a simple example - a streaming word count.</p>
+<p><strong>Structured Streaming is still ALPHA in Spark 2.1</strong> and the APIs are experimental. In this guide, we are going to walk you through the programming model and the APIs. First, let&#8217;s start with a simple example - a streaming word count. </p>
 
 <h1 id="quick-example">Quick Example</h1>
 <p>Let&#8217;s say you want to maintain a running word count of text data received from a data server listening on a TCP socket. Let&#8217;s see how you can express this using Structured Streaming. You can see the full code in 
@@ -175,7 +180,7 @@ And if you <a href="http://spark.apache.org/downloads.html">download Spark</a>,
 <div class="codetabs">
 <div data-lang="scala">
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">import</span> <span class="nn">org.apache.spark.sql.functions._</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">import</span> <span class="nn">org.apache.spark.sql.functions._</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.sql.SparkSession</span>
 
 <span class="k">val</span> <span class="n">spark</span> <span class="k">=</span> <span class="nc">SparkSession</span>
@@ -183,12 +188,12 @@ And if you <a href="http://spark.apache.org/downloads.html">download Spark</a>,
   <span class="o">.</span><span class="n">appName</span><span class="o">(</span><span class="s">&quot;StructuredNetworkWordCount&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="n">getOrCreate</span><span class="o">()</span>
   
-<span class="k">import</span> <span class="nn">spark.implicits._</span></code></pre></div>
+<span class="k">import</span> <span class="nn">spark.implicits._</span></code></pre></figure>
 
   </div>
 <div data-lang="java">
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kn">import</span> <span class="nn">org.apache.spark.api.java.function.FlatMapFunction</span><span class="o">;</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.api.java.function.FlatMapFunction</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.sql.*</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.sql.streaming.StreamingQuery</span><span class="o">;</span>
 
@@ -198,19 +203,19 @@ And if you <a href="http://spark.apache.org/downloads.html">download Spark</a>,
 <span class="n">SparkSession</span> <span class="n">spark</span> <span class="o">=</span> <span class="n">SparkSession</span>
   <span class="o">.</span><span class="na">builder</span><span class="o">()</span>
   <span class="o">.</span><span class="na">appName</span><span class="o">(</span><span class="s">&quot;JavaStructuredNetworkWordCount&quot;</span><span class="o">)</span>
-  <span class="o">.</span><span class="na">getOrCreate</span><span class="o">();</span></code></pre></div>
+  <span class="o">.</span><span class="na">getOrCreate</span><span class="o">();</span></code></pre></figure>
 
   </div>
 <div data-lang="python">
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">pyspark.sql</span> <span class="kn">import</span> <span class="n">SparkSession</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="kn">from</span> <span class="nn">pyspark.sql</span> <span class="kn">import</span> <span class="n">SparkSession</span>
 <span class="kn">from</span> <span class="nn">pyspark.sql.functions</span> <span class="kn">import</span> <span class="n">explode</span>
 <span class="kn">from</span> <span class="nn">pyspark.sql.functions</span> <span class="kn">import</span> <span class="n">split</span>
 
 <span class="n">spark</span> <span class="o">=</span> <span class="n">SparkSession</span> \
     <span class="o">.</span><span class="n">builder</span> \
-    <span class="o">.</span><span class="n">appName</span><span class="p">(</span><span class="s">&quot;StructuredNetworkWordCount&quot;</span><span class="p">)</span> \
-    <span class="o">.</span><span class="n">getOrCreate</span><span class="p">()</span></code></pre></div>
+    <span class="o">.</span><span class="n">appName</span><span class="p">(</span><span class="s2">&quot;StructuredNetworkWordCount&quot;</span><span class="p">)</span> \
+    <span class="o">.</span><span class="n">getOrCreate</span><span class="p">()</span></code></pre></figure>
 
   </div>
 </div>
@@ -220,7 +225,7 @@ And if you <a href="http://spark.apache.org/downloads.html">download Spark</a>,
 <div class="codetabs">
 <div data-lang="scala">
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="c1">// Create DataFrame representing the stream of input lines from connection to localhost:9999</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="c1">// Create DataFrame representing the stream of input lines from connection to localhost:9999</span>
 <span class="k">val</span> <span class="n">lines</span> <span class="k">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">readStream</span>
   <span class="o">.</span><span class="n">format</span><span class="o">(</span><span class="s">&quot;socket&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="n">option</span><span class="o">(</span><span class="s">&quot;host&quot;</span><span class="o">,</span> <span class="s">&quot;localhost&quot;</span><span class="o">)</span>
@@ -231,14 +236,14 @@ And if you <a href="http://spark.apache.org/downloads.html">download Spark</a>,
 <span class="k">val</span> <span class="n">words</span> <span class="k">=</span> <span class="n">lines</span><span class="o">.</span><span class="n">as</span><span class="o">[</span><span class="kt">String</span><span class="o">].</span><span class="n">flatMap</span><span class="o">(</span><span class="k">_</span><span class="o">.</span><span class="n">split</span><span class="o">(</span><span class="s">&quot; &quot;</span><span class="o">))</span>
 
 <span class="c1">// Generate running word count</span>
-<span class="k">val</span> <span class="n">wordCounts</span> <span class="k">=</span> <span class="n">words</span><span class="o">.</span><span class="n">groupBy</span><span class="o">(</span><span class="s">&quot;value&quot;</span><span class="o">).</span><span class="n">count</span><span class="o">()</span></code></pre></div>
+<span class="k">val</span> <span class="n">wordCounts</span> <span class="k">=</span> <span class="n">words</span><span class="o">.</span><span class="n">groupBy</span><span class="o">(</span><span class="s">&quot;value&quot;</span><span class="o">).</span><span class="n">count</span><span class="o">()</span></code></pre></figure>
 
     <p>This <code>lines</code> DataFrame represents an unbounded table containing the streaming text data. This table contains one column of strings named &#8220;value&#8221;, and each line in the streaming text data becomes a row in the table. Note that this is not currently receiving any data as we are just setting up the transformation, and have not yet started it. Next, we have converted the DataFrame to a Dataset of String using <code>.as[String]</code>, so that we can apply the <code>flatMap</code> operation to split each line into multiple words. The resultant <code>words</code> Dataset contains all the words. Finally, we have defined the <code>wordCounts</code> DataFrame by grouping by the unique values in the Dataset and counting them. Note that this is a streaming DataFrame which represents the running word counts of the stream.</p>
 
   </div>
 <div data-lang="java">
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="c1">// Create DataFrame representing the stream of input lines from connection to localhost:9999</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="c1">// Create DataFrame representing the stream of input lines from connection to localhost:9999</span>
 <span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">lines</span> <span class="o">=</span> <span class="n">spark</span>
   <span class="o">.</span><span class="na">readStream</span><span class="o">()</span>
   <span class="o">.</span><span class="na">format</span><span class="o">(</span><span class="s">&quot;socket&quot;</span><span class="o">)</span>
@@ -258,30 +263,30 @@ And if you <a href="http://spark.apache.org/downloads.html">download Spark</a>,
     <span class="o">},</span> <span class="n">Encoders</span><span class="o">.</span><span class="na">STRING</span><span class="o">());</span>
 
 <span class="c1">// Generate running word count</span>
-<span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">wordCounts</span> <span class="o">=</span> <span class="n">words</span><span class="o">.</span><span class="na">groupBy</span><span class="o">(</span><span class="s">&quot;value&quot;</span><span class="o">).</span><span class="na">count</span><span class="o">();</span></code></pre></div>
+<span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">wordCounts</span> <span class="o">=</span> <span class="n">words</span><span class="o">.</span><span class="na">groupBy</span><span class="o">(</span><span class="s">&quot;value&quot;</span><span class="o">).</span><span class="na">count</span><span class="o">();</span></code></pre></figure>
 
     <p>This <code>lines</code> DataFrame represents an unbounded table containing the streaming text data. This table contains one column of strings named &#8220;value&#8221;, and each line in the streaming text data becomes a row in the table. Note that this is not currently receiving any data as we are just setting up the transformation, and have not yet started it. Next, we have converted the DataFrame to a Dataset of String using <code>.as(Encoders.STRING())</code>, so that we can apply the <code>flatMap</code> operation to split each line into multiple words. The resultant <code>words</code> Dataset contains all the words. Finally, we have defined the <code>wordCounts</code> DataFrame by grouping by the unique values in the Dataset and counting them. Note that this is a streaming DataFrame which represents the running word counts of the stream.</p>
 
   </div>
 <div data-lang="python">
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="c"># Create DataFrame representing the stream of input lines from connection to localhost:9999</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="c1"># Create DataFrame representing the stream of input lines from connection to localhost:9999</span>
 <span class="n">lines</span> <span class="o">=</span> <span class="n">spark</span> \
     <span class="o">.</span><span class="n">readStream</span> \
-    <span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s">&quot;socket&quot;</span><span class="p">)</span> \
-    <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s">&quot;host&quot;</span><span class="p">,</span> <span class="s">&quot;localhost&quot;</span><span class="p">)</span> \
-    <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s">&quot;port&quot;</span><span class="p">,</span> <span class="mi">9999</span><span class="p">)</span> \
+    <span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s2">&quot;socket&quot;</span><span class="p">)</span> \
+    <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;host&quot;</span><span class="p">,</span> <span class="s2">&quot;localhost&quot;</span><span class="p">)</span> \
+    <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;port&quot;</span><span class="p">,</span> <span class="mi">9999</span><span class="p">)</span> \
     <span class="o">.</span><span class="n">load</span><span class="p">()</span>
 
-<span class="c"># Split the lines into words</span>
+<span class="c1"># Split the lines into words</span>
 <span class="n">words</span> <span class="o">=</span> <span class="n">lines</span><span class="o">.</span><span class="n">select</span><span class="p">(</span>
    <span class="n">explode</span><span class="p">(</span>
-       <span class="n">split</span><span class="p">(</span><span class="n">lines</span><span class="o">.</span><span class="n">value</span><span class="p">,</span> <span class="s">&quot; &quot;</span><span class="p">)</span>
-   <span class="p">)</span><span class="o">.</span><span class="n">alias</span><span class="p">(</span><span class="s">&quot;word&quot;</span><span class="p">)</span>
+       <span class="n">split</span><span class="p">(</span><span class="n">lines</span><span class="o">.</span><span class="n">value</span><span class="p">,</span> <span class="s2">&quot; &quot;</span><span class="p">)</span>
+   <span class="p">)</span><span class="o">.</span><span class="n">alias</span><span class="p">(</span><span class="s2">&quot;word&quot;</span><span class="p">)</span>
 <span class="p">)</span>
 
-<span class="c"># Generate running word count</span>
-<span class="n">wordCounts</span> <span class="o">=</span> <span class="n">words</span><span class="o">.</span><span class="n">groupBy</span><span class="p">(</span><span class="s">&quot;word&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">count</span><span class="p">()</span></code></pre></div>
+<span class="c1"># Generate running word count</span>
+<span class="n">wordCounts</span> <span class="o">=</span> <span class="n">words</span><span class="o">.</span><span class="n">groupBy</span><span class="p">(</span><span class="s2">&quot;word&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">count</span><span class="p">()</span></code></pre></figure>
 
     <p>This <code>lines</code> DataFrame represents an unbounded table containing the streaming text data. This table contains one column of strings named &#8220;value&#8221;, and each line in the streaming text data becomes a row in the table. Note that this is not currently receiving any data as we are just setting up the transformation, and have not yet started it. Next, we have used two built-in SQL functions, split and explode, to split each line into multiple rows with a word each. In addition, we use the function <code>alias</code> to name the new column &#8220;word&#8221;. Finally, we have defined the <code>wordCounts</code> DataFrame by grouping by the unique values in the Dataset and counting them. Note that this is a streaming DataFrame which represents the running word counts of the stream.</p>
 
@@ -293,36 +298,36 @@ And if you <a href="http://spark.apache.org/downloads.html">download Spark</a>,
 <div class="codetabs">
 <div data-lang="scala">
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="c1">// Start running the query that prints the running counts to the console</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="c1">// Start running the query that prints the running counts to the console</span>
 <span class="k">val</span> <span class="n">query</span> <span class="k">=</span> <span class="n">wordCounts</span><span class="o">.</span><span class="n">writeStream</span>
   <span class="o">.</span><span class="n">outputMode</span><span class="o">(</span><span class="s">&quot;complete&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="n">format</span><span class="o">(</span><span class="s">&quot;console&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="n">start</span><span class="o">()</span>
 
-<span class="n">query</span><span class="o">.</span><span class="n">awaitTermination</span><span class="o">()</span></code></pre></div>
+<span class="n">query</span><span class="o">.</span><span class="n">awaitTermination</span><span class="o">()</span></code></pre></figure>
 
   </div>
 <div data-lang="java">
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="c1">// Start running the query that prints the running counts to the console</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="c1">// Start running the query that prints the running counts to the console</span>
 <span class="n">StreamingQuery</span> <span class="n">query</span> <span class="o">=</span> <span class="n">wordCounts</span><span class="o">.</span><span class="na">writeStream</span><span class="o">()</span>
   <span class="o">.</span><span class="na">outputMode</span><span class="o">(</span><span class="s">&quot;complete&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">format</span><span class="o">(</span><span class="s">&quot;console&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">start</span><span class="o">();</span>
 
-<span class="n">query</span><span class="o">.</span><span class="na">awaitTermination</span><span class="o">();</span></code></pre></div>
+<span class="n">query</span><span class="o">.</span><span class="na">awaitTermination</span><span class="o">();</span></code></pre></figure>
 
   </div>
 <div data-lang="python">
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="c"># Start running the query that prints the running counts to the console</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="c1"># Start running the query that prints the running counts to the console</span>
 <span class="n">query</span> <span class="o">=</span> <span class="n">wordCounts</span> \
     <span class="o">.</span><span class="n">writeStream</span> \
-    <span class="o">.</span><span class="n">outputMode</span><span class="p">(</span><span class="s">&quot;complete&quot;</span><span class="p">)</span> \
-    <span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s">&quot;console&quot;</span><span class="p">)</span> \
+    <span class="o">.</span><span class="n">outputMode</span><span class="p">(</span><span class="s2">&quot;complete&quot;</span><span class="p">)</span> \
+    <span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s2">&quot;console&quot;</span><span class="p">)</span> \
     <span class="o">.</span><span class="n">start</span><span class="p">()</span>
 
-<span class="n">query</span><span class="o">.</span><span class="n">awaitTermination</span><span class="p">()</span></code></pre></div>
+<span class="n">query</span><span class="o">.</span><span class="n">awaitTermination</span><span class="p">()</span></code></pre></figure>
 
   </div>
 </div>
@@ -341,17 +346,17 @@ And if you <a href="http://spark.apache.org/downloads.html">download Spark</a>,
 <div class="codetabs">
 <div data-lang="scala">
 
-    <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>./bin/run-example org.apache.spark.examples.sql.streaming.StructuredNetworkWordCount localhost 9999</code></pre></div>
+    <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>$ ./bin/run-example org.apache.spark.examples.sql.streaming.StructuredNetworkWordCount localhost <span class="m">9999</span></code></pre></figure>
 
   </div>
 <div data-lang="java">
 
-    <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>./bin/run-example org.apache.spark.examples.sql.streaming.JavaStructuredNetworkWordCount localhost 9999</code></pre></div>
+    <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>$ ./bin/run-example org.apache.spark.examples.sql.streaming.JavaStructuredNetworkWordCount localhost <span class="m">9999</span></code></pre></figure>
 
   </div>
 <div data-lang="python">
 
-    <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="nv">$ </span>./bin/spark-submit examples/src/main/python/sql/streaming/structured_network_wordcount.py localhost 9999</code></pre></div>
+    <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>$ ./bin/spark-submit examples/src/main/python/sql/streaming/structured_network_wordcount.py localhost <span class="m">9999</span></code></pre></figure>
 
   </div>
 </div>
@@ -361,10 +366,10 @@ And if you <a href="http://spark.apache.org/downloads.html">download Spark</a>,
 <table width="100%">
     <td>
 
-<div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># TERMINAL 1:</span>
-<span class="c"># Running Netcat</span>
+<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span><span class="c1"># TERMINAL 1:</span>
+<span class="c1"># Running Netcat</span>
 
-<span class="nv">$ </span>nc -lk 9999
+$ nc -lk <span class="m">9999</span>
 apache spark
 apache hadoop
 
@@ -386,7 +391,7 @@ apache hadoop
 
 
 
-...</code></pre></div>
+...</code></pre></figure>
 
     </td>
     <td width="2%"></td>
@@ -395,90 +400,90 @@ apache hadoop
 
 <div data-lang="scala">
 
-        <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># TERMINAL 2: RUNNING StructuredNetworkWordCount</span>
+        <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span><span class="c1"># TERMINAL 2: RUNNING StructuredNetworkWordCount</span>
 
-<span class="nv">$ </span>./bin/run-example org.apache.spark.examples.sql.streaming.StructuredNetworkWordCount localhost 9999
+$ ./bin/run-example org.apache.spark.examples.sql.streaming.StructuredNetworkWordCount localhost <span class="m">9999</span>
 
 -------------------------------------------
-Batch: 0
+Batch: <span class="m">0</span>
 -------------------------------------------
 +------+-----+
 <span class="p">|</span> value<span class="p">|</span>count<span class="p">|</span>
 +------+-----+
-<span class="p">|</span>apache<span class="p">|</span>    1<span class="p">|</span>
-<span class="p">|</span> spark<span class="p">|</span>    1<span class="p">|</span>
+<span class="p">|</span>apache<span class="p">|</span>    <span class="m">1</span><span class="p">|</span>
+<span class="p">|</span> spark<span class="p">|</span>    <span class="m">1</span><span class="p">|</span>
 +------+-----+
 
 -------------------------------------------
-Batch: 1
+Batch: <span class="m">1</span>
 -------------------------------------------
 +------+-----+
 <span class="p">|</span> value<span class="p">|</span>count<span class="p">|</span>
 +------+-----+
-<span class="p">|</span>apache<span class="p">|</span>    2<span class="p">|</span>
-<span class="p">|</span> spark<span class="p">|</span>    1<span class="p">|</span>
-<span class="p">|</span>hadoop<span class="p">|</span>    1<span class="p">|</span>
+<span class="p">|</span>apache<span class="p">|</span>    <span class="m">2</span><span class="p">|</span>
+<span class="p">|</span> spark<span class="p">|</span>    <span class="m">1</span><span class="p">|</span>
+<span class="p">|</span>hadoop<span class="p">|</span>    <span class="m">1</span><span class="p">|</span>
 +------+-----+
-...</code></pre></div>
+...</code></pre></figure>
 
       </div>
 
 <div data-lang="java">
 
-        <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># TERMINAL 2: RUNNING JavaStructuredNetworkWordCount</span>
+        <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span><span class="c1"># TERMINAL 2: RUNNING JavaStructuredNetworkWordCount</span>
 
-<span class="nv">$ </span>./bin/run-example org.apache.spark.examples.sql.streaming.JavaStructuredNetworkWordCount localhost 9999
+$ ./bin/run-example org.apache.spark.examples.sql.streaming.JavaStructuredNetworkWordCount localhost <span class="m">9999</span>
 
 -------------------------------------------
-Batch: 0
+Batch: <span class="m">0</span>
 -------------------------------------------
 +------+-----+
 <span class="p">|</span> value<span class="p">|</span>count<span class="p">|</span>
 +------+-----+
-<span class="p">|</span>apache<span class="p">|</span>    1<span class="p">|</span>
-<span class="p">|</span> spark<span class="p">|</span>    1<span class="p">|</span>
+<span class="p">|</span>apache<span class="p">|</span>    <span class="m">1</span><span class="p">|</span>
+<span class="p">|</span> spark<span class="p">|</span>    <span class="m">1</span><span class="p">|</span>
 +------+-----+
 
 -------------------------------------------
-Batch: 1
+Batch: <span class="m">1</span>
 -------------------------------------------
 +------+-----+
 <span class="p">|</span> value<span class="p">|</span>count<span class="p">|</span>
 +------+-----+
-<span class="p">|</span>apache<span class="p">|</span>    2<span class="p">|</span>
-<span class="p">|</span> spark<span class="p">|</span>    1<span class="p">|</span>
-<span class="p">|</span>hadoop<span class="p">|</span>    1<span class="p">|</span>
+<span class="p">|</span>apache<span class="p">|</span>    <span class="m">2</span><span class="p">|</span>
+<span class="p">|</span> spark<span class="p">|</span>    <span class="m">1</span><span class="p">|</span>
+<span class="p">|</span>hadoop<span class="p">|</span>    <span class="m">1</span><span class="p">|</span>
 +------+-----+
-...</code></pre></div>
+...</code></pre></figure>
 
       </div>
 <div data-lang="python">
 
-        <div class="highlight"><pre><code class="language-bash" data-lang="bash"><span class="c"># TERMINAL 2: RUNNING structured_network_wordcount.py</span>
+        <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span><span class="c1"># TERMINAL 2: RUNNING structured_network_wordcount.py</span>
 
-<span class="nv">$ </span>./bin/spark-submit examples/src/main/python/sql/streaming/structured_network_wordcount.py localhost 9999
+$ ./bin/spark-submit examples/src/main/python/sql/streaming/structured_network_wordcount.py localhost <span class="m">9999</span>
 
 -------------------------------------------
-Batch: 0
+Batch: <span class="m">0</span>
 -------------------------------------------
 +------+-----+
 <span class="p">|</span> value<span class="p">|</span>count<span class="p">|</span>
 +------+-----+
-<span class="p">|</span>apache<span class="p">|</span>    1<span class="p">|</span>
-<span class="p">|</span> spark<span class="p">|</span>    1<span class="p">|</span>
+<span class="p">|</span>apache<span class="p">|</span>    <span class="m">1</span><span class="p">|</span>
+<span class="p">|</span> spark<span class="p">|</span>    <span class="m">1</span><span class="p">|</span>
 +------+-----+
 
 -------------------------------------------
-Batch: 1
+Batch: <span class="m">1</span>
 -------------------------------------------
 +------+-----+
 <span class="p">|</span> value<span class="p">|</span>count<span class="p">|</span>
 +------+-----+
-<span class="p">|</span>apache<span class="p">|</span>    2<span class="p">|</span>
-<span class="p">|</span> spark<span class="p">|</span>    1<span class="p">|</span>
-<span class="p">|</span>hadoop<span class="p">|</span>    1<span class="p">|</span>
+<span class="p">|</span>apache<span class="p">|</span>    <span class="m">2</span><span class="p">|</span>
+<span class="p">|</span> spark<span class="p">|</span>    <span class="m">1</span><span class="p">|</span>
+<span class="p">|</span>hadoop<span class="p">|</span>    <span class="m">1</span><span class="p">|</span>
 +------+-----+
-...</code></pre></div>
+...</code></pre></figure>
 
       </div>
 </div>
@@ -500,15 +505,15 @@ arriving on the stream is like a new row being appended to the Input Table.</p>
 
 <p><img src="img/structured-streaming-stream-as-a-table.png" alt="Stream as a Table" title="Stream as a Table" /></p>
 
-<p>A query on the input will generate the &#8220;Result Table&#8221;. Every trigger interval (say, every 1 second), new rows get appended to the Input Table, which eventually updates the Result Table. Whenever the result table gets updated, we would want to write the changed result rows to an external sink.</p>
+<p>A query on the input will generate the &#8220;Result Table&#8221;. Every trigger interval (say, every 1 second), new rows get appended to the Input Table, which eventually updates the Result Table. Whenever the result table gets updated, we would want to write the changed result rows to an external sink. </p>
 
 <p><img src="img/structured-streaming-model.png" alt="Model" /></p>
 
-<p>The &#8220;Output&#8221; is defined as what gets written out to the external storage. The output can be defined in different modes</p>
+<p>The &#8220;Output&#8221; is defined as what gets written out to the external storage. The output can be defined in different modes </p>
 
 <ul>
   <li>
-    <p><em>Complete Mode</em> - The entire updated Result Table will be written to the external storage. It is up to the storage connector to decide how to handle writing of the entire table.</p>
+    <p><em>Complete Mode</em> - The entire updated Result Table will be written to the external storage. It is up to the storage connector to decide how to handle writing of the entire table. </p>
   </li>
   <li>
     <p><em>Append Mode</em> - Only the new rows appended in the Result Table since the last trigger will be written to the external storage. This is applicable only on the queries where existing rows in the Result Table are not expected to change.</p>
@@ -542,7 +547,14 @@ see how this model handles event-time based processing and late arriving data.</
 <h2 id="handling-event-time-and-late-data">Handling Event-time and Late Data</h2>
 <p>Event-time is the time embedded in the data itself. For many applications, you may want to operate on this event-time. For example, if you want to get the number of events generated by IoT devices every minute, then you probably want to use the time when the data was generated (that is, event-time in the data), rather than the time Spark receives them. This event-time is very naturally expressed in this model &#8211; each event from the devices is a row in the table, and event-time is a column value in the row. This allows window-based aggregations (e.g. number of events every minute) to be just a special type of grouping and aggregation on the event-time column &#8211; each time window is a group and each row can belong to multiple windows/groups. Therefore, such event-time-window-based aggregation queries can be defined consistently on both a static dataset (e.g. from collected device event logs) as well as on a data stream, making the life of the user much easier.</p>
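A hedged sketch of such an event-time window aggregation (Scala; an events streaming DataFrame with an eventTime column is assumed, window comes from org.apache.spark.sql.functions, and import spark.implicits._ enables the $ syntax):

    import org.apache.spark.sql.functions.window
    // each 1-minute event-time window becomes a group; a row can fall into several sliding windows
    val perMinute = events.groupBy(window($"eventTime", "1 minute")).count()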
 
-<p>Furthermore, this model naturally handles data that has arrived later than expected based on its event-time. Since Spark is updating the Result Table, it has full control over updating/cleaning up the aggregates when there is late data. While not yet implemented in Spark 2.0, event-time watermarking will be used to manage this data. These are explained later in more details in the <a href="#window-operations-on-event-time">Window Operations</a> section.</p>
+<p>Furthermore, this model naturally handles data that has arrived later than 
+expected based on its event-time. Since Spark is updating the Result Table, 
+it has full control over updating old aggregates when there is late data, 
+as well as cleaning up old aggregates to limit the size of intermediate
+state data. Since Spark 2.1, we have support for watermarking, which 
+allows the user to specify a threshold for late data and allows the engine
+to clean up old state accordingly. This is explained in more 
+detail in the <a href="#window-operations-on-event-time">Window Operations</a> section.</p>
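A hedged sketch of the watermark API added in Spark 2.1 (Scala, continuing the assumed events example; the thresholds are illustrative):

    // state for windows older than the watermark (max event time seen minus 10 minutes) can be dropped
    val windowedCounts = events
      .withWatermark("eventTime", "10 minutes")
      .groupBy(window($"eventTime", "10 minutes", "5 minutes"))
      .count()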
 
 <h2 id="fault-tolerance-semantics">Fault Tolerance Semantics</h2>
 <p>Delivering end-to-end exactly-once semantics was one of the key goals behind the design of Structured Streaming. To achieve that, we have designed the Structured Streaming sources, the sinks and the execution engine to reliably track the exact progress of the processing so that it can handle any kind of failure by restarting and/or reprocessing. Every streaming source is assumed to have offsets (similar to Kafka offsets, or Kinesis sequence numbers)
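As a hedged sketch (not the guide's own example), checkpointing is enabled by giving a query a checkpoint location when it is started; the path below is hypothetical, and wordCounts is the streaming DataFrame assumed from the quick example:

    val query = wordCounts.writeStream
      .outputMode("complete")
      .option("checkpointLocation", "/path/to/checkpoints")  // hypothetical HDFS-compatible path
      .format("console")
      .start()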
@@ -570,7 +582,7 @@ returned by <code>SparkSession.readStream()</code>. Similar to the read interfac
     <p><strong>Kafka source</strong> - Poll data from Kafka. It&#8217;s compatible with Kafka broker versions 0.10.0 or higher. See the <a href="structured-streaming-kafka-integration.html">Kafka Integration Guide</a> for more details.</p>
   </li>
   <li>
-    <p><strong>Socket source (for testing)</strong> - Reads UTF8 text data from a socket connection. The listening server socket is at the driver. Note that this should be used only for testing as this does not provide end-to-end fault-tolerance guarantees.</p>
+    <p><strong>Socket source (for testing)</strong> - Reads UTF8 text data from a socket connection. The listening server socket is at the driver. Note that this should be used only for testing as this does not provide end-to-end fault-tolerance guarantees. </p>
   </li>
 </ul>
 
@@ -579,7 +591,7 @@ returned by <code>SparkSession.readStream()</code>. Similar to the read interfac
 <div class="codetabs">
 <div data-lang="scala">
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">spark</span><span class="k">:</span> <span class="kt">SparkSession</span> <span class="o">=</span> <span class="o">...</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">val</span> <span class="n">spark</span><span class="k">:</span> <span class="kt">SparkSession</span> <span class="o">=</span> <span class="o">...</span>
 
 <span class="c1">// Read text from socket </span>
 <span class="k">val</span> <span class="n">socketDF</span> <span class="k">=</span> <span class="n">spark</span>
@@ -599,12 +611,12 @@ returned by <code>SparkSession.readStream()</code>. Similar to the read interfac
   <span class="o">.</span><span class="n">readStream</span>
   <span class="o">.</span><span class="n">option</span><span class="o">(</span><span class="s">&quot;sep&quot;</span><span class="o">,</span> <span class="s">&quot;;&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="n">schema</span><span class="o">(</span><span class="n">userSchema</span><span class="o">)</span>      <span class="c1">// Specify schema of the csv files</span>
-  <span class="o">.</span><span class="n">csv</span><span class="o">(</span><span class="s">&quot;/path/to/directory&quot;</span><span class="o">)</span>    <span class="c1">// Equivalent to format(&quot;csv&quot;).load(&quot;/path/to/directory&quot;)</span></code></pre></div>
+  <span class="o">.</span><span class="n">csv</span><span class="o">(</span><span class="s">&quot;/path/to/directory&quot;</span><span class="o">)</span>    <span class="c1">// Equivalent to format(&quot;csv&quot;).load(&quot;/path/to/directory&quot;)</span></code></pre></figure>
 
   </div>
 <div data-lang="java">
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">SparkSession</span> <span class="n">spark</span> <span class="o">=</span> <span class="o">...</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="n">SparkSession</span> <span class="n">spark</span> <span class="o">=</span> <span class="o">...</span>
 
 <span class="c1">// Read text from socket </span>
 <span class="n">Dataset</span><span class="o">[</span><span class="n">Row</span><span class="o">]</span> <span class="n">socketDF</span> <span class="o">=</span> <span class="n">spark</span>
@@ -619,37 +631,37 @@ returned by <code>SparkSession.readStream()</code>. Similar to the read interfac
 <span class="n">socketDF</span><span class="o">.</span><span class="na">printSchema</span><span class="o">();</span>
 
 <span class="c1">// Read all the csv files written atomically in a directory</span>
-<span class="n">StructType</span> <span class="n">userSchema</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">StructType</span><span class="o">().</span><span class="na">add</span><span class="o">(</span><span class="s">&quot;name&quot;</span><span class="o">,</span> <span class="s">&quot;string&quot;</span><span class="o">).</span><span class="na">add</span><span class="o">(</span><span class="s">&quot;age&quot;</span><span class="o">,</span> <span class="s">&quot;integer&quot;</span><span class="o">);</span>
+<span class="n">StructType</span> <span class="n">userSchema</span> <span class="o">=</span> <span class="k">new</span> <span class="n">StructType</span><span class="o">().</span><span class="na">add</span><span class="o">(</span><span class="s">&quot;name&quot;</span><span class="o">,</span> <span class="s">&quot;string&quot;</span><span class="o">).</span><span class="na">add</span><span class="o">(</span><span class="s">&quot;age&quot;</span><span class="o">,</span> <span class="s">&quot;integer&quot;</span><span class="o">);</span>
 <span class="n">Dataset</span><span class="o">[</span><span class="n">Row</span><span class="o">]</span> <span class="n">csvDF</span> <span class="o">=</span> <span class="n">spark</span>
   <span class="o">.</span><span class="na">readStream</span><span class="o">()</span>
   <span class="o">.</span><span class="na">option</span><span class="o">(</span><span class="s">&quot;sep&quot;</span><span class="o">,</span> <span class="s">&quot;;&quot;</span><span class="o">)</span>
   <span class="o">.</span><span class="na">schema</span><span class="o">(</span><span class="n">userSchema</span><span class="o">)</span>      <span class="c1">// Specify schema of the csv files</span>
-  <span class="o">.</span><span class="na">csv</span><span class="o">(</span><span class="s">&quot;/path/to/directory&quot;</span><span class="o">);</span>    <span class="c1">// Equivalent to format(&quot;csv&quot;).load(&quot;/path/to/directory&quot;)</span></code></pre></div>
+  <span class="o">.</span><span class="na">csv</span><span class="o">(</span><span class="s">&quot;/path/to/directory&quot;</span><span class="o">);</span>    <span class="c1">// Equivalent to format(&quot;csv&quot;).load(&quot;/path/to/directory&quot;)</span></code></pre></figure>
 
   </div>
 <div data-lang="python">
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">spark</span> <span class="o">=</span> <span class="n">SparkSession</span><span class="o">.</span> <span class="o">...</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="n">spark</span> <span class="o">=</span> <span class="n">SparkSession</span><span class="o">.</span> <span class="o">...</span>
 
-<span class="c"># Read text from socket </span>
+<span class="c1"># Read text from socket </span>
 <span class="n">socketDF</span> <span class="o">=</span> <span class="n">spark</span> \
     <span class="o">.</span><span class="n">readStream</span><span class="p">()</span> \
-    <span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s">&quot;socket&quot;</span><span class="p">)</span> \
-    <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s">&quot;host&quot;</span><span class="p">,</span> <span class="s">&quot;localhost&quot;</span><span class="p">)</span> \
-    <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s">&quot;port&quot;</span><span class="p">,</span> <span class="mi">9999</span><span class="p">)</span> \
+    <span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="s2">&quot;socket&quot;</span><span class="p">)</span> \
+    <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;host&quot;</span><span class="p">,</span> <span class="s2">&quot;localhost&quot;</span><span class="p">)</span> \
+    <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;port&quot;</span><span class="p">,</span> <span class="mi">9999</span><span class="p">)</span> \
     <span class="o">.</span><span class="n">load</span><span class="p">()</span>
 
-<span class="n">socketDF</span><span class="o">.</span><span class="n">isStreaming</span><span class="p">()</span>    <span class="c"># Returns True for DataFrames that have streaming sources</span>
+<span class="n">socketDF</span><span class="o">.</span><span class="n">isStreaming</span><span class="p">()</span>    <span class="c1"># Returns True for DataFrames that have streaming sources</span>
 
 <span class="n">socketDF</span><span class="o">.</span><span class="n">printSchema</span><span class="p">()</span> 
 
-<span class="c"># Read all the csv files written atomically in a directory</span>
-<span class="n">userSchema</span> <span class="o">=</span> <span class="n">StructType</span><span class="p">()</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="s">&quot;name&quot;</span><span class="p">,</span> <span class="s">&quot;string&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="s">&quot;age&quot;</span><span class="p">,</span> <span class="s">&quot;integer&quot;</span><span class="p">)</span>
+<span class="c1"># Read all the csv files written atomically in a directory</span>
+<span class="n">userSchema</span> <span class="o">=</span> <span class="n">StructType</span><span class="p">()</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="s2">&quot;name&quot;</span><span class="p">,</span> <span class="s2">&quot;string&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="s2">&quot;age&quot;</span><span class="p">,</span> <span class="s2">&quot;integer&quot;</span><span class="p">)</span>
 <span class="n">csvDF</span> <span class="o">=</span> <span class="n">spark</span> \
     <span class="o">.</span><span class="n">readStream</span><span class="p">()</span> \
-    <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s">&quot;sep&quot;</span><span class="p">,</span> <span class="s">&quot;;&quot;</span><span class="p">)</span> \
+    <span class="o">.</span><span class="n">option</span><span class="p">(</span><span class="s2">&quot;sep&quot;</span><span class="p">,</span> <span class="s2">&quot;;&quot;</span><span class="p">)</span> \
     <span class="o">.</span><span class="n">schema</span><span class="p">(</span><span class="n">userSchema</span><span class="p">)</span> \
-    <span class="o">.</span><span class="n">csv</span><span class="p">(</span><span class="s">&quot;/path/to/directory&quot;</span><span class="p">)</span>  <span class="c"># Equivalent to format(&quot;csv&quot;).load(&quot;/path/to/directory&quot;)</span></code></pre></div>
+    <span class="o">.</span><span class="n">csv</span><span class="p">(</span><span class="s2">&quot;/path/to/directory&quot;</span><span class="p">)</span>  <span class="c1"># Equivalent to format(&quot;csv&quot;).load(&quot;/path/to/directory&quot;)</span></code></pre></figure>
 
   </div>
 </div>
@@ -671,7 +683,7 @@ returned by <code>SparkSession.readStream()</code>. Similar to the read interfac
 <div class="codetabs">
 <div data-lang="scala">
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">case</span> <span class="k">class</span> <span class="nc">DeviceData</span><span class="o">(</span><span class="n">device</span><span class="k">:</span> <span class="kt">String</span><span class="o">,</span> <span class="n">type</span><span class="k">:</span> <span class="kt">String</span><span class="o">,</span> <span class="n">signal</span><span class="k">:</span> <span class="kt">Double</span><span class="o">,</span> <span class="n">time</span><span class="k">:</span> <span class="kt">DateTime</span><span class="o">)</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">case</span> <span class="k">class</span> <span class="nc">DeviceData</span><span class="o">(</span><span class="n">device</span><span class="k">:</span> <span class="kt">String</span><span class="o">,</span> <span class="n">type</span><span class="k">:</span> <span class="kt">String</span><span class="o">,</span> <span class="n">signal</span><span class="k">:</span> <span class="kt">Double</span><span class="o">,</span> <span class="n">time</span><span class="k">:</span> <span class="kt">DateTime</span><span class="o">)</span>
 
 <span class="k">val</span> <span class="n">df</span><span class="k">:</span> <span class="kt">DataFrame</span> <span class="o">=</span> <span class="o">...</span> <span class="c1">// streaming DataFrame with IOT device data with schema { device: string, type: string, signal: double, time: string }</span>
 <span class="k">val</span> <span class="n">ds</span><span class="k">:</span> <span class="kt">Dataset</span><span class="o">[</span><span class="kt">DeviceData</span><span class="o">]</span> <span class="k">=</span> <span class="n">df</span><span class="o">.</span><span class="n">as</span><span class="o">[</span><span class="kt">DeviceData</span><span class="o">]</span>    <span class="c1">// streaming Dataset with IOT device data</span>
@@ -685,12 +697,12 @@ returned by <code>SparkSession.readStream()</code>. Similar to the read interfac
 
 <span class="c1">// Running average signal for each device type</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.sql.expressions.scalalang.typed._</span>
-<span class="n">ds</span><span class="o">.</span><span class="n">groupByKey</span><span class="o">(</span><span class="k">_</span><span class="o">.</span><span class="n">type</span><span class="o">).</span><span class="n">agg</span><span class="o">(</span><span class="n">typed</span><span class="o">.</span><span class="n">avg</span><span class="o">(</span><span class="k">_</span><span class="o">.</span><span class="n">signal</span><span class="o">))</span>    <span class="c1">// using typed API</span></code></pre></div>
+<span class="n">ds</span><span class="o">.</span><span class="n">groupByKey</span><span class="o">(</span><span class="k">_</span><span class="o">.</span><span class="n">type</span><span class="o">).</span><span class="n">agg</span><span class="o">(</span><span class="n">typed</span><span class="o">.</span><span class="n">avg</span><span class="o">(</span><span class="k">_</span><span class="o">.</span><span class="n">signal</span><span class="o">))</span>    <span class="c1">// using typed API</span></code></pre></figure>
 
   </div>
 <div data-lang="java">
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kn">import</span> <span class="nn">org.apache.spark.api.java.function.*</span><span class="o">;</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.api.java.function.*</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.sql.*</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.sql.expressions.javalang.typed</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.sql.catalyst.encoders.ExpressionEncoder</span><span class="o">;</span>
@@ -735,24 +747,24 @@ returned by <code>SparkSession.readStream()</code>. Similar to the read interfac
   <span class="kd">public</span> <span class="n">Double</span> <span class="nf">call</span><span class="o">(</span><span class="n">DeviceData</span> <span class="n">value</span><span class="o">)</span> <span class="kd">throws</span> <span class="n">Exception</span> <span class="o">{</span>
     <span class="k">return</span> <span class="n">value</span><span class="o">.</span><span class="na">getSignal</span><span class="o">();</span>
   <span class="o">}</span>
-<span class="o">}));</span></code></pre></div>
+<span class="o">}));</span></code></pre></figure>
 
   </div>
 <div data-lang="python">
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">df</span> <span class="o">=</span> <span class="o">...</span>  <span class="c"># streaming DataFrame with IOT device data with schema { device: string, type: string, signal: double, time: DateType }</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="n">df</span> <span class="o">=</span> <span class="o">...</span>  <span class="c1"># streaming DataFrame with IOT device data with schema { device: string, type: string, signal: double, time: DateType }</span>
 
-<span class="c"># Select the devices which have signal more than 10</span>
-<span class="n">df</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s">&quot;device&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">where</span><span class="p">(</span><span class="s">&quot;signal &gt; 10&quot;</span><span class="p">)</span>                              
+<span class="c1"># Select the devices which have signal more than 10</span>
+<span class="n">df</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s2">&quot;device&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">where</span><span class="p">(</span><span class="s2">&quot;signal &gt; 10&quot;</span><span class="p">)</span>                              
 
-<span class="c"># Running count of the number of updates for each device type</span>
-<span class="n">df</span><span class="o">.</span><span class="n">groupBy</span><span class="p">(</span><span class="s">&quot;type&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">count</span><span class="p">()</span></code></pre></div>
+<span class="c1"># Running count of the number of updates for each device type</span>
+<span class="n">df</span><span class="o">.</span><span class="n">groupBy</span><span class="p">(</span><span class="s2">&quot;type&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">count</span><span class="p">()</span></code></pre></figure>
 
   </div>
 </div>
 
 <h3 id="window-operations-on-event-time">Window Operations on Event Time</h3>
-<p>Aggregations over a sliding event-time window are straightforward with Structured Streaming. The key idea to understand about window-based aggregations are very similar to grouped aggregations. In a grouped aggregation, aggregate values (e.g. counts) are maintained for each unique value in the user-specified grouping column. In case of window-based aggregations, aggregate values are maintained for each window the event-time of a row falls into. Let&#8217;s understand this with an illustration.</p>
+<p>Aggregations over a sliding event-time window are straightforward with Structured Streaming. The key idea is that window-based aggregations are very similar to grouped aggregations. In a grouped aggregation, aggregate values (e.g. counts) are maintained for each unique value in the user-specified grouping column. In the case of window-based aggregations, aggregate values are maintained for each window that the event-time of a row falls into. Let&#8217;s understand this with an illustration.</p>
 
 <p>Imagine our <a href="#quick-example">quick example</a> is modified and the stream now contains lines along with the time when the line was generated. Instead of running word counts, we want to count words within 10 minute windows, updating every 5 minutes. That is, word counts in words received between 10 minute windows 12:00 - 12:10, 12:05 - 12:15, 12:10 - 12:20, etc. Note that 12:00 - 12:10 means data that arrived after 12:00 but before 12:10. Now, consider a word that was received at 12:07. This word should increment the counts corresponding to two windows 12:00 - 12:10 and 12:05 - 12:15. So the counts will be indexed by both, the grouping key (i.e. the word) and the window (can be calculated from the event-time).</p>
 
@@ -766,7 +778,7 @@ returned by <code>SparkSession.readStream()</code>. Similar to the read interfac
 <div class="codetabs">
 <div data-lang="scala">
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">import</span> <span class="nn">spark.implicits._</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">import</span> <span class="nn">spark.implicits._</span>
 
 <span class="k">val</span> <span class="n">words</span> <span class="k">=</span> <span class="o">...</span> <span class="c1">// streaming DataFrame of schema { timestamp: Timestamp, word: String }</span>
 
@@ -774,66 +786,178 @@ returned by <code>SparkSession.readStream()</code>. Similar to the read interfac
 <span class="k">val</span> <span class="n">windowedCounts</span> <span class="k">=</span> <span class="n">words</span><span class="o">.</span><span class="n">groupBy</span><span class="o">(</span>
   <span class="n">window</span><span class="o">(</span><span class="n">$</span><span class="s">&quot;timestamp&quot;</span><span class="o">,</span> <span class="s">&quot;10 minutes&quot;</span><span class="o">,</span> <span class="s">&quot;5 minutes&quot;</span><span class="o">),</span>
   <span class="n">$</span><span class="s">&quot;word&quot;</span>
-<span class="o">).</span><span class="n">count</span><span class="o">()</span></code></pre></div>
+<span class="o">).</span><span class="n">count</span><span class="o">()</span></code></pre></figure>
 
   </div>
 <div data-lang="java">
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">words</span> <span class="o">=</span> <span class="o">...</span> <span class="c1">// streaming DataFrame of schema { timestamp: Timestamp, word: String }</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">words</span> <span class="o">=</span> <span class="o">...</span> <span class="c1">// streaming DataFrame of schema { timestamp: Timestamp, word: String }</span>
 
 <span class="c1">// Group the data by window and word and compute the count of each group</span>
 <span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">windowedCounts</span> <span class="o">=</span> <span class="n">words</span><span class="o">.</span><span class="na">groupBy</span><span class="o">(</span>
   <span class="n">functions</span><span class="o">.</span><span class="na">window</span><span class="o">(</span><span class="n">words</span><span class="o">.</span><span class="na">col</span><span class="o">(</span><span class="s">&quot;timestamp&quot;</span><span class="o">),</span> <span class="s">&quot;10 minutes&quot;</span><span class="o">,</span> <span class="s">&quot;5 minutes&quot;</span><span class="o">),</span>
   <span class="n">words</span><span class="o">.</span><span class="na">col</span><span class="o">(</span><span class="s">&quot;word&quot;</span><span class="o">)</span>
-<span class="o">).</span><span class="na">count</span><span class="o">();</span></code></pre></div>
+<span class="o">).</span><span class="na">count</span><span class="o">();</span></code></pre></figure>
 
   </div>
 <div data-lang="python">
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">words</span> <span class="o">=</span> <span class="o">...</span>  <span class="c"># streaming DataFrame of schema { timestamp: Timestamp, word: String }</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="n">words</span> <span class="o">=</span> <span class="o">...</span>  <span class="c1"># streaming DataFrame of schema { timestamp: Timestamp, word: String }</span>
 
-<span class="c"># Group the data by window and word and compute the count of each group</span>
+<span class="c1"># Group the data by window and word and compute the count of each group</span>
 <span class="n">windowedCounts</span> <span class="o">=</span> <span class="n">words</span><span class="o">.</span><span class="n">groupBy</span><span class="p">(</span>
-    <span class="n">window</span><span class="p">(</span><span class="n">words</span><span class="o">.</span><span class="n">timestamp</span><span class="p">,</span> <span class="s">&quot;10 minutes&quot;</span><span class="p">,</span> <span class="s">&quot;5 minutes&quot;</span><span class="p">),</span>
+    <span class="n">window</span><span class="p">(</span><span class="n">words</span><span class="o">.</span><span class="n">timestamp</span><span class="p">,</span> <span class="s2">&quot;10 minutes&quot;</span><span class="p">,</span> <span class="s2">&quot;5 minutes&quot;</span><span class="p">),</span>
     <span class="n">words</span><span class="o">.</span><span class="n">word</span>
-<span class="p">)</span><span class="o">.</span><span class="n">count</span><span class="p">()</span></code></pre></div>
+<span class="p">)</span><span class="o">.</span><span class="n">count</span><span class="p">()</span></code></pre></figure>
 
   </div>
 </div>
 
+<h3 id="handling-late-data-and-watermarking">Handling Late Data and Watermarking</h3>
 <p>Now consider what happens if one of the events arrives late to the application.
-For example, a word that was generated at 12:04 but it was received at 12:11. 
-Since this windowing is based on the time in the data, the time 12:04 should be considered for windowing. This occurs naturally in our window-based grouping &#8211; the late data is automatically placed in the proper windows and the correct aggregates are updated as illustrated below.</p>
+For example, say a word generated at 12:04 (i.e. event time) is received by
+the application at 12:11. The application should use the time 12:04 instead of 12:11
+to update the older counts for the window <code>12:00 - 12:10</code>. This occurs
+naturally in our window-based grouping &#8211; Structured Streaming can maintain the intermediate state
+for partial aggregates for a long period of time so that late data can update aggregates of
+old windows correctly, as illustrated below.</p>
 
 <p><img src="img/structured-streaming-late-data.png" alt="Handling Late Data" /></p>
 
+<p>However, to run this query for days, it&#8217;s necessary for the system to bound the amount of
+intermediate in-memory state it accumulates. This means the system needs to know when an old
+aggregate can be dropped from the in-memory state because the application is not going to receive
+late data for that aggregate any more. To enable this, in Spark 2.1, we have introduced
+<strong>watermarking</strong>, which lets the engine automatically track the current event time in the data
+and attempt to clean up old state accordingly. You can define the watermark of a query by
+specifying the event time column and the threshold on how late the data is expected to be in terms of
+event time. For a specific window starting at time <code>T</code>, the engine will maintain state and allow late
+data to update the state until <code>(max event time seen by the engine - late threshold &gt; T)</code>.
+In other words, late data within the threshold will be aggregated,
+but data later than the threshold will be dropped. Let&#8217;s understand this with an example. We can
+easily define watermarking on the previous example using <code>withWatermark()</code> as shown below.</p>
+
+<div class="codetabs">
+<div data-lang="scala">
+
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">import</span> <span class="nn">spark.implicits._</span>
+
+<span class="k">val</span> <span class="n">words</span> <span class="k">=</span> <span class="o">...</span> <span class="c1">// streaming DataFrame of schema { timestamp: Timestamp, word: String }</span>
+
+<span class="c1">// Group the data by window and word and compute the count of each group</span>
+<span class="k">val</span> <span class="n">windowedCounts</span> <span class="k">=</span> <span class="n">words</span>
+    <span class="o">.</span><span class="n">withWatermark</span><span class="o">(</span><span class="s">&quot;timestamp&quot;</span><span class="o">,</span> <span class="s">&quot;10 minutes&quot;</span><span class="o">)</span>
+    <span class="o">.</span><span class="n">groupBy</span><span class="o">(</span>
+        <span class="n">window</span><span class="o">(</span><span class="n">$</span><span class="s">&quot;timestamp&quot;</span><span class="o">,</span> <span class="s">&quot;10 minutes&quot;</span><span class="o">,</span> <span class="s">&quot;5 minutes&quot;</span><span class="o">),</span>
+        <span class="n">$</span><span class="s">&quot;word&quot;</span><span class="o">)</span>
+    <span class="o">.</span><span class="n">count</span><span class="o">()</span></code></pre></figure>
+
+  </div>
+<div data-lang="java">
+
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">words</span> <span class="o">=</span> <span class="o">...</span> <span class="c1">// streaming DataFrame of schema { timestamp: Timestamp, word: String }</span>
+
+<span class="c1">// Group the data by window and word and compute the count of each group</span>
+<span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">windowedCounts</span> <span class="o">=</span> <span class="n">words</span>
+    <span class="o">.</span><span class="na">withWatermark</span><span class="o">(</span><span class="s">&quot;timestamp&quot;</span><span class="o">,</span> <span class="s">&quot;10 minutes&quot;</span><span class="o">)</span>
+    <span class="o">.</span><span class="na">groupBy</span><span class="o">(</span>
+        <span class="n">functions</span><span class="o">.</span><span class="na">window</span><span class="o">(</span><span class="n">words</span><span class="o">.</span><span class="na">col</span><span class="o">(</span><span class="s">&quot;timestamp&quot;</span><span class="o">),</span> <span class="s">&quot;10 minutes&quot;</span><span class="o">,</span> <span class="s">&quot;5 minutes&quot;</span><span class="o">),</span>
+        <span class="n">words</span><span class="o">.</span><span class="na">col</span><span class="o">(</span><span class="s">&quot;word&quot;</span><span class="o">))</span>
+    <span class="o">.</span><span class="na">count</span><span class="o">();</span></code></pre></figure>
+
+  </div>
+<div data-lang="python">
+
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="n">words</span> <span class="o">=</span> <span class="o">...</span>  <span class="c1"># streaming DataFrame of schema { timestamp: Timestamp, word: String }</span>
+
+<span class="c1"># Group the data by window and word and compute the count of each group</span>
+<span class="n">windowedCounts</span> <span class="o">=</span> <span class="n">words</span>
+    <span class="o">.</span><span class="n">withWatermark</span><span class="p">(</span><span class="s2">&quot;timestamp&quot;</span><span class="p">,</span> <span class="s2">&quot;10 minutes&quot;</span><span class="p">)</span>
+    <span class="o">.</span><span class="n">groupBy</span><span class="p">(</span>
+        <span class="n">window</span><span class="p">(</span><span class="n">words</span><span class="o">.</span><span class="n">timestamp</span><span class="p">,</span> <span class="s2">&quot;10 minutes&quot;</span><span class="p">,</span> <span class="s2">&quot;5 minutes&quot;</span><span class="p">),</span>
+        <span class="n">words</span><span class="o">.</span><span class="n">word</span><span class="p">)</span>
+    <span class="o">.</span><span class="n">count</span><span class="p">()</span></code></pre></figure>
+
+  </div>
+</div>
+
+<p>In this example, we are defining the watermark of the query on the value of the column &#8220;timestamp&#8221;,
+and also defining &#8220;10 minutes&#8221; as the threshold of how late the data is allowed to be. If this query
+is run in Append output mode (discussed later in the <a href="#output-modes">Output Modes</a> section),
+the engine will track the current event time from the column &#8220;timestamp&#8221; and wait for an additional
+&#8220;10 minutes&#8221; in event time before finalizing the windowed counts and adding them to the Result Table.
+Here is an illustration.</p>
+
+<p><img src="img/structured-streaming-watermark.png" alt="Watermarking in Append Mode" /></p>
+
+<p>As shown in the illustration, the maximum event time tracked by the engine is the
+<em>blue dashed line</em>, and the watermark set as <code>(max event time - '10 mins')</code>
+at the beginning of every trigger is the red line. For example, when the engine observes the data
+<code>(12:14, dog)</code>, it sets the watermark for the next trigger as <code>12:04</code>.
+For the window <code>12:00 - 12:10</code>, the partial counts are maintained as internal state while the system
+is waiting for late data. After the system finds data (i.e. <code>(12:21, owl)</code>) such that the
+watermark exceeds 12:10, the partial count is finalized and appended to the table. This count will
+not change any further, as all &#8220;too-late&#8221; data older than 12:10 will be ignored.</p>
+
+<p>Note that in Append output mode, the system has to wait for the &#8220;late threshold&#8221; time
+before it can output the aggregation of a window. This may not be ideal if data can be very late
+(say, 1 day) and you would like to have partial counts without waiting for a day. In the future, we will add
+an Update output mode, which will allow every update to the aggregates to be written to the sink every trigger.</p>
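+
+<p>For concreteness, here is a minimal sketch (not part of the original guide) of how the
+watermarked <code>windowedCounts</code> query defined above might be started in Append mode; the
+checkpoint and output paths are placeholders.</p>
+
+<figure class="highlight"><pre><code class="language-scala">// Sketch only: start the windowedCounts query from the example above in Append mode.
+// "/path/to/checkpoint" and "/path/to/output" are hypothetical locations.
+val query = windowedCounts.writeStream
+  .outputMode("append")                                  // output finalized windows only
+  .format("parquet")                                     // file sink
+  .option("checkpointLocation", "/path/to/checkpoint")   // needed for fault tolerance
+  .start("/path/to/output")
+
+query.awaitTermination()</code></pre></figure>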
+
+<p><strong>Conditions for watermarking to clean aggregation state</strong>
+It is important to note that the following conditions must be satisfied for watermarking to
+clean the state in aggregation queries <em>(as of Spark 2.1, subject to change in the future)</em>.</p>
+
+<ul>
+  <li>
+    <p><strong>Output mode must be Append.</strong> Complete mode requires all aggregate data to be preserved, and hence 
+cannot use watermarking to drop intermediate state. See the <a href="#output-modes">Output Modes</a> section 
+for a detailed explanation of the semantics of each output mode.</p>
+  </li>
+  <li>
+    <p>The aggregation must have either the event-time column or a <code>window</code> on the event-time column.</p>
+  </li>
+  <li>
+    <p><code>withWatermark</code> must be called on the
+same column as the timestamp column used in the aggregate. For example,
+<code>df.withWatermark("time", "1 min").groupBy("time2").count()</code> is invalid
+in Append output mode, as the watermark is defined on a different column
+from the aggregation column (a valid counterpart is sketched after this list).</p>
+  </li>
+  <li>
+    <p><code>withWatermark</code> must be called before the aggregation for the watermark details to be used. 
+For example, <code>df.groupBy("time").count().withWatermark("time", "1 min")</code> is invalid in Append 
+output mode.</p>
+  </li>
+</ul>
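+
+<p>For illustration, a valid counterpart to the invalid snippets above might look like the
+following sketch, assuming a streaming DataFrame <code>df</code> with an event-time column
+&#8220;time&#8221; and a grouping column &#8220;word&#8221;.</p>
+
+<figure class="highlight"><pre><code class="language-scala">import org.apache.spark.sql.functions.window
+
+// Valid: the watermark is defined on the same event-time column ("time") that is
+// used in the windowed aggregation, and withWatermark() precedes the aggregation.
+df.withWatermark("time", "1 min")
+  .groupBy(window(df.col("time"), "5 minutes"), df.col("word"))
+  .count()</code></pre></figure>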
+
 <h3 id="join-operations">Join Operations</h3>
 <p>Streaming DataFrames can be joined with static DataFrames to create new streaming DataFrames. Here are a few examples.</p>
 
 <div class="codetabs">
 <div data-lang="scala">
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">staticDf</span> <span class="k">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span> <span class="o">...</span>
+    <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">val</span> <span class="n">staticDf</span> <span class="k">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span> <span class="o">...</span>
 <span class="k">val</span> <span class="n">streamingDf</span> <span class="k">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">readStream</span><span class="o">.</span> <span class="o">...</span> 
 
 <span class="n">streamingDf</span><span class="o">.</span><span class="n">join</span><span class="o">(</span><span class="n">staticDf</span><span class="o">,</span> <span class="s">&quot;type&quot;</span><span class="o">)</span>          <span class="c1">// inner equi-join with a static DF</span>
-<span class="n">streamingDf</span><span class="o">.</span><span class="n">join</span><span class="o">(</span><span class="n">staticDf</span><span class="o">,</span> <span class="s">&quot;type&quot;</span><span class="o">,</span> <span class="s">&quot;right_join&quot;</span><span class="o">)</span>  <span class="c1">// right outer join with a static DF</span></code></pre></div>
+<span class="n">streamingDf</span><span class="o">.</span><span class="n">join</span><span class="o">(</span><span class="n">staticDf</span><span class="o">,</span> <span class="s">&quot;type&quot;</span><span class="o">,</span> <span class="s">&quot;right_join&quot;</span><span class="o">)</span>  <span class="c1">// right outer join with a static DF  </span></code></pre></figure>
 
   </div>
 <div data-lang="java">
 
-    <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">staticDf</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="na">read</span><span class="o">.</span> <span class="o">...;</span>
+    <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">staticDf</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="na">read</span><span class="o">.</span> <span class="o">...;</span>
 <span class="n">Dataset</span><span class="o">&lt;</span><span class="n">Row</span><span class="o">&gt;</span> <span class="n">streamingDf</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="na">readStream</span><span class="o">.</span> <span class="o">...;</span>
 <span class="n">streamingDf</span><span class="o">.</span><span class="na">join</span><span class="o">(</span><span class="n">staticDf</span><span class="o">,</span> <span class="s">&quot;type&quot;</span><span class="o">);</span>         <span class="c1">// inner equi-join with a static DF</span>
-<span class="n">streamingDf</span><span class="o">.</span><span class="na">join</span><span class="o">(</span><span class="n">staticDf</span><span class="o">,</span> <span class="s">&quot;type&quot;</span><span class="o">,</span> <span class="s">&quot;right_join&quot;</span><span class="o">);</span>  <span class="c1">// right outer join with a static DF</span></code></pre></div>
+<span class="n">streamingDf</span><span class="o">.</span><span class="na">join</span><span class="o">(</span><span class="n">staticDf</span><span class="o">,</span> <span class="s">&quot;type&quot;</span><span class="o">,</span> <span class="s">&quot;right_join&quot;</span><span class="o">);</span>  <span class="c1">// right outer join with a static DF</span></code></pre></figure>
 
   </div>
 <div data-lang="python">
 
-    <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="n">staticDf</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span> <span class="o">...</span>
+    <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="n">staticDf</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">read</span><span class="o">.</span> <span class="o">...</span>
 <span class="n">streamingDf</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">readStream</span><span class="o">.</span> <span class="o">...</span>
-<span class="n">streamingDf</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">staticDf</span><span class="p">,</span> <span class="s">&quot;type&quot;</span><span class="p">)</span>  <span class="c"># inner equi-join with a static DF</span>
-<span class="n">streamingDf</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">staticDf</span><span class="p">,</span> <span class="s">&quot;type&quot;</span><span class="p">,</span> <span class="s">&quot;right_join&quot;</span><span class="p">)</span>  <span class="c"># right outer join with a static DF</span></code></pre></div>
+<span class="n">streamingDf</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">staticDf</span><span class="p">,</span> <span class="s2">&quot;type&quot;</span><span class="p">)</span>  <span class="c1"># inner equi-join with a static DF</span>
+<span class="n">streamingDf</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">staticDf</span><span class="p">,</span> <span class="s2">&quot;type&quot;</span><span class="p">,</span> <span class="s2">&quot;right_join&quot;</span><span class="p">)</span>  <span class="c1"># right outer join with a static DF</span></code></pre></figure>
 
   </div>
 </div>
@@ -878,7 +1002,7 @@ Since this windowing is based on the time in the data, the time 12:04 should be
 
 <ul>
   <li>
-    <p><code>count()</code> - Cannot return a single count from a streaming Dataset. Instead, use <code>ds.groupBy.count()</code> which returns a streaming Dataset containing a running count.</p>
+    <p><code>count()</code> - Cannot return a single count from a streaming Dataset. Instead, use <code>ds.groupBy.count()</code> which returns a streaming Dataset containing a running count. </p>
   </li>
   <li>
     <p><code>foreach()</code> - Instead use <code>ds.writeStream.foreach(...)</code> (see next section).</p>
@@ -897,7 +1021,7 @@ returned through <code>Dataset.writeStream()</code>. You will have to specify on
 
 <ul>
   <li>
-    <p><em>Details of the output sink:</em> Data format, location, etc.</p>
+    <p><em>Details of the output sink:</em> Data format, location, etc. </p>
   </li>
   <li>
     <p><em>Output mode:</em> Specify what gets written to the output sink.</p>
@@ -914,23 +1038,86 @@ returned through <code>Dataset.writeStream()</code>. You will have to specify on
 </ul>
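+
+<p>Putting these pieces together, a write specification might look like the following sketch
+(illustrative only, reusing the <code>socketDF</code> from the earlier read example; the console
+sink here simply prints each batch).</p>
+
+<figure class="highlight"><pre><code class="language-scala">// Sketch: specify the sink details and the output mode on writeStream.
+val query = socketDF.writeStream
+  .format("console")       // details of the output sink
+  .outputMode("append")    // what gets written out each trigger
+  .start()                 // start the continuous execution
+
+query.awaitTermination()   // block until the query is stopped</code></pre></figure>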
 
 <h4 id="output-modes">Output Modes</h4>
-<p>There are two types of output mode currently implemented.</p>
+<p>There are a few types of output modes.</p>
 
 <ul>
   <li>
-    <p><strong>Append mode (default)</strong> - This is the default mode, where only the new rows added to the result table since the last trigger will be outputted to the sink. This is only applicable to queries that <em>do not have any aggregations</em> (e.g. queries with only <code>select</code>, <code>where</code>, <code>map</code>, <code>flatMap</code>, <code>filter</code>, <code>join</code>, etc.).</p>
+    <p><strong>Append mode (default)</strong> - This is the default mode, where only the
+new rows added to the Result Table since the last trigger will be
+outputted to the sink. This is supported only for those queries where
+rows added to the Result Table are never going to change. Hence, this mode
+guarantees that each row will be output only once (assuming a
+fault-tolerant sink). For example, queries with only <code>select</code>,
+<code>where</code>, <code>map</code>, <code>flatMap</code>, <code>filter</code>, <code>join</code>, etc. will support Append mode.</p>
+  </li>
+  <li>
+    <p><strong>Complete mode</strong> - The whole Result Table will be outputted to the sink after every trigger.
+ This is supported for aggregation queries.</p>
   </li>
   <li>
-    <p><strong>Complete mode</strong> - The whole result table will be outputted to the sink.This is only applicable to queries that <em>have aggregations</em>.</p>
+    <p><strong>Update mode</strong> - (<em>not available in Spark 2.1</em>) Only the rows in the Result Table that were 
+updated since the last trigger will be outputted to the sink. 
+More information to be added in future releases.</p>
   </li>
 </ul>
 
+<p>Different types of streaming queries support different output modes. 
+Here is the compatibility matrix.</p>
+
+<table class="table">
+  <tr>
+    <th>Query Type</th>
+    <th></th>
+    <th>Supported Output Modes</th>
+    <th>Notes</th>        
+  </tr>
+  <tr>
+    <td colspan="2" valign="middle"><br />Queries without aggregation</td>
+    <td>Append</td>
+    <td>
+        Complete mode not supported as it is infeasible to keep all data in the Result Table.
+    </td>
+  </tr>
+  <tr>
+    <td rowspan="2">Queries with aggregation</td>
+    <td>Aggregation on event-time with watermark</td>
+    <td>Append, Complete</td>
+    <td>
+        Append mode uses the watermark to drop old aggregation state. But the output of a
+        windowed aggregation is delayed by the late threshold specified in `withWatermark()`, since by
+        the mode&#8217;s semantics, rows can be added to the Result Table only once, after they are
+        finalized (i.e. after the watermark is crossed). See the
+        <a href="#handling-late-data-and-watermarking">Handling Late Data and Watermarking</a> section for more details.
+        <br /><br />
+        Complete mode does not drop old aggregation state since by definition this mode
+        preserves all data in the Result Table.
+    </td>    
+  </tr>
+  <tr>
+    <td>Other aggregations</td>
+    <td>Complete</td>
+    <td>
+        Append mode is not supported as aggregates can update, thus violating the semantics of
+        this mode.
+        <br /><br />
+        Complete mode does not drop old aggregation state since by definition this mode
+        preserves all data in the Result Table.
+    </td>  
+  </tr>
+</table>
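+
+<p>As a sketch of the matrix above (not from the original guide): an aggregation without an
+event-time watermark supports only Complete mode, so such a query could be started as follows
+(<code>words</code> is an assumed streaming DataFrame with a &#8220;word&#8221; column).</p>
+
+<figure class="highlight"><pre><code class="language-scala">// Sketch only: starting this aggregation in Append mode would be rejected,
+// per the compatibility matrix above.
+val query = words.groupBy("word").count().writeStream
+  .outputMode("complete")     // required for aggregations without a watermark
+  .format("memory")           // in-memory table for interactive queries
+  .queryName("word_counts")   // the table name to query
+  .start()</code></pre></figure>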
+
 <h4 id="output-sinks">Output Sinks</h4>
 <p>There are a few types of built-in output sinks.</p>
 
 <ul>
   <li>
-    <p><strong>File sink</strong> - Stores the output to a directory. As of Spark 2.0, this only supports Parquet file format, and Append output mode.</p>
+    <p><strong>File sink</strong> - Stores the output to a directory. </p>
   </li>
   <li>
     <p><strong>Foreach sink</strong> - Runs arbitrary computation on the records in the output. See later in the section for more details.</p>
@@ -954,7 +1141,7 @@ returned through <code>Dataset.writeStream()</code>. You will have to specify on
     <th>Notes</th>
   </tr>
   <tr>
-    <td><b>File Sink</b><br />(only parquet in Spark 2.0)</td>
+    <td><b>File Sink</b></td>
     <td>Append</td>
     <td><pre>writeStream<br />  .format("parquet")<br />  .start()</pre></td>
     <td>Yes</td>
@@ -980,7 +1167,14 @@ returned through <code>Dataset.writeStream()</code>. You will have to specify on
     <td><pre>writeStream<br />  .format("memory")<br />  .queryName("table")<br />  .start()</pre></td>
     <td>No</td>
     <td>Saves the output data as a table, for interactive querying. Table name is the query name.</td>
-  </tr> 
+  </tr>
 </table>
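+
+<p>As an illustrative sketch of the Foreach sink from the list above (the guide covers it in
+more detail later in the section), arbitrary per-record logic is supplied through a
+<code>ForeachWriter</code>; the <code>println</code> body here is a stand-in for real output logic.</p>
+
+<figure class="highlight"><pre><code class="language-scala">import org.apache.spark.sql.{ForeachWriter, Row}
+
+// Sketch: print every output record; a real writer would typically open and
+// close a connection to an external system in open()/close().
+val query = streamingDf.writeStream
+  .foreach(new ForeachWriter[Row] {
+    def open(partitionId: Long, version: Long): Boolean = true  // set up resources
+    def process(record: Row): Unit = println(record)            // handle one record
+    def close(errorOrNull: Throwable): Unit = ()                // tear down
+  })
+  .start()</code></pre></figure>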
 
 <p>Finally, you have to call <code>start()</code> to actually start the execution of the query. This returns a StreamingQuery object which is a handle to the continuously running execution. You can use this object to manage the query, which we will discuss in the next subsection. For now, let&#8217;s understand all this with a few examples.</p>
@@ -988,7 +1182,7 @@ returned through <code>Dataset.writeStream()</code>. You will have to specify on
 <div class="codetabs">
 <div data-lang="scala">
 
-    <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="c1">// ========== DF with no ag

<TRUNCATED>

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org


[25/25] spark-website git commit: Update 2.1.0 docs to include https://github.com/apache/spark/pull/16294

Posted by yh...@apache.org.
Update 2.1.0 docs to include https://github.com/apache/spark/pull/16294

This version is built from the docs source code generated by applying https://github.com/apache/spark/pull/16294 to v2.1.0 (so, other changes in branch 2.1 will not affect the doc).


Project: http://git-wip-us.apache.org/repos/asf/spark-website/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark-website/commit/d2bcf185
Tree: http://git-wip-us.apache.org/repos/asf/spark-website/tree/d2bcf185
Diff: http://git-wip-us.apache.org/repos/asf/spark-website/diff/d2bcf185

Branch: refs/heads/asf-site
Commit: d2bcf1854b0e0409495e2f1d3c6beaad923f6e6b
Parents: ecf94f2
Author: Yin Huai <yh...@databricks.com>
Authored: Wed Dec 28 14:32:43 2016 -0800
Committer: Yin Huai <yh...@databricks.com>
Committed: Wed Dec 28 14:32:43 2016 -0800

----------------------------------------------------------------------
 site/docs/2.1.0/building-spark.html             |  46 +-
 site/docs/2.1.0/building-with-maven.html        |  14 +-
 site/docs/2.1.0/configuration.html              |  52 +-
 site/docs/2.1.0/ec2-scripts.html                | 174 ++++
 site/docs/2.1.0/graphx-programming-guide.html   | 198 ++---
 site/docs/2.1.0/hadoop-provided.html            |  14 +-
 .../img/structured-streaming-watermark.png      | Bin 0 -> 252000 bytes
 site/docs/2.1.0/img/structured-streaming.pptx   | Bin 1105413 -> 1113902 bytes
 site/docs/2.1.0/job-scheduling.html             |  40 +-
 site/docs/2.1.0/ml-advanced.html                |  10 +-
 .../2.1.0/ml-classification-regression.html     | 838 +++++++++---------
 site/docs/2.1.0/ml-clustering.html              | 124 +--
 site/docs/2.1.0/ml-collaborative-filtering.html |  56 +-
 site/docs/2.1.0/ml-features.html                | 764 ++++++++--------
 site/docs/2.1.0/ml-migration-guides.html        |  16 +-
 site/docs/2.1.0/ml-pipeline.html                | 178 ++--
 site/docs/2.1.0/ml-tuning.html                  | 172 ++--
 site/docs/2.1.0/mllib-clustering.html           | 186 ++--
 .../2.1.0/mllib-collaborative-filtering.html    |  48 +-
 site/docs/2.1.0/mllib-data-types.html           | 208 ++---
 site/docs/2.1.0/mllib-decision-tree.html        |  94 +-
 .../2.1.0/mllib-dimensionality-reduction.html   |  28 +-
 site/docs/2.1.0/mllib-ensembles.html            | 182 ++--
 site/docs/2.1.0/mllib-evaluation-metrics.html   | 302 +++----
 site/docs/2.1.0/mllib-feature-extraction.html   | 122 +--
 .../2.1.0/mllib-frequent-pattern-mining.html    |  28 +-
 site/docs/2.1.0/mllib-isotonic-regression.html  |  38 +-
 site/docs/2.1.0/mllib-linear-methods.html       | 174 ++--
 site/docs/2.1.0/mllib-naive-bayes.html          |  24 +-
 site/docs/2.1.0/mllib-optimization.html         |  50 +-
 site/docs/2.1.0/mllib-pmml-model-export.html    |  35 +-
 site/docs/2.1.0/mllib-statistics.html           | 180 ++--
 site/docs/2.1.0/programming-guide.html          | 302 +++----
 site/docs/2.1.0/quick-start.html                | 166 ++--
 site/docs/2.1.0/running-on-mesos.html           |  52 +-
 site/docs/2.1.0/running-on-yarn.html            |  27 +-
 site/docs/2.1.0/spark-standalone.html           |  30 +-
 site/docs/2.1.0/sparkr.html                     | 145 ++--
 site/docs/2.1.0/sql-programming-guide.html      | 819 +++++++++---------
 site/docs/2.1.0/storage-openstack-swift.html    |  12 +-
 site/docs/2.1.0/streaming-custom-receivers.html |  26 +-
 .../2.1.0/streaming-kafka-0-10-integration.html |  52 +-
 .../docs/2.1.0/streaming-programming-guide.html | 416 ++++-----
 .../structured-streaming-kafka-integration.html |  44 +-
 .../structured-streaming-programming-guide.html | 864 ++++++++++++-------
 site/docs/2.1.0/submitting-applications.html    |  36 +-
 site/docs/2.1.0/tuning.html                     |  30 +-
 47 files changed, 3926 insertions(+), 3490 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/building-spark.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/building-spark.html b/site/docs/2.1.0/building-spark.html
index b3a720c..5c20245 100644
--- a/site/docs/2.1.0/building-spark.html
+++ b/site/docs/2.1.0/building-spark.html
@@ -127,33 +127,33 @@
                     
 
                     <ul id="markdown-toc">
-  <li><a href="#building-apache-spark" id="markdown-toc-building-apache-spark">Building Apache Spark</a>    <ul>
-      <li><a href="#apache-maven" id="markdown-toc-apache-maven">Apache Maven</a>        <ul>
-          <li><a href="#setting-up-mavens-memory-usage" id="markdown-toc-setting-up-mavens-memory-usage">Setting up Maven&#8217;s Memory Usage</a></li>
-          <li><a href="#buildmvn" id="markdown-toc-buildmvn">build/mvn</a></li>
+  <li><a href="#building-apache-spark">Building Apache Spark</a>    <ul>
+      <li><a href="#apache-maven">Apache Maven</a>        <ul>
+          <li><a href="#setting-up-mavens-memory-usage">Setting up Maven&#8217;s Memory Usage</a></li>
+          <li><a href="#buildmvn">build/mvn</a></li>
         </ul>
       </li>
-      <li><a href="#building-a-runnable-distribution" id="markdown-toc-building-a-runnable-distribution">Building a Runnable Distribution</a></li>
-      <li><a href="#specifying-the-hadoop-version" id="markdown-toc-specifying-the-hadoop-version">Specifying the Hadoop Version</a></li>
-      <li><a href="#building-with-hive-and-jdbc-support" id="markdown-toc-building-with-hive-and-jdbc-support">Building With Hive and JDBC Support</a></li>
-      <li><a href="#packaging-without-hadoop-dependencies-for-yarn" id="markdown-toc-packaging-without-hadoop-dependencies-for-yarn">Packaging without Hadoop Dependencies for YARN</a></li>
-      <li><a href="#building-with-mesos-support" id="markdown-toc-building-with-mesos-support">Building with Mesos support</a></li>
-      <li><a href="#building-for-scala-210" id="markdown-toc-building-for-scala-210">Building for Scala 2.10</a></li>
-      <li><a href="#building-submodules-individually" id="markdown-toc-building-submodules-individually">Building submodules individually</a></li>
-      <li><a href="#continuous-compilation" id="markdown-toc-continuous-compilation">Continuous Compilation</a></li>
-      <li><a href="#speeding-up-compilation-with-zinc" id="markdown-toc-speeding-up-compilation-with-zinc">Speeding up Compilation with Zinc</a></li>
-      <li><a href="#building-with-sbt" id="markdown-toc-building-with-sbt">Building with SBT</a></li>
-      <li><a href="#encrypted-filesystems" id="markdown-toc-encrypted-filesystems">�Encrypted Filesystems</a></li>
-      <li><a href="#intellij-idea-or-eclipse" id="markdown-toc-intellij-idea-or-eclipse">IntelliJ IDEA or Eclipse</a></li>
+      <li><a href="#building-a-runnable-distribution">Building a Runnable Distribution</a></li>
+      <li><a href="#specifying-the-hadoop-version">Specifying the Hadoop Version</a></li>
+      <li><a href="#building-with-hive-and-jdbc-support">Building With Hive and JDBC Support</a></li>
+      <li><a href="#packaging-without-hadoop-dependencies-for-yarn">Packaging without Hadoop Dependencies for YARN</a></li>
+      <li><a href="#building-with-mesos-support">Building with Mesos support</a></li>
+      <li><a href="#building-for-scala-210">Building for Scala 2.10</a></li>
+      <li><a href="#building-submodules-individually">Building submodules individually</a></li>
+      <li><a href="#continuous-compilation">Continuous Compilation</a></li>
+      <li><a href="#speeding-up-compilation-with-zinc">Speeding up Compilation with Zinc</a></li>
+      <li><a href="#building-with-sbt">Building with SBT</a></li>
+      <li><a href="#encrypted-filesystems">�Encrypted Filesystems</a></li>
+      <li><a href="#intellij-idea-or-eclipse">IntelliJ IDEA or Eclipse</a></li>
     </ul>
   </li>
-  <li><a href="#running-tests" id="markdown-toc-running-tests">Running Tests</a>    <ul>
-      <li><a href="#testing-with-sbt" id="markdown-toc-testing-with-sbt">Testing with SBT</a></li>
-      <li><a href="#running-java-8-test-suites" id="markdown-toc-running-java-8-test-suites">Running Java 8 Test Suites</a></li>
-      <li><a href="#pyspark-pip-installable" id="markdown-toc-pyspark-pip-installable">PySpark pip installable</a></li>
-      <li><a href="#pyspark-tests-with-maven" id="markdown-toc-pyspark-tests-with-maven">PySpark Tests with Maven</a></li>
-      <li><a href="#running-r-tests" id="markdown-toc-running-r-tests">Running R Tests</a></li>
-      <li><a href="#running-docker-based-integration-test-suites" id="markdown-toc-running-docker-based-integration-test-suites">Running Docker-based Integration Test Suites</a></li>
+  <li><a href="#running-tests">Running Tests</a>    <ul>
+      <li><a href="#testing-with-sbt">Testing with SBT</a></li>
+      <li><a href="#running-java-8-test-suites">Running Java 8 Test Suites</a></li>
+      <li><a href="#pyspark-pip-installable">PySpark pip installable</a></li>
+      <li><a href="#pyspark-tests-with-maven">PySpark Tests with Maven</a></li>
+      <li><a href="#running-r-tests">Running R Tests</a></li>
+      <li><a href="#running-docker-based-integration-test-suites">Running Docker-based Integration Test Suites</a></li>
     </ul>
   </li>
 </ul>

http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/building-with-maven.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/building-with-maven.html b/site/docs/2.1.0/building-with-maven.html
index dea0259..0aafc64 100644
--- a/site/docs/2.1.0/building-with-maven.html
+++ b/site/docs/2.1.0/building-with-maven.html
@@ -1,8 +1,10 @@
 <!DOCTYPE html>
-<meta charset=utf-8>
-<title>Redirecting...</title>
-<link rel=canonical href="/building-spark.html">
-<meta http-equiv=refresh content="0; url=/building-spark.html">
-<h1>Redirecting...</h1>
+<html lang="en-US">
+<meta charset="utf-8">
+<title>Redirecting&#8230;</title>
+<link rel="canonical" href="/building-spark.html">
+<meta http-equiv="refresh" content="0; url=/building-spark.html">
+<h1>Redirecting&#8230;</h1>
 <a href="/building-spark.html">Click here if you are not redirected.</a>
-<script>location='/building-spark.html'</script>
+<script>location="/building-spark.html"</script>
+</html>

http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/configuration.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/configuration.html b/site/docs/2.1.0/configuration.html
index 8336326..2adc859 100644
--- a/site/docs/2.1.0/configuration.html
+++ b/site/docs/2.1.0/configuration.html
@@ -127,19 +127,19 @@
                     
 
                     <ul id="markdown-toc">
-  <li><a href="#spark-properties" id="markdown-toc-spark-properties">Spark Properties</a>    <ul>
-      <li><a href="#dynamically-loading-spark-properties" id="markdown-toc-dynamically-loading-spark-properties">Dynamically Loading Spark Properties</a></li>
-      <li><a href="#viewing-spark-properties" id="markdown-toc-viewing-spark-properties">Viewing Spark Properties</a></li>
-      <li><a href="#available-properties" id="markdown-toc-available-properties">Available Properties</a>        <ul>
-          <li><a href="#application-properties" id="markdown-toc-application-properties">Application Properties</a></li>
-          <li><a href="#runtime-environment" id="markdown-toc-runtime-environment">Runtime Environment</a></li>
-          <li><a href="#shuffle-behavior" id="markdown-toc-shuffle-behavior">Shuffle Behavior</a></li>
-          <li><a href="#spark-ui" id="markdown-toc-spark-ui">Spark UI</a></li>
-          <li><a href="#compression-and-serialization" id="markdown-toc-compression-and-serialization">Compression and Serialization</a></li>
-          <li><a href="#memory-management" id="markdown-toc-memory-management">Memory Management</a></li>
-          <li><a href="#execution-behavior" id="markdown-toc-execution-behavior">Execution Behavior</a></li>
-          <li><a href="#networking" id="markdown-toc-networking">Networking</a></li>
-          <li><a href="#scheduling" id="markdown-toc-scheduling">Scheduling</a></li>
+  <li><a href="#spark-properties">Spark Properties</a>    <ul>
+      <li><a href="#dynamically-loading-spark-properties">Dynamically Loading Spark Properties</a></li>
+      <li><a href="#viewing-spark-properties">Viewing Spark Properties</a></li>
+      <li><a href="#available-properties">Available Properties</a>        <ul>
+          <li><a href="#application-properties">Application Properties</a></li>
+          <li><a href="#runtime-environment">Runtime Environment</a></li>
+          <li><a href="#shuffle-behavior">Shuffle Behavior</a></li>
+          <li><a href="#spark-ui">Spark UI</a></li>
+          <li><a href="#compression-and-serialization">Compression and Serialization</a></li>
+          <li><a href="#memory-management">Memory Management</a></li>
+          <li><a href="#execution-behavior">Execution Behavior</a></li>
+          <li><a href="#networking">Networking</a></li>
+          <li><a href="#scheduling">Scheduling</a></li>
         </ul>
       </li>
     </ul>
@@ -169,10 +169,10 @@ application. These properties can be set directly on a
 <p>Note that we run with local[2], meaning two threads, which represents &#8220;minimal&#8221; parallelism
 and can help detect bugs that only exist when we run in a distributed context.</p>
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">conf</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkConf</span><span class="o">()</span>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">val</span> <span class="n">conf</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkConf</span><span class="o">()</span>
              <span class="o">.</span><span class="n">setMaster</span><span class="o">(</span><span class="s">&quot;local[2]&quot;</span><span class="o">)</span>
              <span class="o">.</span><span class="n">setAppName</span><span class="o">(</span><span class="s">&quot;CountingSheep&quot;</span><span class="o">)</span>
-<span class="k">val</span> <span class="n">sc</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkContext</span><span class="o">(</span><span class="n">conf</span><span class="o">)</span></code></pre></div>
+<span class="k">val</span> <span class="n">sc</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkContext</span><span class="o">(</span><span class="n">conf</span><span class="o">)</span></code></pre></figure>
 
 <p>Note that we can have more than one thread in local mode, and in cases like Spark Streaming we may
 actually require multiple threads to prevent starvation issues, as in the sketch below.</p>
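
For instance, with a receiver-based streaming source the receiver pins one local thread, so at
least two are needed before any batches can be processed. A minimal sketch, assuming the classic
StreamingContext API and a hypothetical socket source on localhost:9999:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // With local[1] the receiver would consume the only thread and batch
    // processing would starve; local[2] leaves one thread free for processing.
    val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkCount")
    val ssc = new StreamingContext(conf, Seconds(1))
    val lines = ssc.socketTextStream("localhost", 9999) // hypothetical source
    lines.count().print()
    ssc.start()
    ssc.awaitTermination()
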
@@ -204,12 +204,12 @@ The following format is accepted:</p>
 instance, if you&#8217;d like to run the same application with different masters or different
 amounts of memory. Spark allows you to simply create an empty conf:</p>
 
-<div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">val</span> <span class="n">sc</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkContext</span><span class="o">(</span><span class="k">new</span> <span class="nc">SparkConf</span><span class="o">())</span></code></pre></div>
+<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">val</span> <span class="n">sc</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkContext</span><span class="o">(</span><span class="k">new</span> <span class="nc">SparkConf</span><span class="o">())</span></code></pre></figure>
 
 <p>Then, you can supply configuration values at runtime:</p>
 
-<div class="highlight"><pre><code class="language-bash" data-lang="bash">./bin/spark-submit --name <span class="s2">&quot;My app&quot;</span> --master <span class="nb">local</span><span class="o">[</span>4<span class="o">]</span> --conf spark.eventLog.enabled<span class="o">=</span><span class="nb">false</span>
-  --conf <span class="s2">&quot;spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps&quot;</span> myApp.jar</code></pre></div>
+<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>./bin/spark-submit --name <span class="s2">&quot;My app&quot;</span> --master local<span class="o">[</span><span class="m">4</span><span class="o">]</span> --conf spark.eventLog.enabled<span class="o">=</span><span class="nb">false</span>
+  --conf <span class="s2">&quot;spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps&quot;</span> myApp.jar</code></pre></figure>
 
 <p>The Spark shell and <a href="submitting-applications.html"><code>spark-submit</code></a>
 tool support two ways to load configurations dynamically. The first are command line options,
@@ -1856,30 +1856,30 @@ Running the <code>SET -v</code> command will show the entire list of the SQL con
 <div class="codetabs">
 <div data-lang="scala">
 
-          <div class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="c1">// spark is an existing SparkSession</span>
-<span class="n">spark</span><span class="o">.</span><span class="n">sql</span><span class="o">(</span><span class="s">&quot;SET -v&quot;</span><span class="o">).</span><span class="n">show</span><span class="o">(</span><span class="n">numRows</span> <span class="k">=</span> <span class="mi">200</span><span class="o">,</span> <span class="n">truncate</span> <span class="k">=</span> <span class="kc">false</span><span class="o">)</span></code></pre></div>
+          <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="c1">// spark is an existing SparkSession</span>
+<span class="n">spark</span><span class="o">.</span><span class="n">sql</span><span class="o">(</span><span class="s">&quot;SET -v&quot;</span><span class="o">).</span><span class="n">show</span><span class="o">(</span><span class="n">numRows</span> <span class="k">=</span> <span class="mi">200</span><span class="o">,</span> <span class="n">truncate</span> <span class="k">=</span> <span class="kc">false</span><span class="o">)</span></code></pre></figure>
 
         </div>
 
 <div data-lang="java">
 
-          <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="c1">// spark is an existing SparkSession</span>
-<span class="n">spark</span><span class="o">.</span><span class="na">sql</span><span class="o">(</span><span class="s">&quot;SET -v&quot;</span><span class="o">).</span><span class="na">show</span><span class="o">(</span><span class="mi">200</span><span class="o">,</span> <span class="kc">false</span><span class="o">);</span></code></pre></div>
+          <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="c1">// spark is an existing SparkSession</span>
+<span class="n">spark</span><span class="o">.</span><span class="na">sql</span><span class="o">(</span><span class="s">&quot;SET -v&quot;</span><span class="o">).</span><span class="na">show</span><span class="o">(</span><span class="mi">200</span><span class="o">,</span> <span class="kc">false</span><span class="o">);</span></code></pre></figure>
 
         </div>
 
 <div data-lang="python">
 
-          <div class="highlight"><pre><code class="language-python" data-lang="python"><span class="c"># spark is an existing SparkSession</span>
-<span class="n">spark</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s">&quot;SET -v&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">(</span><span class="n">n</span><span class="o">=</span><span class="mi">200</span><span class="p">,</span> <span class="n">truncate</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span></code></pre></div>
+          <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="c1"># spark is an existing SparkSession</span>
+<span class="n">spark</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">&quot;SET -v&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">(</span><span class="n">n</span><span class="o">=</span><span class="mi">200</span><span class="p">,</span> <span class="n">truncate</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span></code></pre></figure>
 
         </div>
 
 <div data-lang="r">
 
-          <div class="highlight"><pre><code class="language-r" data-lang="r">sparkR.session<span class="p">()</span>
+          <figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span>sparkR.session<span class="p">()</span>
 properties <span class="o">&lt;-</span> sql<span class="p">(</span><span class="s">&quot;SET -v&quot;</span><span class="p">)</span>
-showDF<span class="p">(</span>properties<span class="p">,</span> numRows <span class="o">=</span> <span class="m">200</span><span class="p">,</span> truncate <span class="o">=</span> <span class="kc">FALSE</span><span class="p">)</span></code></pre></div>
+showDF<span class="p">(</span>properties<span class="p">,</span> numRows <span class="o">=</span> <span class="m">200</span><span class="p">,</span> truncate <span class="o">=</span> <span class="kc">FALSE</span><span class="p">)</span></code></pre></figure>
 
         </div>
 </div>
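
As a complement to listing SQL properties with SET -v, such properties can also be set and read
at runtime on the session. A minimal Scala sketch; the property name shown is just an illustrative,
runtime-settable one:

    // spark is an existing SparkSession
    spark.conf.set("spark.sql.shuffle.partitions", "50")
    spark.conf.get("spark.sql.shuffle.partitions") // returns "50"
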

http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/ec2-scripts.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/ec2-scripts.html b/site/docs/2.1.0/ec2-scripts.html
new file mode 100644
index 0000000..eb1bb60
--- /dev/null
+++ b/site/docs/2.1.0/ec2-scripts.html
@@ -0,0 +1,174 @@
+
+<!DOCTYPE html>
+<!--[if lt IE 7]>      <html class="no-js lt-ie9 lt-ie8 lt-ie7"> <![endif]-->
+<!--[if IE 7]>         <html class="no-js lt-ie9 lt-ie8"> <![endif]-->
+<!--[if IE 8]>         <html class="no-js lt-ie9"> <![endif]-->
+<!--[if gt IE 8]><!--> <html class="no-js"> <!--<![endif]-->
+    <head>
+        <meta charset="utf-8">
+        <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
+        <title>Running Spark on EC2 - Spark 2.1.0 Documentation</title>
+        
+
+        
+          <meta http-equiv="refresh" content="0; url=https://github.com/amplab/spark-ec2#readme">
+          <link rel="canonical" href="https://github.com/amplab/spark-ec2#readme" />
+        
+
+        <link rel="stylesheet" href="css/bootstrap.min.css">
+        <style>
+            body {
+                padding-top: 60px;
+                padding-bottom: 40px;
+            }
+        </style>
+        <meta name="viewport" content="width=device-width">
+        <link rel="stylesheet" href="css/bootstrap-responsive.min.css">
+        <link rel="stylesheet" href="css/main.css">
+
+        <script src="js/vendor/modernizr-2.6.1-respond-1.1.0.min.js"></script>
+
+        <link rel="stylesheet" href="css/pygments-default.css">
+
+        
+        <!-- Google analytics script -->
+        <script type="text/javascript">
+          var _gaq = _gaq || [];
+          _gaq.push(['_setAccount', 'UA-32518208-2']);
+          _gaq.push(['_trackPageview']);
+
+          (function() {
+            var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
+            ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
+            var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
+          })();
+        </script>
+        
+
+    </head>
+    <body>
+        <!--[if lt IE 7]>
+            <p class="chromeframe">You are using an outdated browser. <a href="http://browsehappy.com/">Upgrade your browser today</a> or <a href="http://www.google.com/chromeframe/?redirect=true">install Google Chrome Frame</a> to better experience this site.</p>
+        <![endif]-->
+
+        <!-- This code is taken from http://twitter.github.com/bootstrap/examples/hero.html -->
+
+        <div class="navbar navbar-fixed-top" id="topbar">
+            <div class="navbar-inner">
+                <div class="container">
+                    <div class="brand"><a href="index.html">
+                      <img src="img/spark-logo-hd.png" style="height:50px;"/></a><span class="version">2.1.0</span>
+                    </div>
+                    <ul class="nav">
+                        <!--TODO(andyk): Add class="active" attribute to li somehow.-->
+                        <li><a href="index.html">Overview</a></li>
+
+                        <li class="dropdown">
+                            <a href="#" class="dropdown-toggle" data-toggle="dropdown">Programming Guides<b class="caret"></b></a>
+                            <ul class="dropdown-menu">
+                                <li><a href="quick-start.html">Quick Start</a></li>
+                                <li><a href="programming-guide.html">Spark Programming Guide</a></li>
+                                <li class="divider"></li>
+                                <li><a href="streaming-programming-guide.html">Spark Streaming</a></li>
+                                <li><a href="sql-programming-guide.html">DataFrames, Datasets and SQL</a></li>
+                                <li><a href="structured-streaming-programming-guide.html">Structured Streaming</a></li>
+                                <li><a href="ml-guide.html">MLlib (Machine Learning)</a></li>
+                                <li><a href="graphx-programming-guide.html">GraphX (Graph Processing)</a></li>
+                                <li><a href="sparkr.html">SparkR (R on Spark)</a></li>
+                            </ul>
+                        </li>
+
+                        <li class="dropdown">
+                            <a href="#" class="dropdown-toggle" data-toggle="dropdown">API Docs<b class="caret"></b></a>
+                            <ul class="dropdown-menu">
+                                <li><a href="api/scala/index.html#org.apache.spark.package">Scala</a></li>
+                                <li><a href="api/java/index.html">Java</a></li>
+                                <li><a href="api/python/index.html">Python</a></li>
+                                <li><a href="api/R/index.html">R</a></li>
+                            </ul>
+                        </li>
+
+                        <li class="dropdown">
+                            <a href="#" class="dropdown-toggle" data-toggle="dropdown">Deploying<b class="caret"></b></a>
+                            <ul class="dropdown-menu">
+                                <li><a href="cluster-overview.html">Overview</a></li>
+                                <li><a href="submitting-applications.html">Submitting Applications</a></li>
+                                <li class="divider"></li>
+                                <li><a href="spark-standalone.html">Spark Standalone</a></li>
+                                <li><a href="running-on-mesos.html">Mesos</a></li>
+                                <li><a href="running-on-yarn.html">YARN</a></li>
+                            </ul>
+                        </li>
+
+                        <li class="dropdown">
+                            <a href="api.html" class="dropdown-toggle" data-toggle="dropdown">More<b class="caret"></b></a>
+                            <ul class="dropdown-menu">
+                                <li><a href="configuration.html">Configuration</a></li>
+                                <li><a href="monitoring.html">Monitoring</a></li>
+                                <li><a href="tuning.html">Tuning Guide</a></li>
+                                <li><a href="job-scheduling.html">Job Scheduling</a></li>
+                                <li><a href="security.html">Security</a></li>
+                                <li><a href="hardware-provisioning.html">Hardware Provisioning</a></li>
+                                <li class="divider"></li>
+                                <li><a href="building-spark.html">Building Spark</a></li>
+                                <li><a href="http://spark.apache.org/contributing.html">Contributing to Spark</a></li>
+                                <li><a href="http://spark.apache.org/third-party-projects.html">Third Party Projects</a></li>
+                            </ul>
+                        </li>
+                    </ul>
+                    <!--<p class="navbar-text pull-right"><span class="version-text">v2.1.0</span></p>-->
+                </div>
+            </div>
+        </div>
+
+        <div class="container-wrapper">
+
+            
+                <div class="content" id="content">
+                    
+                        <h1 class="title">Running Spark on EC2</h1>
+                    
+
+                    <p>This document has been superseded by the documentation at https://github.com/amplab/spark-ec2#readme</p>
+
+
+                </div>
+            
+             <!-- /container -->
+        </div>
+
+        <script src="js/vendor/jquery-1.8.0.min.js"></script>
+        <script src="js/vendor/bootstrap.min.js"></script>
+        <script src="js/vendor/anchor.min.js"></script>
+        <script src="js/main.js"></script>
+
+        <!-- MathJax Section -->
+        <script type="text/x-mathjax-config">
+            MathJax.Hub.Config({
+                TeX: { equationNumbers: { autoNumber: "AMS" } }
+            });
+        </script>
+        <script>
+            // Note that we load MathJax this way to work with local file (file://), HTTP and HTTPS.
+            // We could use "//cdn.mathjax...", but that won't support "file://".
+            (function(d, script) {
+                script = d.createElement('script');
+                script.type = 'text/javascript';
+                script.async = true;
+                script.onload = function(){
+                    MathJax.Hub.Config({
+                        tex2jax: {
+                            inlineMath: [ ["$", "$"], ["\\\\(","\\\\)"] ],
+                            displayMath: [ ["$$","$$"], ["\\[", "\\]"] ],
+                            processEscapes: true,
+                            skipTags: ['script', 'noscript', 'style', 'textarea', 'pre']
+                        }
+                    });
+                };
+                script.src = ('https:' == document.location.protocol ? 'https://' : 'http://') +
+                    'cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML';
+                d.getElementsByTagName('head')[0].appendChild(script);
+            }(document));
+        </script>
+    </body>
+</html>

