Posted to commits@spark.apache.org by yh...@apache.org on 2016/12/28 22:35:28 UTC

[17/25] spark-website git commit: Update 2.1.0 docs to include https://github.com/apache/spark/pull/16294

http://git-wip-us.apache.org/repos/asf/spark-website/blob/d2bcf185/site/docs/2.1.0/mllib-clustering.html
----------------------------------------------------------------------
diff --git a/site/docs/2.1.0/mllib-clustering.html b/site/docs/2.1.0/mllib-clustering.html
index 9667606..1b50dab 100644
--- a/site/docs/2.1.0/mllib-clustering.html
+++ b/site/docs/2.1.0/mllib-clustering.html
@@ -366,12 +366,12 @@ models are trained for each cluster).</p>
 <p>The <code>spark.mllib</code> package supports the following models:</p>
 
 <ul id="markdown-toc">
-  <li><a href="#k-means" id="markdown-toc-k-means">K-means</a></li>
-  <li><a href="#gaussian-mixture" id="markdown-toc-gaussian-mixture">Gaussian mixture</a></li>
-  <li><a href="#power-iteration-clustering-pic" id="markdown-toc-power-iteration-clustering-pic">Power iteration clustering (PIC)</a></li>
-  <li><a href="#latent-dirichlet-allocation-lda" id="markdown-toc-latent-dirichlet-allocation-lda">Latent Dirichlet allocation (LDA)</a></li>
-  <li><a href="#bisecting-k-means" id="markdown-toc-bisecting-k-means">Bisecting k-means</a></li>
-  <li><a href="#streaming-k-means" id="markdown-toc-streaming-k-means">Streaming k-means</a></li>
+  <li><a href="#k-means">K-means</a></li>
+  <li><a href="#gaussian-mixture">Gaussian mixture</a></li>
+  <li><a href="#power-iteration-clustering-pic">Power iteration clustering (PIC)</a></li>
+  <li><a href="#latent-dirichlet-allocation-lda">Latent Dirichlet allocation (LDA)</a></li>
+  <li><a href="#bisecting-k-means">Bisecting k-means</a></li>
+  <li><a href="#streaming-k-means">Streaming k-means</a></li>
 </ul>
 
 <h2 id="k-means">K-means</h2>
@@ -408,7 +408,7 @@ optimal <em>k</em> is usually one where there is an &#8220;elbow&#8221; in the W
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.clustering.KMeans"><code>KMeans</code> Scala docs</a> and <a href="api/scala/index.html#org.apache.spark.mllib.clustering.KMeansModel"><code>KMeansModel</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.clustering.</span><span class="o">{</span><span class="nc">KMeans</span><span class="o">,</span> <span class="nc">KMeansModel</span><span class="o">}</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.clustering.</span><span class="o">{</span><span class="nc">KMeans</span><span class="o">,</span> <span class="nc">KMeansModel</span><span class="o">}</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vectors</span>
 
 <span class="c1">// Load and parse the data</span>
@@ -440,7 +440,7 @@ that is equivalent to the provided example in Scala is given below:</p>
 
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/clustering/KMeans.html"><code>KMeans</code> Java docs</a> and <a href="api/java/org/apache/spark/mllib/clustering/KMeansModel.html"><code>KMeansModel</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.function.Function</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.clustering.KMeans</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.clustering.KMeansModel</span><span class="o">;</span>
@@ -470,7 +470,7 @@ that is equivalent to the provided example in Scala is given below:</p>
 <span class="n">KMeansModel</span> <span class="n">clusters</span> <span class="o">=</span> <span class="n">KMeans</span><span class="o">.</span><span class="na">train</span><span class="o">(</span><span class="n">parsedData</span><span class="o">.</span><span class="na">rdd</span><span class="o">(),</span> <span class="n">numClusters</span><span class="o">,</span> <span class="n">numIterations</span><span class="o">);</span>
 
 <span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">&quot;Cluster centers:&quot;</span><span class="o">);</span>
-<span class="k">for</span> <span class="o">(</span><span class="n">Vector</span> <span class="nl">center:</span> <span class="n">clusters</span><span class="o">.</span><span class="na">clusterCenters</span><span class="o">())</span> <span class="o">{</span>
+<span class="k">for</span> <span class="o">(</span><span class="n">Vector</span> <span class="n">center</span><span class="o">:</span> <span class="n">clusters</span><span class="o">.</span><span class="na">clusterCenters</span><span class="o">())</span> <span class="o">{</span>
   <span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">&quot; &quot;</span> <span class="o">+</span> <span class="n">center</span><span class="o">);</span>
 <span class="o">}</span>
 <span class="kt">double</span> <span class="n">cost</span> <span class="o">=</span> <span class="n">clusters</span><span class="o">.</span><span class="na">computeCost</span><span class="o">(</span><span class="n">parsedData</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span>
@@ -498,29 +498,29 @@ fact the optimal <em>k</em> is usually one where there is an &#8220;elbow&#8221;
 
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.clustering.KMeans"><code>KMeans</code> Python docs</a> and <a href="api/python/pyspark.mllib.html#pyspark.mllib.clustering.KMeansModel"><code>KMeansModel</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">numpy</span> <span class="kn">import</span> <span class="n">array</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">numpy</span> <span class="kn">import</span> <span class="n">array</span>
 <span class="kn">from</span> <span class="nn">math</span> <span class="kn">import</span> <span class="n">sqrt</span>
 
 <span class="kn">from</span> <span class="nn">pyspark.mllib.clustering</span> <span class="kn">import</span> <span class="n">KMeans</span><span class="p">,</span> <span class="n">KMeansModel</span>
 
-<span class="c"># Load and parse the data</span>
-<span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">&quot;data/mllib/kmeans_data.txt&quot;</span><span class="p">)</span>
-<span class="n">parsedData</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="n">array</span><span class="p">([</span><span class="nb">float</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&#39; &#39;</span><span class="p">)]))</span>
+<span class="c1"># Load and parse the data</span>
+<span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s2">&quot;data/mllib/kmeans_data.txt&quot;</span><span class="p">)</span>
+<span class="n">parsedData</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="n">array</span><span class="p">([</span><span class="nb">float</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">&#39; &#39;</span><span class="p">)]))</span>
 
-<span class="c"># Build the model (cluster the data)</span>
-<span class="n">clusters</span> <span class="o">=</span> <span class="n">KMeans</span><span class="o">.</span><span class="n">train</span><span class="p">(</span><span class="n">parsedData</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">maxIterations</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">initializationMode</span><span class="o">=</span><span class="s">&quot;random&quot;</span><span class="p">)</span>
+<span class="c1"># Build the model (cluster the data)</span>
+<span class="n">clusters</span> <span class="o">=</span> <span class="n">KMeans</span><span class="o">.</span><span class="n">train</span><span class="p">(</span><span class="n">parsedData</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">maxIterations</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">initializationMode</span><span class="o">=</span><span class="s2">&quot;random&quot;</span><span class="p">)</span>
 
-<span class="c"># Evaluate clustering by computing Within Set Sum of Squared Errors</span>
+<span class="c1"># Evaluate clustering by computing Within Set Sum of Squared Errors</span>
 <span class="k">def</span> <span class="nf">error</span><span class="p">(</span><span class="n">point</span><span class="p">):</span>
     <span class="n">center</span> <span class="o">=</span> <span class="n">clusters</span><span class="o">.</span><span class="n">centers</span><span class="p">[</span><span class="n">clusters</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">point</span><span class="p">)]</span>
     <span class="k">return</span> <span class="n">sqrt</span><span class="p">(</span><span class="nb">sum</span><span class="p">([</span><span class="n">x</span><span class="o">**</span><span class="mi">2</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="p">(</span><span class="n">point</span> <span class="o">-</span> <span class="n">center</span><span class="p">)]))</span>
 
 <span class="n">WSSSE</span> <span class="o">=</span> <span class="n">parsedData</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">point</span><span class="p">:</span> <span class="n">error</span><span class="p">(</span><span class="n">point</span><span class="p">))</span><span class="o">.</span><span class="n">reduce</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">:</span> <span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="p">)</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Within Set Sum of Squared Error = &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">WSSSE</span><span class="p">))</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Within Set Sum of Squared Error = &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">WSSSE</span><span class="p">))</span>
 
-<span class="c"># Save and load model</span>
-<span class="n">clusters</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;target/org/apache/spark/PythonKMeansExample/KMeansModel&quot;</span><span class="p">)</span>
-<span class="n">sameModel</span> <span class="o">=</span> <span class="n">KMeansModel</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;target/org/apache/spark/PythonKMeansExample/KMeansModel&quot;</span><span class="p">)</span>
+<span class="c1"># Save and load model</span>
+<span class="n">clusters</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;target/org/apache/spark/PythonKMeansExample/KMeansModel&quot;</span><span class="p">)</span>
+<span class="n">sameModel</span> <span class="o">=</span> <span class="n">KMeansModel</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;target/org/apache/spark/PythonKMeansExample/KMeansModel&quot;</span><span class="p">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/mllib/k_means_example.py" in the Spark repo.</small></div>
   </div>
@@ -554,7 +554,7 @@ to the algorithm. We then output the parameters of the mixture model.</p>
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.clustering.GaussianMixture"><code>GaussianMixture</code> Scala docs</a> and <a href="api/scala/index.html#org.apache.spark.mllib.clustering.GaussianMixtureModel"><code>GaussianMixtureModel</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.clustering.</span><span class="o">{</span><span class="nc">GaussianMixture</span><span class="o">,</span> <span class="nc">GaussianMixtureModel</span><span class="o">}</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.clustering.</span><span class="o">{</span><span class="nc">GaussianMixture</span><span class="o">,</span> <span class="nc">GaussianMixtureModel</span><span class="o">}</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vectors</span>
 
 <span class="c1">// Load and parse the data</span>
@@ -587,7 +587,7 @@ that is equivalent to the provided example in Scala is given below:</p>
 
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/clustering/GaussianMixture.html"><code>GaussianMixture</code> Java docs</a> and <a href="api/java/org/apache/spark/mllib/clustering/GaussianMixtureModel.html"><code>GaussianMixtureModel</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.function.Function</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.clustering.GaussianMixture</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.clustering.GaussianMixtureModel</span><span class="o">;</span>
@@ -612,7 +612,7 @@ that is equivalent to the provided example in Scala is given below:</p>
 <span class="n">parsedData</span><span class="o">.</span><span class="na">cache</span><span class="o">();</span>
 
 <span class="c1">// Cluster the data into two classes using GaussianMixture</span>
-<span class="n">GaussianMixtureModel</span> <span class="n">gmm</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">GaussianMixture</span><span class="o">().</span><span class="na">setK</span><span class="o">(</span><span class="mi">2</span><span class="o">).</span><span class="na">run</span><span class="o">(</span><span class="n">parsedData</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span>
+<span class="n">GaussianMixtureModel</span> <span class="n">gmm</span> <span class="o">=</span> <span class="k">new</span> <span class="n">GaussianMixture</span><span class="o">().</span><span class="na">setK</span><span class="o">(</span><span class="mi">2</span><span class="o">).</span><span class="na">run</span><span class="o">(</span><span class="n">parsedData</span><span class="o">.</span><span class="na">rdd</span><span class="o">());</span>
 
 <span class="c1">// Save and load GaussianMixtureModel</span>
 <span class="n">gmm</span><span class="o">.</span><span class="na">save</span><span class="o">(</span><span class="n">jsc</span><span class="o">.</span><span class="na">sc</span><span class="o">(),</span> <span class="s">&quot;target/org/apache/spark/JavaGaussianMixtureExample/GaussianMixtureModel&quot;</span><span class="o">);</span>
@@ -636,26 +636,26 @@ to the algorithm. We then output the parameters of the mixture model.</p>
 
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.clustering.GaussianMixture"><code>GaussianMixture</code> Python docs</a> and <a href="api/python/pyspark.mllib.html#pyspark.mllib.clustering.GaussianMixtureModel"><code>GaussianMixtureModel</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">numpy</span> <span class="kn">import</span> <span class="n">array</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">numpy</span> <span class="kn">import</span> <span class="n">array</span>
 
 <span class="kn">from</span> <span class="nn">pyspark.mllib.clustering</span> <span class="kn">import</span> <span class="n">GaussianMixture</span><span class="p">,</span> <span class="n">GaussianMixtureModel</span>
 
-<span class="c"># Load and parse the data</span>
-<span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">&quot;data/mllib/gmm_data.txt&quot;</span><span class="p">)</span>
-<span class="n">parsedData</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="n">array</span><span class="p">([</span><span class="nb">float</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">line</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&#39; &#39;</span><span class="p">)]))</span>
+<span class="c1"># Load and parse the data</span>
+<span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s2">&quot;data/mllib/gmm_data.txt&quot;</span><span class="p">)</span>
+<span class="n">parsedData</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="n">array</span><span class="p">([</span><span class="nb">float</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">line</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">&#39; &#39;</span><span class="p">)]))</span>
 
-<span class="c"># Build the model (cluster the data)</span>
+<span class="c1"># Build the model (cluster the data)</span>
 <span class="n">gmm</span> <span class="o">=</span> <span class="n">GaussianMixture</span><span class="o">.</span><span class="n">train</span><span class="p">(</span><span class="n">parsedData</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
 
-<span class="c"># Save and load model</span>
-<span class="n">gmm</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;target/org/apache/spark/PythonGaussianMixtureExample/GaussianMixtureModel&quot;</span><span class="p">)</span>
+<span class="c1"># Save and load model</span>
+<span class="n">gmm</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;target/org/apache/spark/PythonGaussianMixtureExample/GaussianMixtureModel&quot;</span><span class="p">)</span>
 <span class="n">sameModel</span> <span class="o">=</span> <span class="n">GaussianMixtureModel</span>\
-    <span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;target/org/apache/spark/PythonGaussianMixtureExample/GaussianMixtureModel&quot;</span><span class="p">)</span>
+    <span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;target/org/apache/spark/PythonGaussianMixtureExample/GaussianMixtureModel&quot;</span><span class="p">)</span>
 
-<span class="c"># output parameters of model</span>
+<span class="c1"># output parameters of model</span>
 <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">2</span><span class="p">):</span>
-    <span class="k">print</span><span class="p">(</span><span class="s">&quot;weight = &quot;</span><span class="p">,</span> <span class="n">gmm</span><span class="o">.</span><span class="n">weights</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="s">&quot;mu = &quot;</span><span class="p">,</span> <span class="n">gmm</span><span class="o">.</span><span class="n">gaussians</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">.</span><span class="n">mu</span><span class="p">,</span>
-          <span class="s">&quot;sigma = &quot;</span><span class="p">,</span> <span class="n">gmm</span><span class="o">.</span><span class="n">gaussians</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">.</span><span class="n">sigma</span><span class="o">.</span><span class="n">toArray</span><span class="p">())</span>
+    <span class="k">print</span><span class="p">(</span><span class="s2">&quot;weight = &quot;</span><span class="p">,</span> <span class="n">gmm</span><span class="o">.</span><span class="n">weights</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="s2">&quot;mu = &quot;</span><span class="p">,</span> <span class="n">gmm</span><span class="o">.</span><span class="n">gaussians</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">.</span><span class="n">mu</span><span class="p">,</span>
+          <span class="s2">&quot;sigma = &quot;</span><span class="p">,</span> <span class="n">gmm</span><span class="o">.</span><span class="n">gaussians</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">.</span><span class="n">sigma</span><span class="o">.</span><span class="n">toArray</span><span class="p">())</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/mllib/gaussian_mixture_example.py" in the Spark repo.</small></div>
   </div>
@@ -701,7 +701,7 @@ which contains the computed clustering assignments.</p>
 
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.clustering.PowerIterationClustering"><code>PowerIterationClustering</code> Scala docs</a> and <a href="api/scala/index.html#org.apache.spark.mllib.clustering.PowerIterationClusteringModel"><code>PowerIterationClusteringModel</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.clustering.PowerIterationClustering</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.clustering.PowerIterationClustering</span>
 
 <span class="k">val</span> <span class="n">circlesRdd</span> <span class="k">=</span> <span class="n">generateCirclesRdd</span><span class="o">(</span><span class="n">sc</span><span class="o">,</span> <span class="n">params</span><span class="o">.</span><span class="n">k</span><span class="o">,</span> <span class="n">params</span><span class="o">.</span><span class="n">numPoints</span><span class="o">)</span>
 <span class="k">val</span> <span class="n">model</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">PowerIterationClustering</span><span class="o">()</span>
@@ -714,12 +714,12 @@ which contains the computed clustering assignments.</p>
 <span class="k">val</span> <span class="n">assignments</span> <span class="k">=</span> <span class="n">clusters</span><span class="o">.</span><span class="n">toList</span><span class="o">.</span><span class="n">sortBy</span> <span class="o">{</span> <span class="k">case</span> <span class="o">(</span><span class="n">k</span><span class="o">,</span> <span class="n">v</span><span class="o">)</span> <span class="k">=&gt;</span> <span class="n">v</span><span class="o">.</span><span class="n">length</span> <span class="o">}</span>
 <span class="k">val</span> <span class="n">assignmentsStr</span> <span class="k">=</span> <span class="n">assignments</span>
   <span class="o">.</span><span class="n">map</span> <span class="o">{</span> <span class="k">case</span> <span class="o">(</span><span class="n">k</span><span class="o">,</span> <span class="n">v</span><span class="o">)</span> <span class="k">=&gt;</span>
-    <span class="n">s</span><span class="s">&quot;$k -&gt; ${v.sorted.mkString(&quot;</span><span class="o">[</span><span class="err">&quot;</span>, <span class="err">&quot;</span>,<span class="err">&quot;</span>, <span class="err">&quot;</span><span class="o">]</span><span class="s">&quot;)}&quot;</span>
+    <span class="s">s&quot;</span><span class="si">$k</span><span class="s"> -&gt; </span><span class="si">${</span><span class="n">v</span><span class="o">.</span><span class="n">sorted</span><span class="o">.</span><span class="n">mkString</span><span class="o">(</span><span class="s">&quot;[&quot;</span><span class="o">,</span> <span class="s">&quot;,&quot;</span><span class="o">,</span> <span class="s">&quot;]&quot;</span><span class="o">)</span><span class="si">}</span><span class="s">&quot;</span>
   <span class="o">}.</span><span class="n">mkString</span><span class="o">(</span><span class="s">&quot;, &quot;</span><span class="o">)</span>
 <span class="k">val</span> <span class="n">sizesStr</span> <span class="k">=</span> <span class="n">assignments</span><span class="o">.</span><span class="n">map</span> <span class="o">{</span>
   <span class="k">_</span><span class="o">.</span><span class="n">_2</span><span class="o">.</span><span class="n">length</span>
 <span class="o">}.</span><span class="n">sorted</span><span class="o">.</span><span class="n">mkString</span><span class="o">(</span><span class="s">&quot;(&quot;</span><span class="o">,</span> <span class="s">&quot;,&quot;</span><span class="o">,</span> <span class="s">&quot;)&quot;</span><span class="o">)</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Cluster assignments: $assignmentsStr\ncluster sizes: $sizesStr&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;Cluster assignments: </span><span class="si">$assignmentsStr</span><span class="s">\ncluster sizes: </span><span class="si">$sizesStr</span><span class="s">&quot;</span><span class="o">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala" in the Spark repo.</small></div>
   </div>
@@ -736,22 +736,22 @@ which contains the computed clustering assignments.</p>
 
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/clustering/PowerIterationClustering.html"><code>PowerIterationClustering</code> Java docs</a> and <a href="api/java/org/apache/spark/mllib/clustering/PowerIterationClusteringModel.html"><code>PowerIterationClusteringModel</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">org.apache.spark.mllib.clustering.PowerIterationClustering</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">org.apache.spark.mllib.clustering.PowerIterationClustering</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.clustering.PowerIterationClusteringModel</span><span class="o">;</span>
 
 <span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">Tuple3</span><span class="o">&lt;</span><span class="n">Long</span><span class="o">,</span> <span class="n">Long</span><span class="o">,</span> <span class="n">Double</span><span class="o">&gt;&gt;</span> <span class="n">similarities</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="na">parallelize</span><span class="o">(</span><span class="n">Lists</span><span class="o">.</span><span class="na">newArrayList</span><span class="o">(</span>
-  <span class="k">new</span> <span class="n">Tuple3</span><span class="o">&lt;&gt;(</span><span class="mi">0L</span><span class="o">,</span> <span class="mi">1L</span><span class="o">,</span> <span class="mf">0.9</span><span class="o">),</span>
+  <span class="k">new</span> <span class="n">Tuple3</span><span class="o">&lt;&gt;(</span><span class="mi">0</span><span class="n">L</span><span class="o">,</span> <span class="mi">1L</span><span class="o">,</span> <span class="mf">0.9</span><span class="o">),</span>
   <span class="k">new</span> <span class="n">Tuple3</span><span class="o">&lt;&gt;(</span><span class="mi">1L</span><span class="o">,</span> <span class="mi">2L</span><span class="o">,</span> <span class="mf">0.9</span><span class="o">),</span>
   <span class="k">new</span> <span class="n">Tuple3</span><span class="o">&lt;&gt;(</span><span class="mi">2L</span><span class="o">,</span> <span class="mi">3L</span><span class="o">,</span> <span class="mf">0.9</span><span class="o">),</span>
   <span class="k">new</span> <span class="n">Tuple3</span><span class="o">&lt;&gt;(</span><span class="mi">3L</span><span class="o">,</span> <span class="mi">4L</span><span class="o">,</span> <span class="mf">0.1</span><span class="o">),</span>
   <span class="k">new</span> <span class="n">Tuple3</span><span class="o">&lt;&gt;(</span><span class="mi">4L</span><span class="o">,</span> <span class="mi">5L</span><span class="o">,</span> <span class="mf">0.9</span><span class="o">)));</span>
 
-<span class="n">PowerIterationClustering</span> <span class="n">pic</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">PowerIterationClustering</span><span class="o">()</span>
+<span class="n">PowerIterationClustering</span> <span class="n">pic</span> <span class="o">=</span> <span class="k">new</span> <span class="n">PowerIterationClustering</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setK</span><span class="o">(</span><span class="mi">2</span><span class="o">)</span>
   <span class="o">.</span><span class="na">setMaxIterations</span><span class="o">(</span><span class="mi">10</span><span class="o">);</span>
 <span class="n">PowerIterationClusteringModel</span> <span class="n">model</span> <span class="o">=</span> <span class="n">pic</span><span class="o">.</span><span class="na">run</span><span class="o">(</span><span class="n">similarities</span><span class="o">);</span>
 
-<span class="k">for</span> <span class="o">(</span><span class="n">PowerIterationClustering</span><span class="o">.</span><span class="na">Assignment</span> <span class="nl">a:</span> <span class="n">model</span><span class="o">.</span><span class="na">assignments</span><span class="o">().</span><span class="na">toJavaRDD</span><span class="o">().</span><span class="na">collect</span><span class="o">())</span> <span class="o">{</span>
+<span class="k">for</span> <span class="o">(</span><span class="n">PowerIterationClustering</span><span class="o">.</span><span class="na">Assignment</span> <span class="n">a</span><span class="o">:</span> <span class="n">model</span><span class="o">.</span><span class="na">assignments</span><span class="o">().</span><span class="na">toJavaRDD</span><span class="o">().</span><span class="na">collect</span><span class="o">())</span> <span class="o">{</span>
   <span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="n">a</span><span class="o">.</span><span class="na">id</span><span class="o">()</span> <span class="o">+</span> <span class="s">&quot; -&gt; &quot;</span> <span class="o">+</span> <span class="n">a</span><span class="o">.</span><span class="na">cluster</span><span class="o">());</span>
 <span class="o">}</span>
 </pre></div>
@@ -770,21 +770,21 @@ which contains the computed clustering assignments.</p>
 
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.clustering.PowerIterationClustering"><code>PowerIterationClustering</code> Python docs</a> and <a href="api/python/pyspark.mllib.html#pyspark.mllib.clustering.PowerIterationClusteringModel"><code>PowerIterationClusteringModel</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.mllib.clustering</span> <span class="kn">import</span> <span class="n">PowerIterationClustering</span><span class="p">,</span> <span class="n">PowerIterationClusteringModel</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.clustering</span> <span class="kn">import</span> <span class="n">PowerIterationClustering</span><span class="p">,</span> <span class="n">PowerIterationClusteringModel</span>
 
-<span class="c"># Load and parse the data</span>
-<span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">&quot;data/mllib/pic_data.txt&quot;</span><span class="p">)</span>
-<span class="n">similarities</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="nb">tuple</span><span class="p">([</span><span class="nb">float</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&#39; &#39;</span><span class="p">)]))</span>
+<span class="c1"># Load and parse the data</span>
+<span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s2">&quot;data/mllib/pic_data.txt&quot;</span><span class="p">)</span>
+<span class="n">similarities</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="nb">tuple</span><span class="p">([</span><span class="nb">float</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">&#39; &#39;</span><span class="p">)]))</span>
 
-<span class="c"># Cluster the data into two classes using PowerIterationClustering</span>
+<span class="c1"># Cluster the data into two classes using PowerIterationClustering</span>
 <span class="n">model</span> <span class="o">=</span> <span class="n">PowerIterationClustering</span><span class="o">.</span><span class="n">train</span><span class="p">(</span><span class="n">similarities</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">10</span><span class="p">)</span>
 
-<span class="n">model</span><span class="o">.</span><span class="n">assignments</span><span class="p">()</span><span class="o">.</span><span class="n">foreach</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="k">print</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">id</span><span class="p">)</span> <span class="o">+</span> <span class="s">&quot; -&gt; &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">cluster</span><span class="p">)))</span>
+<span class="n">model</span><span class="o">.</span><span class="n">assignments</span><span class="p">()</span><span class="o">.</span><span class="n">foreach</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="k">print</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">id</span><span class="p">)</span> <span class="o">+</span> <span class="s2">&quot; -&gt; &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">x</span><span class="o">.</span><span class="n">cluster</span><span class="p">)))</span>
 
-<span class="c"># Save and load model</span>
-<span class="n">model</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;target/org/apache/spark/PythonPowerIterationClusteringExample/PICModel&quot;</span><span class="p">)</span>
+<span class="c1"># Save and load model</span>
+<span class="n">model</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;target/org/apache/spark/PythonPowerIterationClusteringExample/PICModel&quot;</span><span class="p">)</span>
 <span class="n">sameModel</span> <span class="o">=</span> <span class="n">PowerIterationClusteringModel</span>\
-    <span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;target/org/apache/spark/PythonPowerIterationClusteringExample/PICModel&quot;</span><span class="p">)</span>
+    <span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;target/org/apache/spark/PythonPowerIterationClusteringExample/PICModel&quot;</span><span class="p">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/mllib/power_iteration_clustering_example.py" in the Spark repo.</small></div>
   </div>
@@ -947,7 +947,7 @@ to the algorithm. We then output the topics, represented as probability distribu
 <div data-lang="scala">
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.clustering.LDA"><code>LDA</code> Scala docs</a> and <a href="api/scala/index.html#org.apache.spark.mllib.clustering.DistributedLDAModel"><code>DistributedLDAModel</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.clustering.</span><span class="o">{</span><span class="nc">DistributedLDAModel</span><span class="o">,</span> <span class="nc">LDA</span><span class="o">}</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.clustering.</span><span class="o">{</span><span class="nc">DistributedLDAModel</span><span class="o">,</span> <span class="nc">LDA</span><span class="o">}</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vectors</span>
 
 <span class="c1">// Load and parse the data</span>
@@ -979,7 +979,7 @@ to the algorithm. We then output the topics, represented as probability distribu
 <div data-lang="java">
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/clustering/LDA.html"><code>LDA</code> Java docs</a> and <a href="api/java/org/apache/spark/mllib/clustering/DistributedLDAModel.html"><code>DistributedLDAModel</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">scala.Tuple2</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaPairRDD</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
@@ -1019,7 +1019,7 @@ to the algorithm. We then output the topics, represented as probability distribu
 <span class="n">corpus</span><span class="o">.</span><span class="na">cache</span><span class="o">();</span>
 
 <span class="c1">// Cluster the documents into three topics using LDA</span>
-<span class="n">LDAModel</span> <span class="n">ldaModel</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">LDA</span><span class="o">().</span><span class="na">setK</span><span class="o">(</span><span class="mi">3</span><span class="o">).</span><span class="na">run</span><span class="o">(</span><span class="n">corpus</span><span class="o">);</span>
+<span class="n">LDAModel</span> <span class="n">ldaModel</span> <span class="o">=</span> <span class="k">new</span> <span class="n">LDA</span><span class="o">().</span><span class="na">setK</span><span class="o">(</span><span class="mi">3</span><span class="o">).</span><span class="na">run</span><span class="o">(</span><span class="n">corpus</span><span class="o">);</span>
 
 <span class="c1">// Output topics. Each is a distribution over words (matching word count vectors)</span>
 <span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">&quot;Learned topics (as distributions over vocab of &quot;</span> <span class="o">+</span> <span class="n">ldaModel</span><span class="o">.</span><span class="na">vocabSize</span><span class="o">()</span>
@@ -1044,31 +1044,31 @@ to the algorithm. We then output the topics, represented as probability distribu
 <div data-lang="python">
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.clustering.LDA"><code>LDA</code> Python docs</a> and <a href="api/python/pyspark.mllib.html#pyspark.mllib.clustering.LDAModel"><code>LDAModel</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.mllib.clustering</span> <span class="kn">import</span> <span class="n">LDA</span><span class="p">,</span> <span class="n">LDAModel</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.clustering</span> <span class="kn">import</span> <span class="n">LDA</span><span class="p">,</span> <span class="n">LDAModel</span>
 <span class="kn">from</span> <span class="nn">pyspark.mllib.linalg</span> <span class="kn">import</span> <span class="n">Vectors</span>
 
-<span class="c"># Load and parse the data</span>
-<span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">&quot;data/mllib/sample_lda_data.txt&quot;</span><span class="p">)</span>
-<span class="n">parsedData</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="n">Vectors</span><span class="o">.</span><span class="n">dense</span><span class="p">([</span><span class="nb">float</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">line</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&#39; &#39;</span><span class="p">)]))</span>
-<span class="c"># Index documents with unique IDs</span>
+<span class="c1"># Load and parse the data</span>
+<span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s2">&quot;data/mllib/sample_lda_data.txt&quot;</span><span class="p">)</span>
+<span class="n">parsedData</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="n">Vectors</span><span class="o">.</span><span class="n">dense</span><span class="p">([</span><span class="nb">float</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">line</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">&#39; &#39;</span><span class="p">)]))</span>
+<span class="c1"># Index documents with unique IDs</span>
 <span class="n">corpus</span> <span class="o">=</span> <span class="n">parsedData</span><span class="o">.</span><span class="n">zipWithIndex</span><span class="p">()</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="p">[</span><span class="n">x</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">x</span><span class="p">[</span><span class="mi">0</span><span class="p">]])</span><span class="o">.</span><span class="n">cache</span><span class="p">()</span>
 
-<span class="c"># Cluster the documents into three topics using LDA</span>
+<span class="c1"># Cluster the documents into three topics using LDA</span>
 <span class="n">ldaModel</span> <span class="o">=</span> <span class="n">LDA</span><span class="o">.</span><span class="n">train</span><span class="p">(</span><span class="n">corpus</span><span class="p">,</span> <span class="n">k</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
 
-<span class="c"># Output topics. Each is a distribution over words (matching word count vectors)</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Learned topics (as distributions over vocab of &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">ldaModel</span><span class="o">.</span><span class="n">vocabSize</span><span class="p">())</span>
-      <span class="o">+</span> <span class="s">&quot; words):&quot;</span><span class="p">)</span>
+<span class="c1"># Output topics. Each is a distribution over words (matching word count vectors)</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Learned topics (as distributions over vocab of &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">ldaModel</span><span class="o">.</span><span class="n">vocabSize</span><span class="p">())</span>
+      <span class="o">+</span> <span class="s2">&quot; words):&quot;</span><span class="p">)</span>
 <span class="n">topics</span> <span class="o">=</span> <span class="n">ldaModel</span><span class="o">.</span><span class="n">topicsMatrix</span><span class="p">()</span>
 <span class="k">for</span> <span class="n">topic</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">3</span><span class="p">):</span>
-    <span class="k">print</span><span class="p">(</span><span class="s">&quot;Topic &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">topic</span><span class="p">)</span> <span class="o">+</span> <span class="s">&quot;:&quot;</span><span class="p">)</span>
+    <span class="k">print</span><span class="p">(</span><span class="s2">&quot;Topic &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">topic</span><span class="p">)</span> <span class="o">+</span> <span class="s2">&quot;:&quot;</span><span class="p">)</span>
     <span class="k">for</span> <span class="n">word</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">ldaModel</span><span class="o">.</span><span class="n">vocabSize</span><span class="p">()):</span>
-        <span class="k">print</span><span class="p">(</span><span class="s">&quot; &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">topics</span><span class="p">[</span><span class="n">word</span><span class="p">][</span><span class="n">topic</span><span class="p">]))</span>
+        <span class="k">print</span><span class="p">(</span><span class="s2">&quot; &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">topics</span><span class="p">[</span><span class="n">word</span><span class="p">][</span><span class="n">topic</span><span class="p">]))</span>
 
-<span class="c"># Save and load model</span>
-<span class="n">ldaModel</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;target/org/apache/spark/PythonLatentDirichletAllocationExample/LDAModel&quot;</span><span class="p">)</span>
+<span class="c1"># Save and load model</span>
+<span class="n">ldaModel</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;target/org/apache/spark/PythonLatentDirichletAllocationExample/LDAModel&quot;</span><span class="p">)</span>
 <span class="n">sameModel</span> <span class="o">=</span> <span class="n">LDAModel</span>\
-    <span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s">&quot;target/org/apache/spark/PythonLatentDirichletAllocationExample/LDAModel&quot;</span><span class="p">)</span>
+    <span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="s2">&quot;target/org/apache/spark/PythonLatentDirichletAllocationExample/LDAModel&quot;</span><span class="p">)</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/python/mllib/latent_dirichlet_allocation_example.py" in the Spark repo.</small></div>
   </div>
@@ -1104,7 +1104,7 @@ The implementation in MLlib has the following parameters:</p>
 <div data-lang="scala">
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.clustering.BisectingKMeans"><code>BisectingKMeans</code> Scala docs</a> and <a href="api/scala/index.html#org.apache.spark.mllib.clustering.BisectingKMeansModel"><code>BisectingKMeansModel</code> Scala docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.clustering.BisectingKMeans</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.clustering.BisectingKMeans</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.</span><span class="o">{</span><span class="nc">Vector</span><span class="o">,</span> <span class="nc">Vectors</span><span class="o">}</span>
 
 <span class="c1">// Loads and parses data</span>
@@ -1116,9 +1116,9 @@ The implementation in MLlib has the following parameters:</p>
 <span class="k">val</span> <span class="n">model</span> <span class="k">=</span> <span class="n">bkm</span><span class="o">.</span><span class="n">run</span><span class="o">(</span><span class="n">data</span><span class="o">)</span>
 
 <span class="c1">// Show the compute cost and the cluster centers</span>
-<span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Compute Cost: ${model.computeCost(data)}&quot;</span><span class="o">)</span>
+<span class="n">println</span><span class="o">(</span><span class="s">s&quot;Compute Cost: </span><span class="si">${</span><span class="n">model</span><span class="o">.</span><span class="n">computeCost</span><span class="o">(</span><span class="n">data</span><span class="o">)</span><span class="si">}</span><span class="s">&quot;</span><span class="o">)</span>
 <span class="n">model</span><span class="o">.</span><span class="n">clusterCenters</span><span class="o">.</span><span class="n">zipWithIndex</span><span class="o">.</span><span class="n">foreach</span> <span class="o">{</span> <span class="k">case</span> <span class="o">(</span><span class="n">center</span><span class="o">,</span> <span class="n">idx</span><span class="o">)</span> <span class="k">=&gt;</span>
-  <span class="n">println</span><span class="o">(</span><span class="n">s</span><span class="s">&quot;Cluster Center ${idx}: ${center}&quot;</span><span class="o">)</span>
+  <span class="n">println</span><span class="o">(</span><span class="s">s&quot;Cluster Center </span><span class="si">${</span><span class="n">idx</span><span class="si">}</span><span class="s">: </span><span class="si">${</span><span class="n">center</span><span class="si">}</span><span class="s">&quot;</span><span class="o">)</span>
 <span class="o">}</span>
 </pre></div>
     <div><small>Find full example code at "examples/src/main/scala/org/apache/spark/examples/mllib/BisectingKMeansExample.scala" in the Spark repo.</small></div>
@@ -1127,7 +1127,7 @@ The implementation in MLlib has the following parameters:</p>
 <div data-lang="java">
     <p>Refer to the <a href="api/java/org/apache/spark/mllib/clustering/BisectingKMeans.html"><code>BisectingKMeans</code> Java docs</a> and <a href="api/java/org/apache/spark/mllib/clustering/BisectingKMeansModel.html"><code>BisectingKMeansModel</code> Java docs</a> for details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">import</span> <span class="nn">com.google.common.collect.Lists</span><span class="o">;</span>
+    <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">com.google.common.collect.Lists</span><span class="o">;</span>
 
 <span class="kn">import</span> <span class="nn">org.apache.spark.api.java.JavaRDD</span><span class="o">;</span>
 <span class="kn">import</span> <span class="nn">org.apache.spark.mllib.clustering.BisectingKMeans</span><span class="o">;</span>
@@ -1143,7 +1143,7 @@ The implementation in MLlib has the following parameters:</p>
 <span class="o">);</span>
 <span class="n">JavaRDD</span><span class="o">&lt;</span><span class="n">Vector</span><span class="o">&gt;</span> <span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="na">parallelize</span><span class="o">(</span><span class="n">localData</span><span class="o">,</span> <span class="mi">2</span><span class="o">);</span>
 
-<span class="n">BisectingKMeans</span> <span class="n">bkm</span> <span class="o">=</span> <span class="k">new</span> <span class="nf">BisectingKMeans</span><span class="o">()</span>
+<span class="n">BisectingKMeans</span> <span class="n">bkm</span> <span class="o">=</span> <span class="k">new</span> <span class="n">BisectingKMeans</span><span class="o">()</span>
   <span class="o">.</span><span class="na">setK</span><span class="o">(</span><span class="mi">4</span><span class="o">);</span>
 <span class="n">BisectingKMeansModel</span> <span class="n">model</span> <span class="o">=</span> <span class="n">bkm</span><span class="o">.</span><span class="na">run</span><span class="o">(</span><span class="n">data</span><span class="o">);</span>
 
@@ -1161,23 +1161,23 @@ The implementation in MLlib has the following parameters:</p>
 <div data-lang="python">
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.clustering.BisectingKMeans"><code>BisectingKMeans</code> Python docs</a> and <a href="api/python/pyspark.mllib.html#pyspark.mllib.clustering.BisectingKMeansModel"><code>BisectingKMeansModel</code> Python docs</a> for more details on the API.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">numpy</span> <span class="kn">import</span> <span class="n">array</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">numpy</span> <span class="kn">import</span> <span class="n">array</span>
 
 <span class="kn">from</span> <span class="nn">pyspark.mllib.clustering</span> <span class="kn">import</span> <span class="n">BisectingKMeans</span><span class="p">,</span> <span class="n">BisectingKMeansModel</span>
 
-<span class="c"># Load and parse the data</span>
-<span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">&quot;data/mllib/kmeans_data.txt&quot;</span><span class="p">)</span>
-<span class="n">parsedData</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="n">array</span><span class="p">([</span><span class="nb">float</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&#39; &#39;</span><span class="p">)]))</span>
+<span class="c1"># Load and parse the data</span>
+<span class="n">data</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s2">&quot;data/mllib/kmeans_data.txt&quot;</span><span class="p">)</span>
+<span class="n">parsedData</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="n">array</span><span class="p">([</span><span class="nb">float</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">&#39; &#39;</span><span class="p">)]))</span>
 
-<span class="c"># Build the model (cluster the data)</span>
+<span class="c1"># Build the model (cluster the data)</span>
 <span class="n">model</span> <span class="o">=</span> <span class="n">BisectingKMeans</span><span class="o">.</span><span class="n">train</span><span class="p">(</span><span class="n">parsedData</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">maxIterations</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
 
-<span class="c"># Evaluate clustering</span>
+<span class="c1"># Evaluate clustering</span>
 <span class="n">cost</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">computeCost</span><span class="p">(</span><span class="n">parsedData</span><span class="p">)</span>
-<span class="k">print</span><span class="p">(</span><span class="s">&quot;Bisecting K-means Cost = &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">cost</span><span class="p">))</span>
+<span class="k">print</span><span class="p">(</span><span class="s2">&quot;Bisecting K-means Cost = &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">cost</span><span class="p">))</span>
 
-<span class="c"># Save and load model</span>
-<span class="n">path</span> <span class="o">=</span> <span class="s">&quot;target/org/apache/spark/PythonBisectingKMeansExample/BisectingKMeansModel&quot;</span>
+<span class="c1"># Save and load model</span>
+<span class="n">path</span> <span class="o">=</span> <span class="s2">&quot;target/org/apache/spark/PythonBisectingKMeansExample/BisectingKMeansModel&quot;</span>
 <span class="n">model</span><span class="o">.</span><span class="n">save</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="n">path</span><span class="p">)</span>
 <span class="n">sameModel</span> <span class="o">=</span> <span class="n">BisectingKMeansModel</span><span class="o">.</span><span class="n">load</span><span class="p">(</span><span class="n">sc</span><span class="p">,</span> <span class="n">path</span><span class="p">)</span>
 </pre></div>
@@ -1223,7 +1223,7 @@ will be adjusted accordingly.</p>
     <p>Refer to the <a href="api/scala/index.html#org.apache.spark.mllib.clustering.StreamingKMeans"><code>StreamingKMeans</code> Scala docs</a> for details on the API.
 Also refer to the <a href="streaming-programming-guide.html#initializing">Spark Streaming Programming Guide</a> for details on StreamingContext.</p>
 
-    <div class="highlight"><pre><span class="k">import</span> <span class="nn">org.apache.spark.mllib.clustering.StreamingKMeans</span>
+    <div class="highlight"><pre><span></span><span class="k">import</span> <span class="nn">org.apache.spark.mllib.clustering.StreamingKMeans</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.linalg.Vectors</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.mllib.regression.LabeledPoint</span>
 <span class="k">import</span> <span class="nn">org.apache.spark.streaming.</span><span class="o">{</span><span class="nc">Seconds</span><span class="o">,</span> <span class="nc">StreamingContext</span><span class="o">}</span>
@@ -1252,22 +1252,22 @@ And Refer to <a href="streaming-programming-guide.html#initializing">Spark Strea
     <p>Refer to the <a href="api/python/pyspark.mllib.html#pyspark.mllib.clustering.StreamingKMeans"><code>StreamingKMeans</code> Python docs</a> for more details on the API.
 And Refer to <a href="streaming-programming-guide.html#initializing">Spark Streaming Programming Guide</a> for details on StreamingContext.</p>
 
-    <div class="highlight"><pre><span class="kn">from</span> <span class="nn">pyspark.mllib.linalg</span> <span class="kn">import</span> <span class="n">Vectors</span>
+    <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pyspark.mllib.linalg</span> <span class="kn">import</span> <span class="n">Vectors</span>
 <span class="kn">from</span> <span class="nn">pyspark.mllib.regression</span> <span class="kn">import</span> <span class="n">LabeledPoint</span>
 <span class="kn">from</span> <span class="nn">pyspark.mllib.clustering</span> <span class="kn">import</span> <span class="n">StreamingKMeans</span>
 
-<span class="c"># we make an input stream of vectors for training,</span>
-<span class="c"># as well as a stream of vectors for testing</span>
+<span class="c1"># we make an input stream of vectors for training,</span>
+<span class="c1"># as well as a stream of vectors for testing</span>
 <span class="k">def</span> <span class="nf">parse</span><span class="p">(</span><span class="n">lp</span><span class="p">):</span>
-    <span class="n">label</span> <span class="o">=</span> <span class="nb">float</span><span class="p">(</span><span class="n">lp</span><span class="p">[</span><span class="n">lp</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s">&#39;(&#39;</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="p">:</span> <span class="n">lp</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s">&#39;)&#39;</span><span class="p">)])</span>
-    <span class="n">vec</span> <span class="o">=</span> <span class="n">Vectors</span><span class="o">.</span><span class="n">dense</span><span class="p">(</span><span class="n">lp</span><span class="p">[</span><span class="n">lp</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s">&#39;[&#39;</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="p">:</span> <span class="n">lp</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s">&#39;]&#39;</span><span class="p">)]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&#39;,&#39;</span><span class="p">))</span>
+    <span class="n">label</span> <span class="o">=</span> <span class="nb">float</span><span class="p">(</span><span class="n">lp</span><span class="p">[</span><span class="n">lp</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s1">&#39;(&#39;</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="p">:</span> <span class="n">lp</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s1">&#39;)&#39;</span><span class="p">)])</span>
+    <span class="n">vec</span> <span class="o">=</span> <span class="n">Vectors</span><span class="o">.</span><span class="n">dense</span><span class="p">(</span><span class="n">lp</span><span class="p">[</span><span class="n">lp</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s1">&#39;[&#39;</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="p">:</span> <span class="n">lp</span><span class="o">.</span><span class="n">find</span><span class="p">(</span><span class="s1">&#39;]&#39;</span><span class="p">)]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">&#39;,&#39;</span><span class="p">))</span>
 
     <span class="k">return</span> <span class="n">LabeledPoint</span><span class="p">(</span><span class="n">label</span><span class="p">,</span> <span class="n">vec</span><span class="p">)</span>
 
-<span class="n">trainingData</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">&quot;data/mllib/kmeans_data.txt&quot;</span><span class="p">)</span>\
-    <span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="n">Vectors</span><span class="o">.</span><span class="n">dense</span><span class="p">([</span><span class="nb">float</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">line</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">&#39; &#39;</span><span class="p">)]))</span>
+<span class="n">trainingData</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s2">&quot;data/mllib/kmeans_data.txt&quot;</span><span class="p">)</span>\
+    <span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">line</span><span class="p">:</span> <span class="n">Vectors</span><span class="o">.</span><span class="n">dense</span><span class="p">([</span><span class="nb">float</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">line</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">&#39; &#39;</span><span class="p">)]))</span>
 
-<span class="n">testingData</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s">&quot;data/mllib/streaming_kmeans_data_test.txt&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="n">parse</span><span class="p">)</span>
+<span class="n">testingData</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">textFile</span><span class="p">(</span><span class="s2">&quot;data/mllib/streaming_kmeans_data_test.txt&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="n">parse</span><span class="p">)</span>
 
 <span class="n">trainingQueue</span> <span class="o">=</span> <span class="p">[</span><span class="n">trainingData</span><span class="p">]</span>
 <span class="n">testingQueue</span> <span class="o">=</span> <span class="p">[</span><span class="n">testingData</span><span class="p">]</span>
@@ -1275,11 +1275,11 @@ And Refer to <a href="streaming-programming-guide.html#initializing">Spark Strea
 <span class="n">trainingStream</span> <span class="o">=</span> <span class="n">ssc</span><span class="o">.</span><span class="n">queueStream</span><span class="p">(</span><span class="n">trainingQueue</span><span class="p">)</span>
 <span class="n">testingStream</span> <span class="o">=</span> <span class="n">ssc</span><span class="o">.</span><span class="n">queueStream</span><span class="p">(</span><span class="n">testingQueue</span><span class="p">)</span>
 
-<span class="c"># We create a model with random clusters and specify the number of clusters to find</span>
+<span class="c1"># We create a model with random clusters and specify the number of clusters to find</span>
 <span class="n">model</span> <span class="o">=</span> <span class="n">StreamingKMeans</span><span class="p">(</span><span class="n">k</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">decayFactor</span><span class="o">=</span><span class="mf">1.0</span><span class="p">)</span><span class="o">.</span><span class="n">setRandomCenters</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
 
-<span class="c"># Now register the streams for training and testing and start the job,</span>
-<span class="c"># printing the predicted cluster assignments on new data points as they arrive.</span>
+<span class="c1"># Now register the streams for training and testing and start the job,</span>
+<span class="c1"># printing the predicted cluster assignments on new data points as they arrive.</span>
 <span class="n">model</span><span class="o">.</span><span class="n">trainOn</span><span class="p">(</span><span class="n">trainingStream</span><span class="p">)</span>
 
 <span class="n">result</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">predictOnValues</span><span class="p">(</span><span class="n">testingStream</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">lp</span><span class="p">:</span> <span class="p">(</span><span class="n">lp</span><span class="o">.</span><span class="n">label</span><span class="p">,</span> <span class="n">lp</span><span class="o">.</span><span class="n">features</span><span class="p">)))</span>

