Posted to commits@spark.apache.org by gu...@apache.org on 2018/07/04 04:41:52 UTC
[5/7] spark-website git commit: Fix signature description broken in PySpark API documentation in 2.3.1

http://git-wip-us.apache.org/repos/asf/spark-website/blob/5660fb9a/site/docs/2.3.1/api/python/pyspark.ml.html
----------------------------------------------------------------------
diff --git a/site/docs/2.3.1/api/python/pyspark.ml.html b/site/docs/2.3.1/api/python/pyspark.ml.html
index 4ada723..986c949 100644
--- a/site/docs/2.3.1/api/python/pyspark.ml.html
+++ b/site/docs/2.3.1/api/python/pyspark.ml.html
@@ -5,14 +5,14 @@
 <html xmlns="http://www.w3.org/1999/xhtml">
   <head>
     <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
-    <title>pyspark.ml package &#8212; PySpark master documentation</title>
+    <title>pyspark.ml package &#8212; PySpark 2.3.1 documentation</title>
     <link rel="stylesheet" href="_static/nature.css" type="text/css" />
     <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
     <link rel="stylesheet" href="_static/pyspark.css" type="text/css" />
     <script type="text/javascript">
       var DOCUMENTATION_OPTIONS = {
         URL_ROOT:    './',
-        VERSION:     'master',
+        VERSION:     '2.3.1',
         COLLAPSE_INDEX: false,
         FILE_SUFFIX: '.html',
         HAS_SOURCE:  true,
@@ -39,7 +39,7 @@
           <a href="pyspark.streaming.html" title="pyspark.streaming module"
              accesskey="P">previous</a> |</li>
     
-        <li class="nav-item nav-item-0"><a href="index.html">PySpark master documentation</a> &#187;</li>
+        <li class="nav-item nav-item-0"><a href="index.html">PySpark 2.3.1 documentation</a> &#187;</li>
 
           <li class="nav-item nav-item-1"><a href="pyspark.html" accesskey="U">pyspark package</a> &#187;</li> 
       </ul>
@@ -718,7 +718,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.Pipeline">
-<em class="property">class </em><code class="descclassname">pyspark.ml.</code><code class="descname">Pipeline</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/pipeline.html#Pipeline"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.Pipeline" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.</code><code class="descname">Pipeline</code><span class="sig-paren">(</span><em>stages=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/pipeline.html#Pipeline"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.Pipeline" title="Permalink to this definition">¶</a></dt>
 <dd><p>A simple pipeline, which acts as an estimator. A Pipeline consists
 of a sequence of stages, each of which is either an
 <a class="reference internal" href="#pyspark.ml.Estimator" title="pyspark.ml.Estimator"><code class="xref py py-class docutils literal"><span class="pre">Estimator</span></code></a> or a <a class="reference internal" href="#pyspark.ml.Transformer" title="pyspark.ml.Transformer"><code class="xref py py-class docutils literal"><span class="pre">Transformer</span></code></a>. When
@@ -1360,7 +1360,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 <span id="pyspark-ml-feature-module"></span><h2>pyspark.ml.feature module<a class="headerlink" href="#module-pyspark.ml.feature" title="Permalink to this headline">¶</a></h2>
 <dl class="class">
 <dt id="pyspark.ml.feature.Binarizer">
-<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">Binarizer</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#Binarizer"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.Binarizer" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">Binarizer</code><span class="sig-paren">(</span><em>threshold=0.0</em>, <em>inputCol=None</em>, <em>outputCol=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#Binarizer"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.Binarizer" title="Permalink to this definition">¶</a></dt>
 <dd><p>Binarize a column of continuous features given a threshold.</p>
 <div class="highlight-default"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">df</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">createDataFrame</span><span class="p">([(</span><span class="mf">0.5</span><span class="p">,)],</span> <span class="p">[</span><span class="s2">&quot;values&quot;</span><span class="p">])</span>
 <span class="gp">&gt;&gt;&gt; </span><span class="n">binarizer</span> <span class="o">=</span> <span class="n">Binarizer</span><span class="p">(</span><span class="n">threshold</span><span class="o">=</span><span class="mf">1.0</span><span class="p">,</span> <span class="n">inputCol</span><span class="o">=</span><span class="s2">&quot;values&quot;</span><span class="p">,</span> <span class="n">outputCol</span><span class="o">=</span><span class="s2">&quot;features&quot;</span><span class="p">)</span>
@@ -1606,7 +1606,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.feature.BucketedRandomProjectionLSH">
-<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">BucketedRandomProjectionLSH</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#BucketedRandomProjectionLSH"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.BucketedRandomProjectionLSH" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">BucketedRandomProjectionLSH</code><span class="sig-paren">(</span><em>inputCol=None</em>, <em>outputCol=None</em>, <em>seed=None</em>, <em>numHashTables=1</em>, <em>bucketLength=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#BucketedRandomProjectionLSH"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.BucketedRandomProjectionLSH" title="Permalink to this definition">¶</a></dt>
 <dd><div class="admonition note">
 <p class="first admonition-title">Note</p>
 <p class="last">Experimental</p>
@@ -2195,7 +2195,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.feature.Bucketizer">
-<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">Bucketizer</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#Bucketizer"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.Bucketizer" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">Bucketizer</code><span class="sig-paren">(</span><em>splits=None</em>, <em>inputCol=None</em>, <em>outputCol=None</em>, <em>handleInvalid='error'</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#Bucketizer"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.Bucketizer" title="Permalink to this definition">¶</a></dt>
 <dd><p>Maps a column of continuous features to a column of feature buckets.</p>
 <div class="highlight-default"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">values</span> <span class="o">=</span> <span class="p">[(</span><span class="mf">0.1</span><span class="p">,),</span> <span class="p">(</span><span class="mf">0.4</span><span class="p">,),</span> <span class="p">(</span><span class="mf">1.2</span><span class="p">,),</span> <span class="p">(</span><span class="mf">1.5</span><span class="p">,),</span> <span class="p">(</span><span class="nb">float</span><span class="p">(</span><span class="s2">&quot;nan&quot;</span><span class="p">),),</span> <span class="p">(</span><span class="nb">float</span><span class="p">(</span><span class="s2">&quot;nan&quot;</span><span class="p">),)]</span>
 <span class="gp">&gt;&gt;&gt; </span><span class="n">df</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">createDataFrame</span><span class="p">(</span><span class="n">values</span><span class="p">,</span> <span class="p">[</span><span class="s2">&quot;values&quot;</span><span class="p">])</span>
@@ -2469,7 +2469,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.feature.ChiSqSelector">
-<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">ChiSqSelector</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#ChiSqSelector"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.ChiSqSelector" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">ChiSqSelector</code><span class="sig-paren">(</span><em>numTopFeatures=50</em>, <em>featuresCol='features'</em>, <em>outputCol=None</em>, <em>labelCol='label'</em>, <em>selectorType='numTopFeatures'</em>, <em>percentile=0.1</em>, <em>fpr=0.05</em>, <em>fdr=0.05</em>, <em>fwe=0.05</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#ChiSqSelector"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.ChiSqSelector" title="Permalink to this definition">¶</a></dt>
 <dd><div class="admonition note">
 <p class="first admonition-title">Note</p>
 <p class="last">Experimental</p>
@@ -3095,7 +3095,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.feature.CountVectorizer">
-<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">CountVectorizer</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#CountVectorizer"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.CountVectorizer" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">CountVectorizer</code><span class="sig-paren">(</span><em>minTF=1.0</em>, <em>minDF=1.0</em>, <em>vocabSize=262144</em>, <em>binary=False</em>, <em>inputCol=None</em>, <em>outputCol=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#CountVectorizer"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.CountVectorizer" title="Permalink to this definition">¶</a></dt>
 <dd><p>Extracts a vocabulary from document collections and generates a <a class="reference internal" href="#pyspark.ml.feature.CountVectorizerModel" title="pyspark.ml.feature.CountVectorizerModel"><code class="xref py py-attr docutils literal"><span class="pre">CountVectorizerModel</span></code></a>.</p>
 <div class="highlight-default"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">df</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">createDataFrame</span><span class="p">(</span>
 <span class="gp">... </span>   <span class="p">[(</span><span class="mi">0</span><span class="p">,</span> <span class="p">[</span><span class="s2">&quot;a&quot;</span><span class="p">,</span> <span class="s2">&quot;b&quot;</span><span class="p">,</span> <span class="s2">&quot;c&quot;</span><span class="p">]),</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="p">[</span><span class="s2">&quot;a&quot;</span><span class="p">,</span> <span class="s2">&quot;b&quot;</span><span class="p">,</span> <span class="s2">&quot;b&quot;</span><span class="p">,</span> <span class="s2">&quot;c&quot;</span><span class="p">,</span> <span class="s2">&quot;a&quot;</span><span class="p">])],</span>
@@ -3634,7 +3634,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.feature.DCT">
-<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">DCT</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#DCT"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.DCT" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">DCT</code><span class="sig-paren">(</span><em>inverse=False</em>, <em>inputCol=None</em>, <em>outputCol=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#DCT"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.DCT" title="Permalink to this definition">¶</a></dt>
 <dd><p>A feature transformer that takes the 1D discrete cosine transform
 of a real vector. No zero padding is performed on the input vector.
 It returns a real vector of the same length representing the DCT.
@@ -3888,7 +3888,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.feature.ElementwiseProduct">
-<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">ElementwiseProduct</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#ElementwiseProduct"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.ElementwiseProduct" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">ElementwiseProduct</code><span class="sig-paren">(</span><em>scalingVec=None</em>, <em>inputCol=None</em>, <em>outputCol=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#ElementwiseProduct"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.ElementwiseProduct" title="Permalink to this definition">¶</a></dt>
 <dd><p>Outputs the Hadamard product (i.e., the element-wise product) of each input vector
 with a provided “weight” vector. In other words, it scales each column of the dataset
 by a scalar multiplier.</p>
@@ -4135,7 +4135,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.feature.FeatureHasher">
-<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">FeatureHasher</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#FeatureHasher"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.FeatureHasher" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">FeatureHasher</code><span class="sig-paren">(</span><em>numFeatures=262144</em>, <em>inputCols=None</em>, <em>outputCol=None</em>, <em>categoricalCols=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#FeatureHasher"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.FeatureHasher" title="Permalink to this definition">¶</a></dt>
 <dd><div class="admonition note">
 <p class="first admonition-title">Note</p>
 <p class="last">Experimental</p>
@@ -4437,7 +4437,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.feature.HashingTF">
-<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">HashingTF</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#HashingTF"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.HashingTF" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">HashingTF</code><span class="sig-paren">(</span><em>numFeatures=262144</em>, <em>binary=False</em>, <em>inputCol=None</em>, <em>outputCol=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#HashingTF"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.HashingTF" title="Permalink to this definition">¶</a></dt>
 <dd><p>Maps a sequence of terms to their term frequencies using the hashing trick.
 Currently we use Austin Appleby’s MurmurHash 3 algorithm (MurmurHash3_x86_32)
 to calculate the hash code value for the term object.
@@ -4705,7 +4705,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.feature.IDF">
-<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">IDF</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#IDF"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.IDF" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">IDF</code><span class="sig-paren">(</span><em>minDocFreq=0</em>, <em>inputCol=None</em>, <em>outputCol=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#IDF"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.IDF" title="Permalink to this definition">¶</a></dt>
 <dd><p>Compute the Inverse Document Frequency (IDF) given a collection of documents.</p>
 <div class="highlight-default"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">pyspark.ml.linalg</span> <span class="k">import</span> <span class="n">DenseVector</span>
 <span class="gp">&gt;&gt;&gt; </span><span class="n">df</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">createDataFrame</span><span class="p">([(</span><span class="n">DenseVector</span><span class="p">([</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">2.0</span><span class="p">]),),</span>
@@ -5170,7 +5170,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.feature.Imputer">
-<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">Imputer</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#Imputer"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.Imputer" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">Imputer</code><span class="sig-paren">(</span><em>strategy='mean'</em>, <em>missingValue=nan</em>, <em>inputCols=None</em>, <em>outputCols=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#Imputer"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.Imputer" title="Permalink to this definition">¶</a></dt>
 <dd><div class="admonition note">
 <p class="first admonition-title">Note</p>
 <p class="last">Experimental</p>
@@ -5693,7 +5693,7 @@ which are used to replace the missing values in the input DataFrame.</p>
 
 <dl class="class">
 <dt id="pyspark.ml.feature.IndexToString">
-<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">IndexToString</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#IndexToString"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.IndexToString" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">IndexToString</code><span class="sig-paren">(</span><em>inputCol=None</em>, <em>outputCol=None</em>, <em>labels=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#IndexToString"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.IndexToString" title="Permalink to this definition">¶</a></dt>
 <dd><p>A <code class="xref py py-class docutils literal"><span class="pre">Transformer</span></code> that maps a column of indices back to a new column of
 corresponding string values.
 The index-string mapping is either from the ML attributes of the input column,
@@ -5927,7 +5927,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.feature.MaxAbsScaler">
-<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">MaxAbsScaler</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#MaxAbsScaler"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.MaxAbsScaler" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">MaxAbsScaler</code><span class="sig-paren">(</span><em>inputCol=None</em>, <em>outputCol=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#MaxAbsScaler"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.MaxAbsScaler" title="Permalink to this definition">¶</a></dt>
 <dd><p>Rescale each feature individually to range [-1, 1] by dividing through the largest maximum
 absolute value in each feature. It does not shift/center the data, and thus does not destroy
 any sparsity.</p>
@@ -6371,7 +6371,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.feature.MinHashLSH">
-<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">MinHashLSH</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#MinHashLSH"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.MinHashLSH" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">MinHashLSH</code><span class="sig-paren">(</span><em>inputCol=None</em>, <em>outputCol=None</em>, <em>seed=None</em>, <em>numHashTables=1</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#MinHashLSH"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.MinHashLSH" title="Permalink to this definition">¶</a></dt>
 <dd><div class="admonition note">
 <p class="first admonition-title">Note</p>
 <p class="last">Experimental</p>
@@ -6937,7 +6937,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.feature.MinMaxScaler">
-<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">MinMaxScaler</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#MinMaxScaler"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.MinMaxScaler" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">MinMaxScaler</code><span class="sig-paren">(</span><em>min=0.0</em>, <em>max=1.0</em>, <em>inputCol=None</em>, <em>outputCol=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#MinMaxScaler"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.MinMaxScaler" title="Permalink to this definition">¶</a></dt>
 <dd><p>Rescale each feature individually to a common range [min, max] linearly using column summary
 statistics, which is also known as min-max normalization or Rescaling. The rescaled value for
 feature E is calculated as,</p>
@@ -7449,7 +7449,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.feature.NGram">
-<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">NGram</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#NGram"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.NGram" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">NGram</code><span class="sig-paren">(</span><em>n=2</em>, <em>inputCol=None</em>, <em>outputCol=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#NGram"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.NGram" title="Permalink to this definition">¶</a></dt>
 <dd><p>A feature transformer that converts the input array of strings into an array of n-grams. Null
 values in the input array are ignored.
 It returns an array of n-grams where each n-gram is represented by a space-separated string of
@@ -7460,15 +7460,15 @@ returned.</p>
 <div class="highlight-default"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">df</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">createDataFrame</span><span class="p">([</span><span class="n">Row</span><span class="p">(</span><span class="n">inputTokens</span><span class="o">=</span><span class="p">[</span><span class="s2">&quot;a&quot;</span><span class="p">,</span> <span class="s2">&quot;b&quot;</span><span class="p">,</span> <span class="s2">&quot;c&quot;</span><span class="p">,</span> <span class="s2">&quot;d&quot;</span><span class="p">,</span> <span class="s2">&quot;e&quot;</span><span class="p">])])</span>
 <span class="gp">&gt;&gt;&gt; </span><span class="n">ngram</span> <span class="o">=</span> <span class="n">NGram</span><span class="p">(</span><span class="n">n</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">inputCol</span><span class="o">=</span><span class="s2">&quot;inputTokens&quot;</span><span class="p">,</span> <span class="n">outputCol</span><span class="o">=</span><span class="s2">&quot;nGrams&quot;</span><span class="p">)</span>
 <span class="gp">&gt;&gt;&gt; </span><span class="n">ngram</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">df</span><span class="p">)</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
-<span class="go">Row(inputTokens=[u&#39;a&#39;, u&#39;b&#39;, u&#39;c&#39;, u&#39;d&#39;, u&#39;e&#39;], nGrams=[u&#39;a b&#39;, u&#39;b c&#39;, u&#39;c d&#39;, u&#39;d e&#39;])</span>
+<span class="go">Row(inputTokens=[&#39;a&#39;, &#39;b&#39;, &#39;c&#39;, &#39;d&#39;, &#39;e&#39;], nGrams=[&#39;a b&#39;, &#39;b c&#39;, &#39;c d&#39;, &#39;d e&#39;])</span>
 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Change n-gram length</span>
 <span class="gp">&gt;&gt;&gt; </span><span class="n">ngram</span><span class="o">.</span><span class="n">setParams</span><span class="p">(</span><span class="n">n</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">df</span><span class="p">)</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
-<span class="go">Row(inputTokens=[u&#39;a&#39;, u&#39;b&#39;, u&#39;c&#39;, u&#39;d&#39;, u&#39;e&#39;], nGrams=[u&#39;a b c d&#39;, u&#39;b c d e&#39;])</span>
+<span class="go">Row(inputTokens=[&#39;a&#39;, &#39;b&#39;, &#39;c&#39;, &#39;d&#39;, &#39;e&#39;], nGrams=[&#39;a b c d&#39;, &#39;b c d e&#39;])</span>
 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Temporarily modify output column.</span>
 <span class="gp">&gt;&gt;&gt; </span><span class="n">ngram</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="p">{</span><span class="n">ngram</span><span class="o">.</span><span class="n">outputCol</span><span class="p">:</span> <span class="s2">&quot;output&quot;</span><span class="p">})</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
-<span class="go">Row(inputTokens=[u&#39;a&#39;, u&#39;b&#39;, u&#39;c&#39;, u&#39;d&#39;, u&#39;e&#39;], output=[u&#39;a b c d&#39;, u&#39;b c d e&#39;])</span>
+<span class="go">Row(inputTokens=[&#39;a&#39;, &#39;b&#39;, &#39;c&#39;, &#39;d&#39;, &#39;e&#39;], output=[&#39;a b c d&#39;, &#39;b c d e&#39;])</span>
 <span class="gp">&gt;&gt;&gt; </span><span class="n">ngram</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">df</span><span class="p">)</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
-<span class="go">Row(inputTokens=[u&#39;a&#39;, u&#39;b&#39;, u&#39;c&#39;, u&#39;d&#39;, u&#39;e&#39;], nGrams=[u&#39;a b c d&#39;, u&#39;b c d e&#39;])</span>
+<span class="go">Row(inputTokens=[&#39;a&#39;, &#39;b&#39;, &#39;c&#39;, &#39;d&#39;, &#39;e&#39;], nGrams=[&#39;a b c d&#39;, &#39;b c d e&#39;])</span>
 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Must use keyword arguments to specify params.</span>
 <span class="gp">&gt;&gt;&gt; </span><span class="n">ngram</span><span class="o">.</span><span class="n">setParams</span><span class="p">(</span><span class="s2">&quot;text&quot;</span><span class="p">)</span>
 <span class="gt">Traceback (most recent call last):</span>
@@ -7709,7 +7709,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.feature.Normalizer">
-<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">Normalizer</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#Normalizer"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.Normalizer" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">Normalizer</code><span class="sig-paren">(</span><em>p=2.0</em>, <em>inputCol=None</em>, <em>outputCol=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#Normalizer"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.Normalizer" title="Permalink to this definition">¶</a></dt>
 <dd><blockquote>
 <div>Normalize a vector to have unit norm using the given p-norm.</div></blockquote>
 <div class="highlight-default"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">pyspark.ml.linalg</span> <span class="k">import</span> <span class="n">Vectors</span>
@@ -7958,7 +7958,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.feature.OneHotEncoder">
-<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">OneHotEncoder</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#OneHotEncoder"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.OneHotEncoder" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">OneHotEncoder</code><span class="sig-paren">(</span><em>dropLast=True</em>, <em>inputCol=None</em>, <em>outputCol=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#OneHotEncoder"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.OneHotEncoder" title="Permalink to this definition">¶</a></dt>
 <dd><p>A one-hot encoder that maps a column of category indices to a
 column of binary vectors, with at most a single one-value per row
 that indicates the input category index.
@@ -8229,7 +8229,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.feature.OneHotEncoderEstimator">
-<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">OneHotEncoderEstimator</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#OneHotEncoderEstimator"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.OneHotEncoderEstimator" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">OneHotEncoderEstimator</code><span class="sig-paren">(</span><em>inputCols=None</em>, <em>outputCols=None</em>, <em>handleInvalid='error'</em>, <em>dropLast=True</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#OneHotEncoderEstimator"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.OneHotEncoderEstimator" title="Permalink to this definition">¶</a></dt>
 <dd><p>A one-hot encoder that maps a column of category indices to a column of binary vectors, with
 at most a single one-value per row that indicates the input category index.
 For example with 5 categories, an input value of 2.0 would map to an output vector of
@@ -8719,7 +8719,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.feature.PCA">
-<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">PCA</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#PCA"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.PCA" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">PCA</code><span class="sig-paren">(</span><em>k=None</em>, <em>inputCol=None</em>, <em>outputCol=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#PCA"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.PCA" title="Permalink to this definition">¶</a></dt>
 <dd><p>PCA trains a model to project vectors to a lower dimensional space of the
 top <a class="reference internal" href="#pyspark.ml.feature.PCA.k" title="pyspark.ml.feature.PCA.k"><code class="xref py py-attr docutils literal"><span class="pre">k</span></code></a> principal components.</p>
 <div class="highlight-default"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">pyspark.ml.linalg</span> <span class="k">import</span> <span class="n">Vectors</span>
@@ -9195,7 +9195,7 @@ Each column is one principal component.</p>
 
 <dl class="class">
 <dt id="pyspark.ml.feature.PolynomialExpansion">
-<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">PolynomialExpansion</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#PolynomialExpansion"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.PolynomialExpansion" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">PolynomialExpansion</code><span class="sig-paren">(</span><em>degree=2</em>, <em>inputCol=None</em>, <em>outputCol=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#PolynomialExpansion"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.PolynomialExpansion" title="Permalink to this definition">¶</a></dt>
 <dd><p>Perform feature expansion in a polynomial space. As said in <a class="reference external" href="http://en.wikipedia.org/wiki/Polynomial_expansion">wikipedia of Polynomial Expansion</a>, “In mathematics, an
 expansion of a product of sums expresses it as a sum of products by using the fact that
 multiplication distributes over addition”. Take a 2-variable feature vector as an example:
@@ -9442,7 +9442,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.feature.QuantileDiscretizer">
-<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">QuantileDiscretizer</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#QuantileDiscretizer"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.QuantileDiscretizer" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">QuantileDiscretizer</code><span class="sig-paren">(</span><em>numBuckets=2</em>, <em>inputCol=None</em>, <em>outputCol=None</em>, <em>relativeError=0.001</em>, <em>handleInvalid='error'</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#QuantileDiscretizer"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.QuantileDiscretizer" title="Permalink to this definition">¶</a></dt>
 <dd><div class="admonition note">
 <p class="first admonition-title">Note</p>
 <p class="last">Experimental</p>
@@ -9792,7 +9792,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.feature.RegexTokenizer">
-<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">RegexTokenizer</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#RegexTokenizer"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.RegexTokenizer" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">RegexTokenizer</code><span class="sig-paren">(</span><em>minTokenLength=1</em>, <em>gaps=True</em>, <em>pattern='\s+'</em>, <em>inputCol=None</em>, <em>outputCol=None</em>, <em>toLowercase=True</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#RegexTokenizer"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.RegexTokenizer" title="Permalink to this definition">¶</a></dt>
 <dd><p>A regex based tokenizer that extracts tokens either by using the
 provided regex pattern (in Java dialect) to split the text
 (default) or repeatedly matching the regex (if gaps is false).
@@ -9802,15 +9802,15 @@ It returns an array of strings that can be empty.</p>
 <div class="highlight-default"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">df</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">createDataFrame</span><span class="p">([(</span><span class="s2">&quot;A B  c&quot;</span><span class="p">,)],</span> <span class="p">[</span><span class="s2">&quot;text&quot;</span><span class="p">])</span>
 <span class="gp">&gt;&gt;&gt; </span><span class="n">reTokenizer</span> <span class="o">=</span> <span class="n">RegexTokenizer</span><span class="p">(</span><span class="n">inputCol</span><span class="o">=</span><span class="s2">&quot;text&quot;</span><span class="p">,</span> <span class="n">outputCol</span><span class="o">=</span><span class="s2">&quot;words&quot;</span><span class="p">)</span>
 <span class="gp">&gt;&gt;&gt; </span><span class="n">reTokenizer</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">df</span><span class="p">)</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
-<span class="go">Row(text=u&#39;A B  c&#39;, words=[u&#39;a&#39;, u&#39;b&#39;, u&#39;c&#39;])</span>
+<span class="go">Row(text=&#39;A B  c&#39;, words=[&#39;a&#39;, &#39;b&#39;, &#39;c&#39;])</span>
 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Change a parameter.</span>
 <span class="gp">&gt;&gt;&gt; </span><span class="n">reTokenizer</span><span class="o">.</span><span class="n">setParams</span><span class="p">(</span><span class="n">outputCol</span><span class="o">=</span><span class="s2">&quot;tokens&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">df</span><span class="p">)</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
-<span class="go">Row(text=u&#39;A B  c&#39;, tokens=[u&#39;a&#39;, u&#39;b&#39;, u&#39;c&#39;])</span>
+<span class="go">Row(text=&#39;A B  c&#39;, tokens=[&#39;a&#39;, &#39;b&#39;, &#39;c&#39;])</span>
 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Temporarily modify a parameter.</span>
 <span class="gp">&gt;&gt;&gt; </span><span class="n">reTokenizer</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="p">{</span><span class="n">reTokenizer</span><span class="o">.</span><span class="n">outputCol</span><span class="p">:</span> <span class="s2">&quot;words&quot;</span><span class="p">})</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
-<span class="go">Row(text=u&#39;A B  c&#39;, words=[u&#39;a&#39;, u&#39;b&#39;, u&#39;c&#39;])</span>
+<span class="go">Row(text=&#39;A B  c&#39;, words=[&#39;a&#39;, &#39;b&#39;, &#39;c&#39;])</span>
 <span class="gp">&gt;&gt;&gt; </span><span class="n">reTokenizer</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">df</span><span class="p">)</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
-<span class="go">Row(text=u&#39;A B  c&#39;, tokens=[u&#39;a&#39;, u&#39;b&#39;, u&#39;c&#39;])</span>
+<span class="go">Row(text=&#39;A B  c&#39;, tokens=[&#39;a&#39;, &#39;b&#39;, &#39;c&#39;])</span>
 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Must use keyword arguments to specify params.</span>
 <span class="gp">&gt;&gt;&gt; </span><span class="n">reTokenizer</span><span class="o">.</span><span class="n">setParams</span><span class="p">(</span><span class="s2">&quot;text&quot;</span><span class="p">)</span>
 <span class="gt">Traceback (most recent call last):</span>
@@ -10122,7 +10122,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.feature.RFormula">
-<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">RFormula</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#RFormula"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.RFormula" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">RFormula</code><span class="sig-paren">(</span><em>formula=None</em>, <em>featuresCol='features'</em>, <em>labelCol='label'</em>, <em>forceIndexLabel=False</em>, <em>stringIndexerOrderType='frequencyDesc'</em>, <em>handleInvalid='error'</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#RFormula"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.RFormula" title="Permalink to this definition">¶</a></dt>
 <dd><div class="admonition note">
 <p class="first admonition-title">Note</p>
 <p class="last">Experimental</p>
@@ -10682,7 +10682,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.feature.SQLTransformer">
-<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">SQLTransformer</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#SQLTransformer"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.SQLTransformer" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">SQLTransformer</code><span class="sig-paren">(</span><em>statement=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#SQLTransformer"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.SQLTransformer" title="Permalink to this definition">¶</a></dt>
 <dd><p>Implements the transforms which are defined by SQL statement.
 Currently we only support SQL syntax like ‘SELECT … FROM __THIS__’
 where ‘__THIS__’ represents the underlying table of the input dataset.</p>
@@ -10892,7 +10892,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.feature.StandardScaler">
-<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">StandardScaler</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#StandardScaler"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.StandardScaler" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">StandardScaler</code><span class="sig-paren">(</span><em>withMean=False</em>, <em>withStd=True</em>, <em>inputCol=None</em>, <em>outputCol=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#StandardScaler"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.StandardScaler" title="Permalink to this definition">¶</a></dt>
 <dd><p>Standardizes features by removing the mean and scaling to unit variance using column summary
 statistics on the samples in the training set.</p>
 <p>The “unit std” is computed using the <a class="reference external" href="https://en.wikipedia.org/wiki/Standard_deviation#Corrected_sample_standard_deviation">corrected sample standard deviation</a>,
@@ -11392,7 +11392,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.feature.StopWordsRemover">
-<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">StopWordsRemover</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#StopWordsRemover"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.StopWordsRemover" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">StopWordsRemover</code><span class="sig-paren">(</span><em>inputCol=None</em>, <em>outputCol=None</em>, <em>stopWords=None</em>, <em>caseSensitive=False</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#StopWordsRemover"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.StopWordsRemover" title="Permalink to this definition">¶</a></dt>
 <dd><p>A feature transformer that filters out stop words from input.</p>
 <div class="admonition note">
 <p class="first admonition-title">Note</p>
@@ -11673,7 +11673,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.feature.StringIndexer">
-<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">StringIndexer</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#StringIndexer"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.StringIndexer" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">StringIndexer</code><span class="sig-paren">(</span><em>inputCol=None</em>, <em>outputCol=None</em>, <em>handleInvalid='error'</em>, <em>stringOrderType='frequencyDesc'</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#StringIndexer"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.StringIndexer" title="Permalink to this definition">¶</a></dt>
 <dd><p>A label indexer that maps a string column of labels to an ML column of label indices.
 If the input column is numeric, we cast it to string and index the string values.
 The indices are in [0, numLabels). By default, this is ordered by label frequencies
@@ -12171,21 +12171,21 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.feature.Tokenizer">
-<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">Tokenizer</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#Tokenizer"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.Tokenizer" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">Tokenizer</code><span class="sig-paren">(</span><em>inputCol=None</em>, <em>outputCol=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#Tokenizer"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.Tokenizer" title="Permalink to this definition">¶</a></dt>
 <dd><p>A tokenizer that converts the input string to lowercase and then
 splits it by white spaces.</p>
 <div class="highlight-default"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">df</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">createDataFrame</span><span class="p">([(</span><span class="s2">&quot;a b c&quot;</span><span class="p">,)],</span> <span class="p">[</span><span class="s2">&quot;text&quot;</span><span class="p">])</span>
 <span class="gp">&gt;&gt;&gt; </span><span class="n">tokenizer</span> <span class="o">=</span> <span class="n">Tokenizer</span><span class="p">(</span><span class="n">inputCol</span><span class="o">=</span><span class="s2">&quot;text&quot;</span><span class="p">,</span> <span class="n">outputCol</span><span class="o">=</span><span class="s2">&quot;words&quot;</span><span class="p">)</span>
 <span class="gp">&gt;&gt;&gt; </span><span class="n">tokenizer</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">df</span><span class="p">)</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
-<span class="go">Row(text=u&#39;a b c&#39;, words=[u&#39;a&#39;, u&#39;b&#39;, u&#39;c&#39;])</span>
+<span class="go">Row(text=&#39;a b c&#39;, words=[&#39;a&#39;, &#39;b&#39;, &#39;c&#39;])</span>
 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Change a parameter.</span>
 <span class="gp">&gt;&gt;&gt; </span><span class="n">tokenizer</span><span class="o">.</span><span class="n">setParams</span><span class="p">(</span><span class="n">outputCol</span><span class="o">=</span><span class="s2">&quot;tokens&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">df</span><span class="p">)</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
-<span class="go">Row(text=u&#39;a b c&#39;, tokens=[u&#39;a&#39;, u&#39;b&#39;, u&#39;c&#39;])</span>
+<span class="go">Row(text=&#39;a b c&#39;, tokens=[&#39;a&#39;, &#39;b&#39;, &#39;c&#39;])</span>
 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Temporarily modify a parameter.</span>
 <span class="gp">&gt;&gt;&gt; </span><span class="n">tokenizer</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">df</span><span class="p">,</span> <span class="p">{</span><span class="n">tokenizer</span><span class="o">.</span><span class="n">outputCol</span><span class="p">:</span> <span class="s2">&quot;words&quot;</span><span class="p">})</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
-<span class="go">Row(text=u&#39;a b c&#39;, words=[u&#39;a&#39;, u&#39;b&#39;, u&#39;c&#39;])</span>
+<span class="go">Row(text=&#39;a b c&#39;, words=[&#39;a&#39;, &#39;b&#39;, &#39;c&#39;])</span>
 <span class="gp">&gt;&gt;&gt; </span><span class="n">tokenizer</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">df</span><span class="p">)</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
-<span class="go">Row(text=u&#39;a b c&#39;, tokens=[u&#39;a&#39;, u&#39;b&#39;, u&#39;c&#39;])</span>
+<span class="go">Row(text=&#39;a b c&#39;, tokens=[&#39;a&#39;, &#39;b&#39;, &#39;c&#39;])</span>
 <span class="gp">&gt;&gt;&gt; </span><span class="c1"># Must use keyword arguments to specify params.</span>
 <span class="gp">&gt;&gt;&gt; </span><span class="n">tokenizer</span><span class="o">.</span><span class="n">setParams</span><span class="p">(</span><span class="s2">&quot;text&quot;</span><span class="p">)</span>
 <span class="gt">Traceback (most recent call last):</span>
@@ -12403,7 +12403,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.feature.VectorAssembler">
-<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">VectorAssembler</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#VectorAssembler"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.VectorAssembler" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">VectorAssembler</code><span class="sig-paren">(</span><em>inputCols=None</em>, <em>outputCol=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#VectorAssembler"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.VectorAssembler" title="Permalink to this definition">¶</a></dt>
 <dd><p>A feature transformer that merges multiple columns into a vector column.</p>
 <div class="highlight-default"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">df</span> <span class="o">=</span> <span class="n">spark</span><span class="o">.</span><span class="n">createDataFrame</span><span class="p">([(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">)],</span> <span class="p">[</span><span class="s2">&quot;a&quot;</span><span class="p">,</span> <span class="s2">&quot;b&quot;</span><span class="p">,</span> <span class="s2">&quot;c&quot;</span><span class="p">])</span>
 <span class="gp">&gt;&gt;&gt; </span><span class="n">vecAssembler</span> <span class="o">=</span> <span class="n">VectorAssembler</span><span class="p">(</span><span class="n">inputCols</span><span class="o">=</span><span class="p">[</span><span class="s2">&quot;a&quot;</span><span class="p">,</span> <span class="s2">&quot;b&quot;</span><span class="p">,</span> <span class="s2">&quot;c&quot;</span><span class="p">],</span> <span class="n">outputCol</span><span class="o">=</span><span class="s2">&quot;features&quot;</span><span class="p">)</span>
@@ -12626,7 +12626,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.feature.VectorIndexer">
-<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">VectorIndexer</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#VectorIndexer"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.VectorIndexer" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">VectorIndexer</code><span class="sig-paren">(</span><em>maxCategories=20</em>, <em>inputCol=None</em>, <em>outputCol=None</em>, <em>handleInvalid='error'</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#VectorIndexer"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.VectorIndexer" title="Permalink to this definition">¶</a></dt>
 <dd><p>Class for indexing categorical feature columns in a dataset of <cite>Vector</cite>.</p>
 <dl class="docutils">
 <dt>This has 2 usage modes:</dt>
@@ -13202,7 +13202,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.feature.VectorSizeHint">
-<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">VectorSizeHint</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#VectorSizeHint"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.VectorSizeHint" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">VectorSizeHint</code><span class="sig-paren">(</span><em>inputCol=None</em>, <em>size=None</em>, <em>handleInvalid='error'</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#VectorSizeHint"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.VectorSizeHint" title="Permalink to this definition">¶</a></dt>
 <dd><div class="admonition note">
 <p class="first admonition-title">Note</p>
 <p class="last">Experimental</p>
@@ -13464,7 +13464,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.feature.VectorSlicer">
-<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">VectorSlicer</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#VectorSlicer"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.VectorSlicer" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">VectorSlicer</code><span class="sig-paren">(</span><em>inputCol=None</em>, <em>outputCol=None</em>, <em>indices=None</em>, <em>names=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#VectorSlicer"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.VectorSlicer" title="Permalink to this definition">¶</a></dt>
 <dd><p>This class takes a feature vector and outputs a new feature vector with a subarray
 of the original features.</p>
 <p>The subset of features can be specified with either indices (<cite>setIndices()</cite>)
@@ -13699,7 +13699,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="method">
 <dt id="pyspark.ml.feature.VectorSlicer.setParams">
-<code class="descname">setParams</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#VectorSlicer.setParams"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.VectorSlicer.setParams" title="Permalink to this definition">¶</a></dt>
+<code class="descname">setParams</code><span class="sig-paren">(</span><em>inputCol=None</em>, <em>outputCol=None</em>, <em>indices=None</em>, <em>names=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#VectorSlicer.setParams"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.VectorSlicer.setParams" title="Permalink to this definition">¶</a></dt>
 <dd><p>setParams(self, inputCol=None, outputCol=None, indices=None, names=None):
 Sets params for this VectorSlicer.</p>
 <div class="versionadded">
@@ -13741,7 +13741,7 @@ Sets params for this VectorSlicer.</p>
 
 <dl class="class">
 <dt id="pyspark.ml.feature.Word2Vec">
-<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">Word2Vec</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#Word2Vec"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.Word2Vec" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.feature.</code><code class="descname">Word2Vec</code><span class="sig-paren">(</span><em>vectorSize=100</em>, <em>minCount=5</em>, <em>numPartitions=1</em>, <em>stepSize=0.025</em>, <em>maxIter=1</em>, <em>seed=None</em>, <em>inputCol=None</em>, <em>outputCol=None</em>, <em>windowSize=5</em>, <em>maxSentenceLength=1000</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/feature.html#Word2Vec"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.feature.Word2Vec" title="Permalink to this definition">¶</a></dt>
 <dd><p>Word2Vec trains a model of <cite>Map(String, Vector)</cite>, i.e. transforms a word into a code for further
 natural language processing or machine learning process.</p>
 <div class="highlight-default"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="n">sent</span> <span class="o">=</span> <span class="p">(</span><span class="s2">&quot;a b &quot;</span> <span class="o">*</span> <span class="mi">100</span> <span class="o">+</span> <span class="s2">&quot;a c &quot;</span> <span class="o">*</span> <span class="mi">10</span><span class="p">)</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">&quot; &quot;</span><span class="p">)</span>
@@ -13758,7 +13758,7 @@ natural language processing or machine learning process.</p>
 <span class="go">+----+--------------------+</span>
 <span class="gp">...</span>
 <span class="gp">&gt;&gt;&gt; </span><span class="n">model</span><span class="o">.</span><span class="n">findSynonymsArray</span><span class="p">(</span><span class="s2">&quot;a&quot;</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
-<span class="go">[(u&#39;b&#39;, 0.25053444504737854), (u&#39;c&#39;, -0.6980510950088501)]</span>
+<span class="go">[(&#39;b&#39;, 0.25053444504737854), (&#39;c&#39;, -0.6980510950088501)]</span>
 <span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">pyspark.sql.functions</span> <span class="k">import</span> <span class="n">format_number</span> <span class="k">as</span> <span class="n">fmt</span>
 <span class="gp">&gt;&gt;&gt; </span><span class="n">model</span><span class="o">.</span><span class="n">findSynonyms</span><span class="p">(</span><span class="s2">&quot;a&quot;</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s2">&quot;word&quot;</span><span class="p">,</span> <span class="n">fmt</span><span class="p">(</span><span class="s2">&quot;similarity&quot;</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span><span class="o">.</span><span class="n">alias</span><span class="p">(</span><span class="s2">&quot;similarity&quot;</span><span class="p">))</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
 <span class="go">+----+----------+</span>
@@ -14396,7 +14396,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 <span id="pyspark-ml-classification-module"></span><h2>pyspark.ml.classification module<a class="headerlink" href="#module-pyspark.ml.classification" title="Permalink to this headline">¶</a></h2>
 <dl class="class">
 <dt id="pyspark.ml.classification.LinearSVC">
-<em class="property">class </em><code class="descclassname">pyspark.ml.classification.</code><code class="descname">LinearSVC</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/classification.html#LinearSVC"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.classification.LinearSVC" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.classification.</code><code class="descname">LinearSVC</code><span class="sig-paren">(</span><em>featuresCol='features'</em>, <em>labelCol='label'</em>, <em>predictionCol='prediction'</em>, <em>maxIter=100</em>, <em>regParam=0.0</em>, <em>tol=1e-06</em>, <em>rawPredictionCol='rawPrediction'</em>, <em>fitIntercept=True</em>, <em>standardization=True</em>, <em>threshold=0.0</em>, <em>weightCol=None</em>, <em>aggregationDepth=2</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/classification.html#LinearSVC"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.classification.LinearSVC" title="Permalink to this definition">¶</a></dt>
 <dd><div class="admonition note">
 <p class="first admonition-title">Note</p>
 <p class="last">Experimental</p>
@@ -14771,7 +14771,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="method">
 <dt id="pyspark.ml.classification.LinearSVC.setParams">
-<code class="descname">setParams</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/classification.html#LinearSVC.setParams"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.classification.LinearSVC.setParams" title="Permalink to this definition">¶</a></dt>
+<code class="descname">setParams</code><span class="sig-paren">(</span><em>featuresCol='features'</em>, <em>labelCol='label'</em>, <em>predictionCol='prediction'</em>, <em>maxIter=100</em>, <em>regParam=0.0</em>, <em>tol=1e-06</em>, <em>rawPredictionCol='rawPrediction'</em>, <em>fitIntercept=True</em>, <em>standardization=True</em>, <em>threshold=0.0</em>, <em>weightCol=None</em>, <em>aggregationDepth=2</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/classification.html#LinearSVC.setParams"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.classification.LinearSVC.setParams" title="Permalink to this definition">¶</a></dt>
 <dd><p>setParams(self, featuresCol=”features”, labelCol=”label”, predictionCol=”prediction”,                   maxIter=100, regParam=0.0, tol=1e-6, rawPredictionCol=”rawPrediction”,                   fitIntercept=True, standardization=True, threshold=0.0, weightCol=None,                   aggregationDepth=2):
 Sets params for Linear SVM Classifier.</p>
 <div class="versionadded">
@@ -15055,7 +15055,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.classification.LogisticRegression">
-<em class="property">class </em><code class="descclassname">pyspark.ml.classification.</code><code class="descname">LogisticRegression</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/classification.html#LogisticRegression"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.classification.LogisticRegression" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.classification.</code><code class="descname">LogisticRegression</code><span class="sig-paren">(</span><em>featuresCol='features'</em>, <em>labelCol='label'</em>, <em>predictionCol='prediction'</em>, <em>maxIter=100</em>, <em>regParam=0.0</em>, <em>elasticNetParam=0.0</em>, <em>tol=1e-06</em>, <em>fitIntercept=True</em>, <em>threshold=0.5</em>, <em>thresholds=None</em>, <em>probabilityCol='probability'</em>, <em>rawPredictionCol='rawPrediction'</em>, <em>standardization=True</em>, <em>weightCol=None</em>, <em>aggregationDepth=2</em>, <em>family='auto'</em>, <em>lowerBoundsOnCoefficients=None</em>, <em>upperBoundsOnCoefficients=None</em>, <em>lowerBoundsOnIntercepts=None</em>, <em>upperBoundsOnIntercepts=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/classification.html#LogisticRegression"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.classification.LogisticRegression" title="Permalink to this definition">¶</a></dt>
 <dd><p>Logistic regression.
 This class supports multinomial logistic (softmax) and binomial logistic regression.</p>
 <div class="highlight-default"><div class="highlight"><pre><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">from</span> <span class="nn">pyspark.sql</span> <span class="k">import</span> <span class="n">Row</span>
@@ -15574,7 +15574,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="method">
 <dt id="pyspark.ml.classification.LogisticRegression.setParams">
-<code class="descname">setParams</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/classification.html#LogisticRegression.setParams"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.classification.LogisticRegression.setParams" title="Permalink to this definition">¶</a></dt>
+<code class="descname">setParams</code><span class="sig-paren">(</span><em>featuresCol='features'</em>, <em>labelCol='label'</em>, <em>predictionCol='prediction'</em>, <em>maxIter=100</em>, <em>regParam=0.0</em>, <em>elasticNetParam=0.0</em>, <em>tol=1e-06</em>, <em>fitIntercept=True</em>, <em>threshold=0.5</em>, <em>thresholds=None</em>, <em>probabilityCol='probability'</em>, <em>rawPredictionCol='rawPrediction'</em>, <em>standardization=True</em>, <em>weightCol=None</em>, <em>aggregationDepth=2</em>, <em>family='auto'</em>, <em>lowerBoundsOnCoefficients=None</em>, <em>upperBoundsOnCoefficients=None</em>, <em>lowerBoundsOnIntercepts=None</em>, <em>upperBoundsOnIntercepts=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/classification.html#LogisticRegression.setParams"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.classification.LogisticRegression.setParams" title="Permalink to this definition">¶</a></dt>
 <dd><p>setParams(self, featuresCol=”features”, labelCol=”label”, predictionCol=”prediction”,                   maxIter=100, regParam=0.0, elasticNetParam=0.0, tol=1e-6, fitIntercept=True,                   threshold=0.5, thresholds=None, probabilityCol=”probability”,                   rawPredictionCol=”rawPrediction”, standardization=True, weightCol=None,                   aggregationDepth=2, family=”auto”,                   lowerBoundsOnCoefficients=None, upperBoundsOnCoefficients=None,                   lowerBoundsOnIntercepts=None, upperBoundsOnIntercepts=None):
 Sets params for logistic regression.
 If the threshold and thresholds Params are both set, they must be equivalent.</p>
@@ -16926,7 +16926,7 @@ versions.</p>
 
 <dl class="class">
 <dt id="pyspark.ml.classification.DecisionTreeClassifier">
-<em class="property">class </em><code class="descclassname">pyspark.ml.classification.</code><code class="descname">DecisionTreeClassifier</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/classification.html#DecisionTreeClassifier"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.classification.DecisionTreeClassifier" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.classification.</code><code class="descname">DecisionTreeClassifier</code><span class="sig-paren">(</span><em>featuresCol='features'</em>, <em>labelCol='label'</em>, <em>predictionCol='prediction'</em>, <em>probabilityCol='probability'</em>, <em>rawPredictionCol='rawPrediction'</em>, <em>maxDepth=5</em>, <em>maxBins=32</em>, <em>minInstancesPerNode=1</em>, <em>minInfoGain=0.0</em>, <em>maxMemoryInMB=256</em>, <em>cacheNodeIds=False</em>, <em>checkpointInterval=10</em>, <em>impurity='gini'</em>, <em>seed=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/classification.html#DecisionTreeClassifier"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.classification.DecisionTreeClassifier" title="Permalink to this definition">¶</a></dt>
 <dd><p><a class="reference external" href="http://en.wikipedia.org/wiki/Decision_tree_learning">Decision tree</a>
 learning algorithm for classification.
 It supports both binary and multiclass labels, as well as both continuous and categorical
@@ -17670,7 +17670,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.classification.GBTClassifier">
-<em class="property">class </em><code class="descclassname">pyspark.ml.classification.</code><code class="descname">GBTClassifier</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/classification.html#GBTClassifier"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.classification.GBTClassifier" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.classification.</code><code class="descname">GBTClassifier</code><span class="sig-paren">(</span><em>featuresCol='features'</em>, <em>labelCol='label'</em>, <em>predictionCol='prediction'</em>, <em>maxDepth=5</em>, <em>maxBins=32</em>, <em>minInstancesPerNode=1</em>, <em>minInfoGain=0.0</em>, <em>maxMemoryInMB=256</em>, <em>cacheNodeIds=False</em>, <em>checkpointInterval=10</em>, <em>lossType='logistic'</em>, <em>maxIter=20</em>, <em>stepSize=0.1</em>, <em>seed=None</em>, <em>subsamplingRate=1.0</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/classification.html#GBTClassifier"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.classification.GBTClassifier" title="Permalink to this definition">¶</a></dt>
 <dd><p><a class="reference external" href="http://en.wikipedia.org/wiki/Gradient_boosting">Gradient-Boosted Trees (GBTs)</a>
 learning algorithm for classification.
 It supports binary labels, as well as both continuous and categorical features.</p>
@@ -18441,7 +18441,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.classification.RandomForestClassifier">
-<em class="property">class </em><code class="descclassname">pyspark.ml.classification.</code><code class="descname">RandomForestClassifier</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/classification.html#RandomForestClassifier"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.classification.RandomForestClassifier" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.classification.</code><code class="descname">RandomForestClassifier</code><span class="sig-paren">(</span><em>featuresCol='features'</em>, <em>labelCol='label'</em>, <em>predictionCol='prediction'</em>, <em>probabilityCol='probability'</em>, <em>rawPredictionCol='rawPrediction'</em>, <em>maxDepth=5</em>, <em>maxBins=32</em>, <em>minInstancesPerNode=1</em>, <em>minInfoGain=0.0</em>, <em>maxMemoryInMB=256</em>, <em>cacheNodeIds=False</em>, <em>checkpointInterval=10</em>, <em>impurity='gini'</em>, <em>numTrees=20</em>, <em>featureSubsetStrategy='auto'</em>, <em>seed=None</em>, <em>subsamplingRate=1.0</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/classification.html#RandomForestClassifier"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.classification.RandomForestClassifier" title="Permalink to this definition">¶</a></dt>
 <dd><p><a class="reference external" href="http://en.wikipedia.org/wiki/Random_forest">Random Forest</a>
 learning algorithm for classification.
 It supports both binary and multiclass labels, as well as both continuous and categorical
@@ -19261,7 +19261,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.classification.NaiveBayes">
-<em class="property">class </em><code class="descclassname">pyspark.ml.classification.</code><code class="descname">NaiveBayes</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/classification.html#NaiveBayes"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.classification.NaiveBayes" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.classification.</code><code class="descname">NaiveBayes</code><span class="sig-paren">(</span><em>featuresCol='features'</em>, <em>labelCol='label'</em>, <em>predictionCol='prediction'</em>, <em>probabilityCol='probability'</em>, <em>rawPredictionCol='rawPrediction'</em>, <em>smoothing=1.0</em>, <em>modelType='multinomial'</em>, <em>thresholds=None</em>, <em>weightCol=None</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/classification.html#NaiveBayes"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.classification.NaiveBayes" title="Permalink to this definition">¶</a></dt>
 <dd><p>Naive Bayes Classifiers.
 It supports both Multinomial and Bernoulli NB. <a class="reference external" href="http://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html">Multinomial NB</a>
 can handle finitely supported discrete data. For example, by converting documents into
@@ -19882,7 +19882,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.classification.MultilayerPerceptronClassifier">
-<em class="property">class </em><code class="descclassname">pyspark.ml.classification.</code><code class="descname">MultilayerPerceptronClassifier</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/classification.html#MultilayerPerceptronClassifier"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.classification.MultilayerPerceptronClassifier" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class="descclassname">pyspark.ml.classification.</code><code class="descname">MultilayerPerceptronClassifier</code><span class="sig-paren">(</span><em>featuresCol='features'</em>, <em>labelCol='label'</em>, <em>predictionCol='prediction'</em>, <em>maxIter=100</em>, <em>tol=1e-06</em>, <em>seed=None</em>, <em>layers=None</em>, <em>blockSize=128</em>, <em>stepSize=0.03</em>, <em>solver='l-bfgs'</em>, <em>initialWeights=None</em>, <em>probabilityCol='probability'</em>, <em>rawPredictionCol='rawPrediction'</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/classification.html#MultilayerPerceptronClassifier"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.classification.MultilayerPerceptronClassifier" title="Permalink to this definition">¶</a></dt>
 <dd><p>Classifier trainer based on the Multilayer Perceptron.
 Each layer has sigmoid activation function, output layer has softmax.
 Number of inputs has to be equal to the size of feature vectors.
@@ -20305,7 +20305,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="method">
 <dt id="pyspark.ml.classification.MultilayerPerceptronClassifier.setParams">
-<code class="descname">setParams</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/classification.html#MultilayerPerceptronClassifier.setParams"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.classification.MultilayerPerceptronClassifier.setParams" title="Permalink to this definition">¶</a></dt>
+<code class="descname">setParams</code><span class="sig-paren">(</span><em>featuresCol='features'</em>, <em>labelCol='label'</em>, <em>predictionCol='prediction'</em>, <em>maxIter=100</em>, <em>tol=1e-06</em>, <em>seed=None</em>, <em>layers=None</em>, <em>blockSize=128</em>, <em>stepSize=0.03</em>, <em>solver='l-bfgs'</em>, <em>initialWeights=None</em>, <em>probabilityCol='probability'</em>, <em>rawPredictionCol='rawPrediction'</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/classification.html#MultilayerPerceptronClassifier.setParams"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.classification.MultilayerPerceptronClassifier.setParams" title="Permalink to this definition">¶</a></dt>
 <dd><p>setParams(self, featuresCol=”features”, labelCol=”label”, predictionCol=”prediction”,                   maxIter=100, tol=1e-6, seed=None, layers=None, blockSize=128, stepSize=0.03,                   solver=”l-bfgs”, initialWeights=None, probabilityCol=”probability”,                   rawPredictionCol=”rawPrediction”):
 Sets params for MultilayerPerceptronClassifier.</p>
 <div class="versionadded">
@@ -20583,7 +20583,7 @@ uses <code class="xref py py-func docutils literal"><span class="pre">dir()</spa
 
 <dl class="class">
 <dt id="pyspark.ml.classification.OneVsRest">
-<em class="property">class </em><code class="descclassname">pyspark.ml.classification.</code><code class="descname">OneVsRest</code><span class="sig-paren">(</span><em>*args</em>, <em>**kwargs</em><span class="sig-paren">)</span><a class="reference internal" href="_modules/pyspark/ml/classification.html#OneVsRest"><span class="viewcode-link">[source]</span></a><a class="headerlink" href="#pyspark.ml.classification.OneVsRest" title="Permalink to this definition">¶</a></dt>
+<em class="property">class </em><code class

<TRUNCATED>
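
The hunks above replace `(*args, **kwargs)` signatures with the declared keyword parameters. The commit message does not state the cause, so the following is only a sketch of one common way such signatures get hidden: a wrapping decorator whose own parameter list is what an introspecting documentation tool reports. `keyword_only_sketch` and `VectorSlicerSketch` are hypothetical names used for illustration, not PySpark's implementation; only the `setParams` parameter list is taken from the diff.

```python
import functools
import inspect


def keyword_only_sketch(func):
    """Hypothetical stand-in for a keyword-only decorator (not PySpark's code)."""
    @functools.wraps(func)  # copies metadata and sets wrapper.__wrapped__ = func
    def wrapper(self, *args, **kwargs):
        if args:
            raise TypeError("Method %s forces keyword arguments." % func.__name__)
        self._input_kwargs = kwargs
        return func(self, **kwargs)
    return wrapper


class VectorSlicerSketch(object):
    """Hypothetical class; the parameter list mirrors the one shown in the diff."""

    @keyword_only_sketch
    def setParams(self, inputCol=None, outputCol=None, indices=None, names=None):
        return self._input_kwargs


# Introspecting the wrapper itself yields only its generic parameter list.
raw = inspect.signature(VectorSlicerSketch.setParams, follow_wrapped=False)

# Following __wrapped__ (the default behaviour) recovers the declared parameters.
declared = inspect.signature(VectorSlicerSketch.setParams)

print(raw)
print(declared)
```

Whether the regenerated 2.3.1 pages were fixed this way is an assumption; the sketch only shows that the same decorated method can be rendered with either signature depending on how the tool introspects it.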
