Posted to commits@madlib.apache.org by xt...@apache.org on 2016/09/20 18:31:57 UTC

[45/51] [partial] incubator-madlib-site git commit: Update doc for 1.9.1 release

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/bed9253d/docs/latest/group__grp__linreg.html
----------------------------------------------------------------------
diff --git a/docs/latest/group__grp__linreg.html b/docs/latest/group__grp__linreg.html
index 6b205a3..5ca4bc8 100644
--- a/docs/latest/group__grp__linreg.html
+++ b/docs/latest/group__grp__linreg.html
@@ -47,7 +47,7 @@
   <td id="projectlogo"><a href="http://madlib.net"><img alt="Logo" src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
   <td style="padding-left: 0.5em;">
    <div id="projectname">
-   <span id="projectnumber">1.9</span>
+   <span id="projectnumber">1.9.1</span>
    </div>
    <div id="projectbrief">User Documentation for MADlib</div>
   </td>
@@ -123,7 +123,7 @@ $(document).ready(function(){initNavTree('group__grp__linreg.html','');});
 <li class="level1">
 <a href="#related">Related Topics</a> </li>
 </ul>
-</div><p>Linear regression models a linear relationship of a scalar dependent variable <img class="formulaInl" alt="$ y $" src="form_323.png"/> to one or more explanatory independent variables <img class="formulaInl" alt="$ x $" src="form_178.png"/> to build a model of coefficients.</p>
+</div><p>Linear regression models a linear relationship of a scalar dependent variable <img class="formulaInl" alt="$ y $" src="form_324.png"/> to one or more explanatory independent variables <img class="formulaInl" alt="$ x $" src="form_178.png"/> to build a model of coefficients.</p>
 <p><a class="anchor" id="train"></a></p><dl class="section user"><dt>Training Function</dt><dd></dd></dl>
 <p>The linear regression training function has the following syntax. </p><pre class="syntax">
 linregr_train( source_table,
@@ -154,7 +154,7 @@ linregr_train( source_table,
 <tr>
 <th>p_values </th><td>FLOAT8[]. Vector of the p-values of the coefficients.  </td></tr>
 <tr>
-<th>condition_no </th><td>FLOAT8 array. The condition number of the <img class="formulaInl" alt="$X^{*}X$" src="form_324.png"/> matrix. A high condition number is usually an indication that there may be some numeric instability in the result yielding a less reliable model. A high condition number often results when there is a significant amount of colinearity in the underlying design matrix, in which case other regression techniques, such as elastic net regression, may be more appropriate.  </td></tr>
+<th>condition_no </th><td>FLOAT8 array. The condition number of the <img class="formulaInl" alt="$X^{*}X$" src="form_325.png"/> matrix. A high condition number is usually an indication that there may be some numeric instability in the result yielding a less reliable model. A high condition number often results when there is a significant amount of colinearity in the underlying design matrix, in which case other regression techniques, such as elastic net regression, may be more appropriate.  </td></tr>
 <tr>
 <th>bp_stats </th><td>FLOAT8. The Breusch-Pagan statistic of heteroskedasticity. Present only if the heteroskedasticity argument was set to True when the model was trained.  </td></tr>
 <tr>
@@ -240,14 +240,14 @@ COPY houses FROM STDIN WITH DELIMITER '|';
  15 |  650 |       3 |  1.5 |  65000 | 1450 | 12000
 \.
 </pre></li>
-<li>Train a regression model. First, a single regression for all the data. <pre class="example">
+<li>Train a regression model. First, we generate a single regression for all data. <pre class="example">
 SELECT madlib.linregr_train( 'houses',
                              'houses_linregr',
                              'price',
                              'ARRAY[1, tax, bath, size]'
                            );
-</pre></li>
-<li>Generate three output models, one for each value of "bedroom". <pre class="example">
+</pre> (Note that in this example we are dynamically creating the array of independent variables from column names. If you have large numbers of independent variables beyond the PostgreSQL limit of maximum columns per table, you would pre-build the arrays and store them in a single column.)</li>
+<li>Next we generate three output models, one for each value of "bedroom". <pre class="example">
 SELECT madlib.linregr_train( 'houses',
                              'houses_linregr_bedroom',
                              'price',
@@ -320,43 +320,43 @@ FROM houses, houses_linregr m;
 <p><a class="anchor" id="notes"></a></p><dl class="section user"><dt>Note</dt><dd>All table names can be optionally schema qualified (current_schemas() would be searched if a schema name is not provided) and all table and column names should follow case-sensitivity and quoting rules per the database. (For instance, 'mytable' and 'MyTable' both resolve to the same entity, i.e. 'mytable'. If mixed-case or multi-byte characters are desired for entity names then the string should be double-quoted; in this case the input would be '"MyTable"').</dd></dl>
 <p><a class="anchor" id="background"></a></p><dl class="section user"><dt>Technical Background</dt><dd></dd></dl>
 <p>Ordinary least-squares (OLS) linear regression refers to a stochastic model in which the conditional mean of the dependent variable (usually denoted <img class="formulaInl" alt="$ Y $" src="form_3.png"/>) is an affine function of the vector of independent variables (usually denoted <img class="formulaInl" alt="$ \boldsymbol x $" src="form_58.png"/>). That is, </p><p class="formulaDsp">
-<img class="formulaDsp" alt="\[ E[Y \mid \boldsymbol x] = \boldsymbol c^T \boldsymbol x \]" src="form_325.png"/>
+<img class="formulaDsp" alt="\[ E[Y \mid \boldsymbol x] = \boldsymbol c^T \boldsymbol x \]" src="form_326.png"/>
 </p>
 <p> for some unknown vector of coefficients <img class="formulaInl" alt="$ \boldsymbol c $" src="form_78.png"/>. The assumption is that the residuals are i.i.d. Gaussian. That is, the (conditional) probability density of <img class="formulaInl" alt="$ Y $" src="form_3.png"/> is given by </p><p class="formulaDsp">
-<img class="formulaDsp" alt="\[ f(y \mid \boldsymbol x) = \frac{1}{\sqrt{2 \pi \sigma^2}} \cdot \exp\left(-\frac{1}{2 \sigma^2} \cdot (y - \boldsymbol x^T \boldsymbol c)^2 \right) \,. \]" src="form_326.png"/>
+<img class="formulaDsp" alt="\[ f(y \mid \boldsymbol x) = \frac{1}{\sqrt{2 \pi \sigma^2}} \cdot \exp\left(-\frac{1}{2 \sigma^2} \cdot (y - \boldsymbol x^T \boldsymbol c)^2 \right) \,. \]" src="form_327.png"/>
 </p>
 <p> OLS linear regression finds the vector of coefficients <img class="formulaInl" alt="$ \boldsymbol c $" src="form_78.png"/> that maximizes the likelihood of the observations.</p>
 <p>Let</p><ul>
-<li><img class="formulaInl" alt="$ \boldsymbol y \in \mathbf R^n $" src="form_327.png"/> denote the vector of observed dependent variables, with <img class="formulaInl" alt="$ n $" src="form_10.png"/> rows, containing the observed values of the dependent variable,</li>
+<li><img class="formulaInl" alt="$ \boldsymbol y \in \mathbf R^n $" src="form_328.png"/> denote the vector of observed dependent variables, with <img class="formulaInl" alt="$ n $" src="form_10.png"/> rows, containing the observed values of the dependent variable,</li>
 <li><img class="formulaInl" alt="$ X \in \mathbf R^{n \times k} $" src="form_98.png"/> denote the design matrix with <img class="formulaInl" alt="$ k $" src="form_97.png"/> columns and <img class="formulaInl" alt="$ n $" src="form_10.png"/> rows, containing all observed vectors of independent variables <img class="formulaInl" alt="$ \boldsymbol x_i $" src="form_99.png"/> as rows,</li>
-<li><img class="formulaInl" alt="$ X^T $" src="form_328.png"/> denote the transpose of <img class="formulaInl" alt="$ X $" src="form_2.png"/>,</li>
-<li><img class="formulaInl" alt="$ X^+ $" src="form_329.png"/> denote the pseudo-inverse of <img class="formulaInl" alt="$ X $" src="form_2.png"/>.</li>
+<li><img class="formulaInl" alt="$ X^T $" src="form_329.png"/> denote the transpose of <img class="formulaInl" alt="$ X $" src="form_2.png"/>,</li>
+<li><img class="formulaInl" alt="$ X^+ $" src="form_330.png"/> denote the pseudo-inverse of <img class="formulaInl" alt="$ X $" src="form_2.png"/>.</li>
 </ul>
-<p>Maximizing the likelihood is equivalent to maximizing the log-likelihood <img class="formulaInl" alt="$ \sum_{i=1}^n \log f(y_i \mid \boldsymbol x_i) $" src="form_330.png"/>, which simplifies to minimizing the <b>residual sum of squares</b> <img class="formulaInl" alt="$ RSS $" src="form_331.png"/> (also called sum of squared residuals or sum of squared errors of prediction), </p><p class="formulaDsp">
-<img class="formulaDsp" alt="\[ RSS = \sum_{i=1}^n ( y_i - \boldsymbol c^T \boldsymbol x_i )^2 = (\boldsymbol y - X \boldsymbol c)^T (\boldsymbol y - X \boldsymbol c) \,. \]" src="form_332.png"/>
+<p>Maximizing the likelihood is equivalent to maximizing the log-likelihood <img class="formulaInl" alt="$ \sum_{i=1}^n \log f(y_i \mid \boldsymbol x_i) $" src="form_331.png"/>, which simplifies to minimizing the <b>residual sum of squares</b> <img class="formulaInl" alt="$ RSS $" src="form_332.png"/> (also called sum of squared residuals or sum of squared errors of prediction), </p><p class="formulaDsp">
+<img class="formulaDsp" alt="\[ RSS = \sum_{i=1}^n ( y_i - \boldsymbol c^T \boldsymbol x_i )^2 = (\boldsymbol y - X \boldsymbol c)^T (\boldsymbol y - X \boldsymbol c) \,. \]" src="form_333.png"/>
 </p>
-<p> The first-order conditions yield that the <img class="formulaInl" alt="$ RSS $" src="form_331.png"/> is minimized at </p><p class="formulaDsp">
-<img class="formulaDsp" alt="\[ \boldsymbol c = (X^T X)^+ X^T \boldsymbol y \,. \]" src="form_333.png"/>
+<p> The first-order conditions yield that the <img class="formulaInl" alt="$ RSS $" src="form_332.png"/> is minimized at </p><p class="formulaDsp">
+<img class="formulaDsp" alt="\[ \boldsymbol c = (X^T X)^+ X^T \boldsymbol y \,. \]" src="form_334.png"/>
 </p>
-<p>Computing the <b>total sum of squares</b> <img class="formulaInl" alt="$ TSS $" src="form_334.png"/>, the <b>explained sum of squares</b> <img class="formulaInl" alt="$ ESS $" src="form_335.png"/> (also called the regression sum of squares), and the <b>coefficient of determination</b> <img class="formulaInl" alt="$ R^2 $" src="form_336.png"/> is done according to the following formulas: </p><p class="formulaDsp">
-<img class="formulaDsp" alt="\begin{align*} ESS &amp; = \boldsymbol y^T X \boldsymbol c - \frac{ \| y \|_1^2 }{n} \\ TSS &amp; = \sum_{i=1}^n y_i^2 - \frac{ \| y \|_1^2 }{n} \\ R^2 &amp; = \frac{ESS}{TSS} \end{align*}" src="form_337.png"/>
+<p>Computing the <b>total sum of squares</b> <img class="formulaInl" alt="$ TSS $" src="form_335.png"/>, the <b>explained sum of squares</b> <img class="formulaInl" alt="$ ESS $" src="form_336.png"/> (also called the regression sum of squares), and the <b>coefficient of determination</b> <img class="formulaInl" alt="$ R^2 $" src="form_337.png"/> is done according to the following formulas: </p><p class="formulaDsp">
+<img class="formulaDsp" alt="\begin{align*} ESS &amp; = \boldsymbol y^T X \boldsymbol c - \frac{ \| y \|_1^2 }{n} \\ TSS &amp; = \sum_{i=1}^n y_i^2 - \frac{ \| y \|_1^2 }{n} \\ R^2 &amp; = \frac{ESS}{TSS} \end{align*}" src="form_338.png"/>
 </p>
-<p> Note: The last equality follows from the definition <img class="formulaInl" alt="$ R^2 = 1 - \frac{RSS}{TSS} $" src="form_338.png"/> and the fact that for linear regression <img class="formulaInl" alt="$ TSS = RSS + ESS $" src="form_339.png"/>. A proof of the latter can be found, e.g., at: <a href="http://en.wikipedia.org/wiki/Sum_of_squares">http://en.wikipedia.org/wiki/Sum_of_squares</a></p>
-<p>We estimate the variance <img class="formulaInl" alt="$ Var[Y - \boldsymbol c^T \boldsymbol x \mid \boldsymbol x] $" src="form_340.png"/> as </p><p class="formulaDsp">
-<img class="formulaDsp" alt="\[ \sigma^2 = \frac{RSS}{n - k} \]" src="form_341.png"/>
+<p> Note: The last equality follows from the definition <img class="formulaInl" alt="$ R^2 = 1 - \frac{RSS}{TSS} $" src="form_339.png"/> and the fact that for linear regression <img class="formulaInl" alt="$ TSS = RSS + ESS $" src="form_340.png"/>. A proof of the latter can be found, e.g., at: <a href="http://en.wikipedia.org/wiki/Sum_of_squares">http://en.wikipedia.org/wiki/Sum_of_squares</a></p>
+<p>We estimate the variance <img class="formulaInl" alt="$ Var[Y - \boldsymbol c^T \boldsymbol x \mid \boldsymbol x] $" src="form_341.png"/> as </p><p class="formulaDsp">
+<img class="formulaDsp" alt="\[ \sigma^2 = \frac{RSS}{n - k} \]" src="form_342.png"/>
 </p>
 <p> and compute the t-statistic for coefficient <img class="formulaInl" alt="$ i $" src="form_32.png"/> as </p><p class="formulaDsp">
-<img class="formulaDsp" alt="\[ t_i = \frac{c_i}{\sqrt{\sigma^2 \cdot \left( (X^T X)^{-1} \right)_{ii} }} \,. \]" src="form_342.png"/>
+<img class="formulaDsp" alt="\[ t_i = \frac{c_i}{\sqrt{\sigma^2 \cdot \left( (X^T X)^{-1} \right)_{ii} }} \,. \]" src="form_343.png"/>
 </p>
-<p>The <img class="formulaInl" alt="$ p $" src="form_110.png"/>-value for coefficient <img class="formulaInl" alt="$ i $" src="form_32.png"/> gives the probability of seeing a value at least as extreme as the one observed, provided that the null hypothesis ( <img class="formulaInl" alt="$ c_i = 0 $" src="form_111.png"/>) is true. Letting <img class="formulaInl" alt="$ F_\nu $" src="form_343.png"/> denote the cumulative distribution function of the Student's t-distribution with <img class="formulaInl" alt="$ \nu $" src="form_273.png"/> degrees of freedom, the <img class="formulaInl" alt="$ p $" src="form_110.png"/>-value for coefficient <img class="formulaInl" alt="$ i $" src="form_32.png"/> is therefore </p><p class="formulaDsp">
-<img class="formulaDsp" alt="\[ p_i = \Pr(|T| \geq |t_i|) = 2 \cdot (1 - F_{n - k}( |t_i| )) \]" src="form_344.png"/>
+<p>The <img class="formulaInl" alt="$ p $" src="form_110.png"/>-value for coefficient <img class="formulaInl" alt="$ i $" src="form_32.png"/> gives the probability of seeing a value at least as extreme as the one observed, provided that the null hypothesis ( <img class="formulaInl" alt="$ c_i = 0 $" src="form_111.png"/>) is true. Letting <img class="formulaInl" alt="$ F_\nu $" src="form_344.png"/> denote the cumulative distribution function of the Student's t-distribution with <img class="formulaInl" alt="$ \nu $" src="form_274.png"/> degrees of freedom, the <img class="formulaInl" alt="$ p $" src="form_110.png"/>-value for coefficient <img class="formulaInl" alt="$ i $" src="form_32.png"/> is therefore </p><p class="formulaDsp">
+<img class="formulaDsp" alt="\[ p_i = \Pr(|T| \geq |t_i|) = 2 \cdot (1 - F_{n - k}( |t_i| )) \]" src="form_345.png"/>
 </p>
-<p> where <img class="formulaInl" alt="$ T $" src="form_303.png"/> is a student-t distributed random variable with mean 0.</p>
-<p>The condition number [2] <img class="formulaInl" alt="$ \kappa(X) = \|X\|_2\cdot\|X^{-1}\|_2$" src="form_345.png"/> is computed as the product of two spectral norms [3]. The spectral norm of a matrix <img class="formulaInl" alt="$X$" src="form_346.png"/> is the largest singular value of <img class="formulaInl" alt="$X$" src="form_346.png"/> i.e. the square root of the largest eigenvalue of the positive-semidefinite matrix <img class="formulaInl" alt="$X^{*}X$" src="form_324.png"/>:</p>
+<p> where <img class="formulaInl" alt="$ T $" src="form_304.png"/> is a student-t distributed random variable with mean 0.</p>
+<p>The condition number [2] <img class="formulaInl" alt="$ \kappa(X) = \|X\|_2\cdot\|X^{-1}\|_2$" src="form_346.png"/> is computed as the product of two spectral norms [3]. The spectral norm of a matrix <img class="formulaInl" alt="$X$" src="form_347.png"/> is the largest singular value of <img class="formulaInl" alt="$X$" src="form_347.png"/> i.e. the square root of the largest eigenvalue of the positive-semidefinite matrix <img class="formulaInl" alt="$X^{*}X$" src="form_325.png"/>:</p>
 <p class="formulaDsp">
-<img class="formulaDsp" alt="\[ \|X\|_2 = \sqrt{\lambda_{\max}\left(X^{*}X\right)}\ , \]" src="form_347.png"/>
+<img class="formulaDsp" alt="\[ \|X\|_2 = \sqrt{\lambda_{\max}\left(X^{*}X\right)}\ , \]" src="form_348.png"/>
 </p>
-<p> where <img class="formulaInl" alt="$X^{*}$" src="form_348.png"/> is the conjugate transpose of <img class="formulaInl" alt="$X$" src="form_346.png"/>. The condition number of a linear regression problem is a worst-case measure of how sensitive the result is to small perturbations of the input. A large condition number (say, more than 1000) indicates the presence of significant multicollinearity.</p>
+<p> where <img class="formulaInl" alt="$X^{*}$" src="form_349.png"/> is the conjugate transpose of <img class="formulaInl" alt="$X$" src="form_347.png"/>. The condition number of a linear regression problem is a worst-case measure of how sensitive the result is to small perturbations of the input. A large condition number (say, more than 1000) indicates the presence of significant multicollinearity.</p>
 <p><a class="anchor" id="literature"></a></p><dl class="section user"><dt>Literature</dt><dd></dd></dl>
 <p>[1] Cosma Shalizi: Statistics 36-350: Data Mining, Lecture Notes, 21 October 2009, <a href="http://www.stat.cmu.edu/~cshalizi/350/lectures/17/lecture-17.pdf">http://www.stat.cmu.edu/~cshalizi/350/lectures/17/lecture-17.pdf</a></p>
 <p>[2] Wikipedia: Condition Number, <a href="http://en.wikipedia.org/wiki/Condition_number">http://en.wikipedia.org/wiki/Condition_number</a>.</p>
@@ -373,7 +373,7 @@ FROM houses, houses_linregr m;
 <!-- start footer part -->
 <div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
   <ul>
-    <li class="footer">Generated on Thu Apr 7 2016 14:24:10 for MADlib by
+    <li class="footer">Generated on Tue Sep 20 2016 11:27:01 for MADlib by
     <a href="http://www.doxygen.org/index.html">
     <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.10 </li>
   </ul>
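
The Technical Background formulas in the linear regression page above can be checked numerically. The following is a minimal sketch in plain Python, not MADlib's implementation. The helper names (transpose, matmul, inverse, ols) are hypothetical, the design matrix is assumed to have full column rank so that the pseudo-inverse of X^T X equals its ordinary inverse, and (sum of y_i)^2 stands in for the term the page writes as the squared 1-norm of y, which coincides with it when all y_i are non-negative.

```python
import math


def transpose(m):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two list-of-rows matrices."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(m):
    """Gauss-Jordan inverse with partial pivoting (assumes m is square, non-singular)."""
    n = len(m)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(X, y):
    """Coefficients, sums of squares, R^2, variance estimate and t-statistics."""
    n, k = len(X), len(X[0])
    Xt = transpose(X)
    xtx_inv = inverse(matmul(Xt, X))                    # (X^T X)^{-1}
    xty = [sum(a * b for a, b in zip(col, y)) for col in Xt]
    c = [sum(a * b for a, b in zip(row, xty)) for row in xtx_inv]
    fitted = [sum(ci * xi for ci, xi in zip(c, row)) for row in X]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    mean_term = sum(y) ** 2 / n                         # (sum of y_i)^2 / n
    ess = sum(a * b for a, b in zip(xty, c)) - mean_term  # y^T X c - mean_term
    tss = sum(yi ** 2 for yi in y) - mean_term
    sigma2 = rss / (n - k)
    t_stats = [c[i] / math.sqrt(sigma2 * xtx_inv[i][i]) for i in range(k)]
    return {"coef": c, "rss": rss, "ess": ess, "tss": tss,
            "r2": ess / tss, "sigma2": sigma2, "t_stats": t_stats}


# Toy data: an intercept column plus one regressor.
result = ols([[1, 0], [1, 1], [1, 2], [1, 3]], [1, 3, 2, 4])
print(result["coef"], result["r2"])
```

On the toy data the decomposition TSS = RSS + ESS holds up to rounding, which is the identity the page relies on for R^2 = ESS / TSS. The p-values are omitted because the Python standard library has no Student's t distribution function.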

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/bed9253d/docs/latest/group__grp__lmf.html
----------------------------------------------------------------------
diff --git a/docs/latest/group__grp__lmf.html b/docs/latest/group__grp__lmf.html
index b21f530..0a3f03f 100644
--- a/docs/latest/group__grp__lmf.html
+++ b/docs/latest/group__grp__lmf.html
@@ -47,7 +47,7 @@
   <td id="projectlogo"><a href="http://madlib.net"><img alt="Logo" src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
   <td style="padding-left: 0.5em;">
    <div id="projectname">
-   <span id="projectnumber">1.9</span>
+   <span id="projectnumber">1.9.1</span>
    </div>
    <div id="projectbrief">User Documentation for MADlib</div>
   </td>
@@ -263,7 +263,7 @@ WHERE id = 1;
 <!-- start footer part -->
 <div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
   <ul>
-    <li class="footer">Generated on Thu Apr 7 2016 14:24:10 for MADlib by
+    <li class="footer">Generated on Tue Sep 20 2016 11:27:01 for MADlib by
     <a href="http://www.doxygen.org/index.html">
     <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.10 </li>
   </ul>

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/bed9253d/docs/latest/group__grp__logreg.html
----------------------------------------------------------------------
diff --git a/docs/latest/group__grp__logreg.html b/docs/latest/group__grp__logreg.html
index 27b27cb..fe349dc 100644
--- a/docs/latest/group__grp__logreg.html
+++ b/docs/latest/group__grp__logreg.html
@@ -47,7 +47,7 @@
   <td id="projectlogo"><a href="http://madlib.net"><img alt="Logo" src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
   <td style="padding-left: 0.5em;">
    <div id="projectname">
-   <span id="projectnumber">1.9</span>
+   <span id="projectnumber">1.9.1</span>
    </div>
    <div id="projectbrief">User Documentation for MADlib</div>
   </td>
@@ -173,7 +173,7 @@ logregr_train( source_table,
 <p class="endtd"></p>
 </td></tr>
 <tr>
-<th>condition_no </th><td><p class="starttd">FLOAT8[]. The condition number of the <img class="formulaInl" alt="$X^{*}X$" src="form_324.png"/> matrix. A high condition number is usually an indication that there may be some numeric instability in the result yielding a less reliable model. A high condition number often results when there is a significant amount of colinearity in the underlying design matrix, in which case other regression techniques may be more appropriate. </p>
+<th>condition_no </th><td><p class="starttd">FLOAT8[]. The condition number of the <img class="formulaInl" alt="$X^{*}X$" src="form_325.png"/> matrix. A high condition number is usually an indication that there may be some numeric instability in the result yielding a less reliable model. A high condition number often results when there is a significant amount of colinearity in the underlying design matrix, in which case other regression techniques may be more appropriate. </p>
 <p class="endtd"></p>
 </td></tr>
 <tr>
@@ -312,7 +312,7 @@ SELECT madlib.logregr_train( 'patients',
                              20,
                              'irls'
                            );
-</pre></li>
+</pre> (Note that in this example we are dynamically creating the array of independent variables from column names. If you have large numbers of independent variables beyond the PostgreSQL limit of maximum columns per table, you would pre-build the arrays and store them in a single column.)</li>
 <li>View the regression results. <pre class="example">
 -- Set extended display on for easier reading of output
 \x on
@@ -356,19 +356,19 @@ ORDER BY p.id;
 </dd></dl>
 <p><a class="anchor" id="notes"></a></p><dl class="section user"><dt>Notes</dt><dd>All table names can be optionally schema qualified (current_schemas() would be searched if a schema name is not provided) and all table and column names should follow case-sensitivity and quoting rules per the database. (For instance, 'mytable' and 'MyTable' both resolve to the same entity, i.e. 'mytable'. If mixed-case or multi-byte characters are desired for entity names then the string should be double-quoted; in this case the input would be '"MyTable"').</dd></dl>
 <p><a class="anchor" id="background"></a></p><dl class="section user"><dt>Technical Background</dt><dd></dd></dl>
-<p>(Binomial) logistic regression refers to a stochastic model in which the conditional mean of the dependent dichotomous variable (usually denoted <img class="formulaInl" alt="$ Y \in \{ 0,1 \} $" src="form_353.png"/>) is the logistic function of an affine function of the vector of independent variables (usually denoted <img class="formulaInl" alt="$ \boldsymbol x $" src="form_58.png"/>). That is, </p><p class="formulaDsp">
+<p>(Binomial) logistic regression refers to a stochastic model in which the conditional mean of the dependent dichotomous variable (usually denoted <img class="formulaInl" alt="$ Y \in \{ 0,1 \} $" src="form_354.png"/>) is the logistic function of an affine function of the vector of independent variables (usually denoted <img class="formulaInl" alt="$ \boldsymbol x $" src="form_58.png"/>). That is, </p><p class="formulaDsp">
 <img class="formulaDsp" alt="\[ E[Y \mid \boldsymbol x] = \sigma(\boldsymbol c^T \boldsymbol x) \]" src="form_94.png"/>
 </p>
 <p> for some unknown vector of coefficients <img class="formulaInl" alt="$ \boldsymbol c $" src="form_78.png"/> and where <img class="formulaInl" alt="$ \sigma(x) = \frac{1}{1 + \exp(-x)} $" src="form_95.png"/> is the logistic function. Logistic regression finds the vector of coefficients <img class="formulaInl" alt="$ \boldsymbol c $" src="form_78.png"/> that maximizes the likelihood of the observations.</p>
 <p>Let</p><ul>
-<li><img class="formulaInl" alt="$ \boldsymbol y \in \{ 0,1 \}^n $" src="form_354.png"/> denote the vector of observed dependent variables, with <img class="formulaInl" alt="$ n $" src="form_10.png"/> rows, containing the observed values of the dependent variable,</li>
+<li><img class="formulaInl" alt="$ \boldsymbol y \in \{ 0,1 \}^n $" src="form_355.png"/> denote the vector of observed dependent variables, with <img class="formulaInl" alt="$ n $" src="form_10.png"/> rows, containing the observed values of the dependent variable,</li>
 <li><img class="formulaInl" alt="$ X \in \mathbf R^{n \times k} $" src="form_98.png"/> denote the design matrix with <img class="formulaInl" alt="$ k $" src="form_97.png"/> columns and <img class="formulaInl" alt="$ n $" src="form_10.png"/> rows, containing all observed vectors of independent variables <img class="formulaInl" alt="$ \boldsymbol x_i $" src="form_99.png"/> as rows.</li>
 </ul>
 <p>By definition, </p><p class="formulaDsp">
-<img class="formulaDsp" alt="\[ P[Y = y_i | \boldsymbol x_i] = \sigma((-1)^{(1 - y_i)} \cdot \boldsymbol c^T \boldsymbol x_i) \,. \]" src="form_355.png"/>
+<img class="formulaDsp" alt="\[ P[Y = y_i | \boldsymbol x_i] = \sigma((-1)^{(1 - y_i)} \cdot \boldsymbol c^T \boldsymbol x_i) \,. \]" src="form_356.png"/>
 </p>
 <p> Maximizing the likelihood <img class="formulaInl" alt="$ \prod_{i=1}^n \Pr(Y = y_i \mid \boldsymbol x_i) $" src="form_101.png"/> is equivalent to maximizing the log-likelihood <img class="formulaInl" alt="$ \sum_{i=1}^n \log \Pr(Y = y_i \mid \boldsymbol x_i) $" src="form_102.png"/>, which simplifies to </p><p class="formulaDsp">
-<img class="formulaDsp" alt="\[ l(\boldsymbol c) = -\sum_{i=1}^n \log(1 + \exp(-(-1)^{(1 - y_i)} \cdot \boldsymbol c^T \boldsymbol x_i)) \,. \]" src="form_356.png"/>
+<img class="formulaDsp" alt="\[ l(\boldsymbol c) = -\sum_{i=1}^n \log(1 + \exp(-(-1)^{(1 - y_i)} \cdot \boldsymbol c^T \boldsymbol x_i)) \,. \]" src="form_357.png"/>
 </p>
 <p> The Hessian of this objective is <img class="formulaInl" alt="$ H = -X^T A X $" src="form_104.png"/> where <img class="formulaInl" alt="$ A = \text{diag}(a_1, \dots, a_n) $" src="form_105.png"/> is the diagonal matrix with <img class="formulaInl" alt="$ a_i = \sigma(\boldsymbol c^T \boldsymbol x_i) \cdot \sigma(-\boldsymbol c^T \boldsymbol x_i) \,. $" src="form_106.png"/> Since <img class="formulaInl" alt="$ H $" src="form_107.png"/> is negative semidefinite, <img class="formulaInl" alt="$ l(\boldsymbol c) $" src="form_79.png"/> is concave, so maximizing it is a convex optimization problem. There are many techniques for solving convex optimization problems. Currently, logistic regression in MADlib can use one of three algorithms:</p><ul>
 <li>Iteratively Reweighted Least Squares</li>
@@ -410,7 +410,7 @@ ORDER BY p.id;
 <!-- start footer part -->
 <div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
   <ul>
-    <li class="footer">Generated on Thu Apr 7 2016 14:24:10 for MADlib by
+    <li class="footer">Generated on Tue Sep 20 2016 11:27:01 for MADlib by
     <a href="http://www.doxygen.org/index.html">
     <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.10 </li>
   </ul>
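
The log-likelihood and Hessian described in the logistic regression Technical Background above can be evaluated directly. The following is a minimal sketch in plain Python, not MADlib's code. The function names are hypothetical, labels are assumed to be 0 or 1, and the log-likelihood is computed from P[Y = y_i | x_i] = sigma((-1)^(1 - y_i) c^T x_i) using the identity log sigma(z) = -log(1 + exp(-z)). The gradient, which the page does not state, is the standard sum of (y_i - sigma(c^T x_i)) x_i.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_sigmoid(z):
    """log(sigmoid(z)) = -log(1 + exp(-z)), evaluated stably for large |z|."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def log_likelihood(c, X, y):
    """Sum over i of log P[Y = y_i | x_i]; (2*y_i - 1) equals (-1)^(1 - y_i)."""
    return sum(log_sigmoid((2 * yi - 1) * dot(c, xi)) for xi, yi in zip(X, y))


def gradient(c, X, y):
    """Gradient of the log-likelihood: sum over i of (y_i - sigma(c^T x_i)) * x_i."""
    g = [0.0] * len(c)
    for xi, yi in zip(X, y):
        r = yi - sigmoid(dot(c, xi))
        for j in range(len(c)):
            g[j] += r * xi[j]
    return g


def hessian(c, X):
    """H = -X^T A X with a_i = sigma(c^T x_i) * sigma(-c^T x_i)."""
    k = len(c)
    H = [[0.0] * k for _ in range(k)]
    for xi in X:
        z = dot(c, xi)
        a = sigmoid(z) * sigmoid(-z)
        for r in range(k):
            for s in range(k):
                H[r][s] -= a * xi[r] * xi[s]
    return H


# Toy data: an intercept column plus one regressor.
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [0, 0, 1, 1]
c0 = [0.0, 0.0]
print(log_likelihood(c0, X, y), gradient(c0, X, y), hessian(c0, X))
```

At c = 0 every a_i equals 1/4, so the Hessian reduces to -(1/4) X^T X, which gives a quick sanity check. A Newton or IRLS iteration would combine the gradient and Hessian shown here, and is left out to keep the sketch short.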

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/bed9253d/docs/latest/group__grp__marginal.html
----------------------------------------------------------------------
diff --git a/docs/latest/group__grp__marginal.html b/docs/latest/group__grp__marginal.html
index d43a334..26aae59 100644
--- a/docs/latest/group__grp__marginal.html
+++ b/docs/latest/group__grp__marginal.html
@@ -47,7 +47,7 @@
   <td id="projectlogo"><a href="http://madlib.net"><img alt="Logo" src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
   <td style="padding-left: 0.5em;">
    <div id="projectname">
-   <span id="projectnumber">1.9</span>
+   <span id="projectnumber">1.9.1</span>
    </div>
    <div id="projectbrief">User Documentation for MADlib</div>
   </td>
@@ -123,7 +123,7 @@ $(document).ready(function(){initNavTree('group__grp__marginal.html','');});
 <li>
 <a href="#related">Related Topics</a> </li>
 </ul>
-</div><p>A marginal effect (ME) or partial effect measures the effect on the conditional mean of <img class="formulaInl" alt="$ y $" src="form_323.png"/> for a change in one of the regressors, say <img class="formulaInl" alt="$X_k$" src="form_366.png"/>. In the linear regression model, the ME equals the relevant slope coefficient, greatly simplifying analysis. For nonlinear models, specialized algorithms are required for calculating ME. The marginal effect computed is the average of the marginal effect at every data point present in the source table.</p>
+</div><p>A marginal effect (ME) or partial effect measures the effect on the conditional mean of <img class="formulaInl" alt="$ y $" src="form_324.png"/> for a change in one of the regressors, say <img class="formulaInl" alt="$X_k$" src="form_367.png"/>. In the linear regression model, the ME equals the relevant slope coefficient, greatly simplifying analysis. For nonlinear models, specialized algorithms are required for calculating ME. The marginal effect computed is the average of the marginal effect at every data point present in the source table.</p>
 <p>MADlib provides marginal effects regression functions for linear, logistic and multinomial logistic regressions.</p>
 <dl class="section warning"><dt>Warning</dt><dd>The <a class="el" href="marginal_8sql__in.html#a9517d679ee4209126895445cbed51fe3">margins_logregr()</a> and <a class="el" href="marginal_8sql__in.html#ae39ad0e1beca060fd153dba35901a4e7">margins_mlogregr()</a> functions have been deprecated in favor of the <a class="el" href="marginal_8sql__in.html#a36fcae5245ca31517723fce38b183c90" title="Marginal effects with default variable_names. ">margins()</a> function.</dd></dl>
 <p><a class="anchor" id="margins"></a></p><dl class="section user"><dt>Marginal Effects with Interaction Terms</dt><dd><pre class="syntax">
@@ -398,16 +398,16 @@ p_values     | {0.00729989838349161,0.181668346802398,8.89828265128986e-17}
 </ol>
 <p><a class="anchor" id="notes"></a> </p><dl class="section note"><dt>Note</dt><dd>The <em>marginal_vars</em> argument is a list with the names matching those in 'x_design'. If no 'x_design' is present (i.e. no interaction and no indicator variables), then <em>marginal_vars</em> must be the indices (base 1) of variables in 'independent_varname'. Use <em>NULL</em> to use all independent variables. It is important to note that the <em>independent_varname</em> array in the underlying regression is assumed to start with a lower bound index of 1. Arrays that don't follow this would result in an incorrect solution.</dd></dl>
 <p><a class="anchor" id="background"></a></p><dl class="section user"><dt>Technical Background</dt><dd></dd></dl>
-<p>The standard approach to modeling dichotomous/binary variables (so <img class="formulaInl" alt="$y \in \{0, 1\} $" src="form_367.png"/>) is to estimate a generalized linear model under the assumption that <img class="formulaInl" alt="$ y $" src="form_323.png"/> follows some form of Bernoulli distribution. Thus the expected value of <img class="formulaInl" alt="$ y $" src="form_323.png"/> becomes, </p><p class="formulaDsp">
-<img class="formulaDsp" alt="\[ y = G(X' \beta), \]" src="form_368.png"/>
+<p>The standard approach to modeling dichotomous/binary variables (so <img class="formulaInl" alt="$y \in \{0, 1\} $" src="form_368.png"/>) is to estimate a generalized linear model under the assumption that <img class="formulaInl" alt="$ y $" src="form_324.png"/> follows some form of Bernoulli distribution. Thus the expected value of <img class="formulaInl" alt="$ y $" src="form_324.png"/> becomes, </p><p class="formulaDsp">
+<img class="formulaDsp" alt="\[ y = G(X' \beta), \]" src="form_369.png"/>
 </p>
-<p>where G is the specified binomial distribution. For logistic regression, the function <img class="formulaInl" alt="$ G $" src="form_369.png"/> represents the inverse logit function.</p>
+<p>where G is the specified binomial distribution. For logistic regression, the function <img class="formulaInl" alt="$ G $" src="form_370.png"/> represents the inverse logit function.</p>
 <p>In logistic regression: </p><p class="formulaDsp">
-<img class="formulaDsp" alt="\[ P = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots \beta_j x_j)}} = \frac{1}{1 + e^{-z}} \implies \frac{\partial P}{\partial X_k} = \beta_k \cdot \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}} \\ = \beta_k \cdot P \cdot (1-P) \]" src="form_370.png"/>
+<img class="formulaDsp" alt="\[ P = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots \beta_j x_j)}} = \frac{1}{1 + e^{-z}} \implies \frac{\partial P}{\partial X_k} = \beta_k \cdot \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}} \\ = \beta_k \cdot P \cdot (1-P) \]" src="form_371.png"/>
 </p>
 <p>There are several methods for calculating the marginal effects for dichotomous dependent variables. This package uses the average of the marginal effects at every sample observation.</p>
 <p>This is calculated as follows: </p><p class="formulaDsp">
-<img class="formulaDsp" alt="\[ \frac{\partial y}{\partial x_k} = \beta_k \frac{\sum_{i=1}^n P(y_i = 1)(1-P(y_i = 1))}{n}, \\ \text{where}, P(y_i=1) = g(X^{(i)}\beta) \]" src="form_371.png"/>
+<img class="formulaDsp" alt="\[ \frac{\partial y}{\partial x_k} = \beta_k \frac{\sum_{i=1}^n P(y_i = 1)(1-P(y_i = 1))}{n}, \\ \text{where}, P(y_i=1) = g(X^{(i)}\beta) \]" src="form_372.png"/>
 </p>
 <p>We use the delta method for calculating standard errors on the marginal effects.</p>
 <p><a class="anchor" id="literature"></a></p><dl class="section user"><dt>Literature</dt><dd></dd></dl>
@@ -419,7 +419,7 @@ p_values     | {0.00729989838349161,0.181668346802398,8.89828265128986e-17}
 <!-- start footer part -->
 <div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
   <ul>
-    <li class="footer">Generated on Thu Apr 7 2016 14:24:10 for MADlib by
+    <li class="footer">Generated on Tue Sep 20 2016 11:27:01 for MADlib by
     <a href="http://www.doxygen.org/index.html">
     <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.10 </li>
   </ul>

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/bed9253d/docs/latest/group__grp__matrix.html
----------------------------------------------------------------------
diff --git a/docs/latest/group__grp__matrix.html b/docs/latest/group__grp__matrix.html
index d2814d7..fe4148c 100644
--- a/docs/latest/group__grp__matrix.html
+++ b/docs/latest/group__grp__matrix.html
@@ -47,7 +47,7 @@
   <td id="projectlogo"><a href="http://madlib.net"><img alt="Logo" src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
   <td style="padding-left: 0.5em;">
    <div id="projectname">
-   <span id="projectnumber">1.9</span>
+   <span id="projectnumber">1.9.1</span>
    </div>
    <div id="projectbrief">User Documentation for MADlib</div>
   </td>
@@ -287,16 +287,16 @@ $(document).ready(function(){initNavTree('group__grp__matrix.html','');});
 <dt>matrix_out </dt>
 <dd><p class="startdd">TEXT. Name of the table to store the result matrix.</p>
 <p>For Cholesky, QR and LU decompositions, a prefix (<em>matrix_out_prefix</em>) is used as a basis to build the names of the various output tables.</p>
-<p>For Cholesky decomposition ( <img class="formulaInl" alt="$ PA = LDL* $" src="form_545.png"/>), the following suffixes are added to <em>matrix_out_prefix</em>:</p><ul>
+<p>For Cholesky decomposition ( <img class="formulaInl" alt="$ PA = LDL* $" src="form_189.png"/>), the following suffixes are added to <em>matrix_out_prefix</em>:</p><ul>
 <li><em>_p</em> for row permutation matrix P</li>
 <li><em>_l</em> for lower triangular factor L</li>
 <li><em>_d</em> for diagonal matrix D</li>
 </ul>
-<p>For QR decomposition ( <img class="formulaInl" alt="$ A = QR $" src="form_189.png"/>) the following suffixes are added to <em>matrix_out_prefix</em>:</p><ul>
+<p>For QR decomposition ( <img class="formulaInl" alt="$ A = QR $" src="form_190.png"/>) the following suffixes are added to <em>matrix_out_prefix</em>:</p><ul>
 <li><em>_q</em> for orthogonal matrix Q</li>
 <li><em>_r</em> for upper triangular factor R</li>
 </ul>
-<p>For LU decomposition with full pivoting ( <img class="formulaInl" alt="$ PAQ = LU $" src="form_190.png"/>), the following suffixes are added to <em>matrix_out_prefix</em>:</p><ul>
+<p>For LU decomposition with full pivoting ( <img class="formulaInl" alt="$ PAQ = LU $" src="form_191.png"/>), the following suffixes are added to <em>matrix_out_prefix</em>:</p><ul>
 <li><em>_p</em> for row permutation matrix P</li>
 <li><em>_q</em> for column permutation matrix Q</li>
 <li><em>_l</em> for lower triangular factor L</li>
@@ -873,7 +873,7 @@ SELECT madlib.matrix_norm('"mat_A_sparse"', 'row="rowNum", col=col_num, val=entr
 <!-- start footer part -->
 <div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
   <ul>
-    <li class="footer">Generated on Thu Apr 7 2016 14:24:10 for MADlib by
+    <li class="footer">Generated on Tue Sep 20 2016 11:27:01 for MADlib by
     <a href="http://www.doxygen.org/index.html">
     <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.10 </li>
   </ul>

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/bed9253d/docs/latest/group__grp__matrix__factorization.html
----------------------------------------------------------------------
diff --git a/docs/latest/group__grp__matrix__factorization.html b/docs/latest/group__grp__matrix__factorization.html
index a838c4d..2759dd8 100644
--- a/docs/latest/group__grp__matrix__factorization.html
+++ b/docs/latest/group__grp__matrix__factorization.html
@@ -47,7 +47,7 @@
   <td id="projectlogo"><a href="http://madlib.net"><img alt="Logo" src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
   <td style="padding-left: 0.5em;">
    <div id="projectname">
-   <span id="projectnumber">1.9</span>
+   <span id="projectnumber">1.9.1</span>
    </div>
    <div id="projectbrief">User Documentation for MADlib</div>
   </td>
@@ -127,7 +127,7 @@ Modules</h2></td></tr>
 <!-- start footer part -->
 <div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
   <ul>
-    <li class="footer">Generated on Thu Apr 7 2016 14:24:10 for MADlib by
+    <li class="footer">Generated on Tue Sep 20 2016 11:27:01 for MADlib by
     <a href="http://www.doxygen.org/index.html">
     <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.10 </li>
   </ul>

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/bed9253d/docs/latest/group__grp__mdl.html
----------------------------------------------------------------------
diff --git a/docs/latest/group__grp__mdl.html b/docs/latest/group__grp__mdl.html
index d52ca5d..f9f6d31 100644
--- a/docs/latest/group__grp__mdl.html
+++ b/docs/latest/group__grp__mdl.html
@@ -47,7 +47,7 @@
   <td id="projectlogo"><a href="http://madlib.net"><img alt="Logo" src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
   <td style="padding-left: 0.5em;">
    <div id="projectname">
-   <span id="projectnumber">1.9</span>
+   <span id="projectnumber">1.9.1</span>
    </div>
    <div id="projectbrief">User Documentation for MADlib</div>
   </td>
@@ -112,11 +112,15 @@ $(document).ready(function(){initNavTree('group__grp__mdl.html','');});
 </div><!--header-->
 <div class="contents">
 <a name="details" id="details"></a><h2 class="groupheader">Detailed Description</h2>
-<p>Contains the cross-validation module, a collection of routines useful for <a href="http://en.wikipedia.org/wiki/Cross-validation_(statistics)">Cross-validation</a>. </p>
+<p>Contains functions for evaluating accuracy and validation of predictive methods. </p>
 <table class="memberdecls">
 <tr class="heading"><td colspan="2"><h2 class="groupheader"><a name="groups"></a>
 Modules</h2></td></tr>
 <tr class="memitem:group__grp__validation"><td class="memItemLeft" align="right" valign="top">&#160;</td><td class="memItemRight" valign="bottom"><a class="el" href="group__grp__validation.html">Cross Validation</a></td></tr>
+<tr class="memdesc:group__grp__validation"><td class="mdescLeft">&#160;</td><td class="mdescRight">Estimates the fit of a predictive model given a data set and specifications for the training, prediction, and error estimation functions. <br /></td></tr>
+<tr class="separator:"><td class="memSeparator" colspan="2">&#160;</td></tr>
+<tr class="memitem:group__grp__pred"><td class="memItemLeft" align="right" valign="top">&#160;</td><td class="memItemRight" valign="bottom"><a class="el" href="group__grp__pred.html">Prediction Metrics</a></td></tr>
+<tr class="memdesc:group__grp__pred"><td class="mdescLeft">&#160;</td><td class="mdescRight">Provides various prediction accuracy metrics. <br /></td></tr>
 <tr class="separator:"><td class="memSeparator" colspan="2">&#160;</td></tr>
 </table>
 </div><!-- contents -->
@@ -124,7 +128,7 @@ Modules</h2></td></tr>
 <!-- start footer part -->
 <div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
   <ul>
-    <li class="footer">Generated on Thu Apr 7 2016 14:24:10 for MADlib by
+    <li class="footer">Generated on Tue Sep 20 2016 11:27:01 for MADlib by
     <a href="http://www.doxygen.org/index.html">
     <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.10 </li>
   </ul>

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/bed9253d/docs/latest/group__grp__mdl.js
----------------------------------------------------------------------
diff --git a/docs/latest/group__grp__mdl.js b/docs/latest/group__grp__mdl.js
index 0dafbec..f2e3c69 100644
--- a/docs/latest/group__grp__mdl.js
+++ b/docs/latest/group__grp__mdl.js
@@ -1,4 +1,5 @@
 var group__grp__mdl =
 [
-    [ "Cross Validation", "group__grp__validation.html", null ]
+    [ "Cross Validation", "group__grp__validation.html", null ],
+    [ "Prediction Metrics", "group__grp__pred.html", null ]
 ];
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/bed9253d/docs/latest/group__grp__mfvsketch.html
----------------------------------------------------------------------
diff --git a/docs/latest/group__grp__mfvsketch.html b/docs/latest/group__grp__mfvsketch.html
index 2c9a5e6..e26f58a 100644
--- a/docs/latest/group__grp__mfvsketch.html
+++ b/docs/latest/group__grp__mfvsketch.html
@@ -47,7 +47,7 @@
   <td id="projectlogo"><a href="http://madlib.net"><img alt="Logo" src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
   <td style="padding-left: 0.5em;">
    <div id="projectname">
-   <span id="projectnumber">1.9</span>
+   <span id="projectnumber">1.9.1</span>
    </div>
    <div id="projectbrief">User Documentation for MADlib</div>
   </td>
@@ -162,7 +162,7 @@ FROM data;
 <!-- start footer part -->
 <div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
   <ul>
-    <li class="footer">Generated on Thu Apr 7 2016 14:24:11 for MADlib by
+    <li class="footer">Generated on Tue Sep 20 2016 11:27:01 for MADlib by
     <a href="http://www.doxygen.org/index.html">
     <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.10 </li>
   </ul>

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/bed9253d/docs/latest/group__grp__mlogreg.html
----------------------------------------------------------------------
diff --git a/docs/latest/group__grp__mlogreg.html b/docs/latest/group__grp__mlogreg.html
index 5497bf9..560c951 100644
--- a/docs/latest/group__grp__mlogreg.html
+++ b/docs/latest/group__grp__mlogreg.html
@@ -47,7 +47,7 @@
   <td id="projectlogo"><a href="http://madlib.net"><img alt="Logo" src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
   <td style="padding-left: 0.5em;">
    <div id="projectname">
-   <span id="projectnumber">1.9</span>
+   <span id="projectnumber">1.9.1</span>
    </div>
    <div id="projectbrief">User Documentation for MADlib</div>
   </td>
@@ -402,7 +402,7 @@ coef                     | {{1.45474045211601,0.0849956182104023,-0.017238349960
 <!-- start footer part -->
 <div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
   <ul>
-    <li class="footer">Generated on Thu Apr 7 2016 14:24:11 for MADlib by
+    <li class="footer">Generated on Tue Sep 20 2016 11:27:01 for MADlib by
     <a href="http://www.doxygen.org/index.html">
     <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.10 </li>
   </ul>

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/bed9253d/docs/latest/group__grp__multinom.html
----------------------------------------------------------------------
diff --git a/docs/latest/group__grp__multinom.html b/docs/latest/group__grp__multinom.html
index 47d2591..76ef400 100644
--- a/docs/latest/group__grp__multinom.html
+++ b/docs/latest/group__grp__multinom.html
@@ -47,7 +47,7 @@
   <td id="projectlogo"><a href="http://madlib.net"><img alt="Logo" src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
   <td style="padding-left: 0.5em;">
    <div id="projectname">
-   <span id="projectnumber">1.9</span>
+   <span id="projectnumber">1.9.1</span>
    </div>
    <div id="projectbrief">User Documentation for MADlib</div>
   </td>
@@ -473,7 +473,7 @@ SELECT * FROM test3_prd_prob;
 <!-- start footer part -->
 <div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
   <ul>
-    <li class="footer">Generated on Thu Apr 7 2016 14:24:10 for MADlib by
+    <li class="footer">Generated on Tue Sep 20 2016 11:27:01 for MADlib by
     <a href="http://www.doxygen.org/index.html">
     <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.10 </li>
   </ul>

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/bed9253d/docs/latest/group__grp__ordinal.html
----------------------------------------------------------------------
diff --git a/docs/latest/group__grp__ordinal.html b/docs/latest/group__grp__ordinal.html
index cb170d9..786e1e3 100644
--- a/docs/latest/group__grp__ordinal.html
+++ b/docs/latest/group__grp__ordinal.html
@@ -47,7 +47,7 @@
   <td id="projectlogo"><a href="http://madlib.net"><img alt="Logo" src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
   <td style="padding-left: 0.5em;">
    <div id="projectname">
-   <span id="projectnumber">1.9</span>
+   <span id="projectnumber">1.9.1</span>
    </div>
    <div id="projectbrief">User Documentation for MADlib</div>
   </td>
@@ -456,7 +456,7 @@ SELECT * FROM test3_prd_prob;
 <!-- start footer part -->
 <div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
   <ul>
-    <li class="footer">Generated on Thu Apr 7 2016 14:24:10 for MADlib by
+    <li class="footer">Generated on Tue Sep 20 2016 11:27:01 for MADlib by
     <a href="http://www.doxygen.org/index.html">
     <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.10 </li>
   </ul>

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/bed9253d/docs/latest/group__grp__path.html
----------------------------------------------------------------------
diff --git a/docs/latest/group__grp__path.html b/docs/latest/group__grp__path.html
index bb4227f..78e9f56 100644
--- a/docs/latest/group__grp__path.html
+++ b/docs/latest/group__grp__path.html
@@ -6,7 +6,7 @@
 <meta http-equiv="X-UA-Compatible" content="IE=9"/>
 <meta name="generator" content="Doxygen 1.8.10"/>
 <meta name="keywords" content="madlib,postgres,greenplum,machine learning,data mining,deep learning,ensemble methods,data science,market basket analysis,affinity analysis,pca,lda,regression,elastic net,huber white,proportional hazards,k-means,latent dirichlet allocation,bayes,support vector machines,svm"/>
-<title>MADlib: Path Functions</title>
+<title>MADlib: Path</title>
 <link href="tabs.css" rel="stylesheet" type="text/css"/>
 <script type="text/javascript" src="jquery.js"></script>
 <script type="text/javascript" src="dynsections.js"></script>
@@ -47,7 +47,7 @@
   <td id="projectlogo"><a href="http://madlib.net"><img alt="Logo" src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
   <td style="padding-left: 0.5em;">
    <div id="projectname">
-   <span id="projectnumber">1.9</span>
+   <span id="projectnumber">1.9.1</span>
    </div>
    <div id="projectbrief">User Documentation for MADlib</div>
   </td>
@@ -106,7 +106,7 @@ $(document).ready(function(){initNavTree('group__grp__path.html','');});
 
 <div class="header">
   <div class="headertitle">
-<div class="title">Path Functions<div class="ingroups"><a class="el" href="group__grp__utility__functions.html">Utility Functions</a></div></div>  </div>
+<div class="title">Path<div class="ingroups"><a class="el" href="group__grp__utility__functions.html">Utility Functions</a></div></div>  </div>
 </div><!--header-->
 <div class="contents">
 <div class="toc"><b>Contents</b> </p><ul>
@@ -140,7 +140,8 @@ path(
     symbol,
     pattern,
     aggregate_func,
-    persist_rows
+    persist_rows,
+    overlapping_patterns
 )
 </pre></dd></dl>
 <p><b>Arguments</b> </p><dl class="arglist">
@@ -153,7 +154,7 @@ path(
 <p class="enddd"></p>
 </dd>
 <dt>partition_expr </dt>
-<dd><p class="startdd">VARCHAR. The 'partition_expr' can be a single column or a list of comma-separated columns/expressions to divide all rows into groups, or partitions. Matching is applied across the rows that fall into t he same partition. This can be NULL or '' to indicate the matching is to be applied to the whole table.</p>
+<dd><p class="startdd">VARCHAR. The 'partition_expr' can be a single column or a list of comma-separated columns/expressions to divide all rows into groups, or partitions. Matching is applied across the rows that fall into the same partition. This can be NULL or '' to indicate the matching is to be applied to the whole table.</p>
 <p class="enddd"></p>
 </dd>
 <dt>order_expr </dt>
@@ -191,13 +192,17 @@ Parentheses () can be used to group items into a single logical item. </li>
 </ul>
 <p class="enddd"></p>
 </dd>
-<dt>aggregate_func </dt>
-<dd><p class="startdd">VARCHAR. A comma-separated list of aggregates to be applied to the pattern matches [3]. Please note that window functions cannot currently be used in the parameter 'aggregate_func'. If you want to use a window function [4], output the pattern matches and write a SQL query with a window function over the output tuples (see 'persist_rows' parameter below).</p>
+<dt>aggregate_func (optional) </dt>
+<dd><p class="startdd">VARCHAR, default NULL. A comma-separated list of aggregates to be applied to the pattern matches [3]. Please note that window functions cannot currently be used in the parameter 'aggregate_func'. If you want to use a window function [4], output the pattern matches and write a SQL query with a window function over the output tuples (see 'persist_rows' parameter below).</p>
 <p>If you just want to output the pattern matched rows and not compute any aggregates, you can put NULL or '' in the 'aggregate_func' parameter. </p>
 <p class="enddd"></p>
 </dd>
-<dt>persist_rows </dt>
-<dd><p class="startdd">BOOLEAN. If TRUE the matched rows are persisted in a separate output table. This table is named as &lt;output_table&gt;_tuples (the string "_tuples" is added as suffix to the value of <em>output_table</em>). </p>
+<dt>persist_rows (optional) </dt>
+<dd><p class="startdd">BOOLEAN, default FALSE. If TRUE the matched rows are persisted in a separate output table. This table is named as &lt;output_table&gt;_tuples (the string "_tuples" is added as suffix to the value of <em>output_table</em>). </p>
+<p class="enddd"></p>
+</dd>
+<dt>overlapping_patterns (optional) </dt>
+<dd><p class="startdd">BOOLEAN, default FALSE. If TRUE find every occurrence of the pattern in the partition, regardless of whether it might have been part of a previously found match. </p>
 <p class="enddd"></p>
 </dd>
 </dl>
@@ -205,7 +210,7 @@ Parentheses () can be used to group items into a single logical item. </li>
 <p>The data set describes shopper behavior on a notional web site that sells beer and wine. A beacon fires an event to a log file when the shopper visits different pages on the site: landing page, beer selection page, wine selection page, and checkout. Other pages on the site like help pages show up in the logs as well. Let’s assume that the log has been sessionized.</p>
 <p>Create the data table:</p>
 <pre class="example">
-DROP TABLE IF EXISTS eventlog, path_output, path_output_tuples;
+DROP TABLE IF EXISTS eventlog;
 CREATE TABLE eventlog (event_timestamp TIMESTAMP,
             user_id INT,
             session_id INT,
@@ -248,7 +253,8 @@ INSERT INTO eventlog VALUES
 ('04/15/2015 02:19:00', 103711, 109, 'WINE', 0);
 </pre><ol type="1">
 <li>Calculate the revenue by checkout: <pre class="example">
- SELECT madlib.path(
+DROP TABLE IF EXISTS path_output, path_output_tuples;
+SELECT madlib.path(
      'eventlog',                -- Name of input table
      'path_output',             -- Table name to store path results
      'session_id',              -- Partition input table by session
@@ -294,6 +300,7 @@ SELECT * FROM path_output_tuples ORDER BY session_id ASC, event_timestamp ASC;
 (6 rows)
 </pre> Notice that the 'symbol' and 'match_id' columns are added to the right of the matched rows.</li>
 <li>We are interested in sessions with an order placed within 4 pages of entering the shopping site via the landing page. We represent this by the regular expression: '(land)[^(land)(buy)]{0,2}(buy)'. In other words, visit to the landing page followed by from 0 to 2 non-entry, non-sale pages, followed by a purchase. The SQL is as follows: <pre class="example">
+DROP TABLE IF EXISTS path_output, path_output_tuples;
 SELECT madlib.path(
      'eventlog',                -- Name of input table
      'path_output',             -- Table name to store path results
@@ -348,6 +355,7 @@ SELECT DATE(event_timestamp), user_id, session_id, revenue,
 (3 rows)
 </pre> Here we are partitioning the window function by day because we want daily averages, although our sample data set only has a single day.</li>
 <li>Now we want to do a golden path analysis to find the most successful shopper paths through the site. Since our data set is small, we decide this means the most frequently viewed page just before a checkout is made: <pre class="example">
+DROP TABLE IF EXISTS path_output, path_output_tuples;
 SELECT madlib.path(
      'eventlog',                -- Name of input table
      'path_output',             -- Table name to store path results
@@ -362,25 +370,67 @@ SELECT madlib.path(
      'array_agg(page ORDER BY session_id ASC, event_timestamp ASC) as page_path',    -- Build array with shopper paths
      FALSE                       -- Don't persist matches
      );
-</pre></li>
-</ol>
-<p>Now count the common paths and print the most frequent:</p>
-<pre class="example">
+</pre> Now count the common paths and print the most frequent: <pre class="example">
 SELECT count(*), page_path from
     (SELECT * FROM path_output) q
 GROUP BY page_path
 ORDER BY count(*) DESC
 LIMIT 10;
-</pre><p>Result: </p><pre class="result">
+</pre> Result: <pre class="result">
  count |    page_path
 -------+-----------------
      5 | {WINE,CHECKOUT}
      1 | {BEER,CHECKOUT}
 (2 rows)
-</pre><p>There are only 2 different paths. The wine page is viewed more frequently than the beer page just before checkout.</p>
-<p><a class="anchor" id="note"></a></p><dl class="section note"><dt>Note</dt><dd>Please note some current limitations of the path algorithm. These limitations will be addressed in subsequent releases.<ul>
+</pre> There are only 2 different paths. The wine page is viewed more frequently than the beer page just before checkout.</li>
+<li>To demonstrate the use of 'overlapping_patterns', consider a pattern with at least one page followed by and ending with a checkout: <pre class="example">
+DROP TABLE IF EXISTS path_output, path_output_tuples;
+SELECT madlib.path(                                                                   
+     'eventlog',                    -- Name of the table                                           
+     'path_output',                 -- Table name to store the path results                         
+     'session_id',                  -- Partition by session                 
+     'event_timestamp ASC',         -- Order partitions in input table by time       
+     $$ nobuy:=page&lt;&gt;'CHECKOUT',
+        buy:=page='CHECKOUT'
+     $$,  -- Definition of symbols used in the pattern definition 
+     '(nobuy)+(buy)',         -- At least one page followed by and ending with a CHECKOUT.
+     'array_agg(page ORDER BY session_id ASC, event_timestamp ASC) as page_path',  
+     FALSE,                        -- Don't persist matches
+     TRUE                          -- Turn on overlapping patterns
+     );
+SELECT * FROM path_output ORDER BY session_id, match_id;
+</pre> Result with overlap turned on: <pre class="result">
+ session_id | match_id |             page_path             
+------------+----------+-----------------------------------
+        100 |        1 | {LANDING,WINE,CHECKOUT}
+        100 |        2 | {WINE,CHECKOUT}
+        102 |        1 | {LANDING,WINE,CHECKOUT}
+        102 |        2 | {WINE,CHECKOUT}
+        102 |        3 | {LANDING,HELP,WINE,CHECKOUT}
+        102 |        4 | {HELP,WINE,CHECKOUT}
+        102 |        5 | {WINE,CHECKOUT}
+        103 |        1 | {LANDING,WINE,HELP,WINE,CHECKOUT}
+        103 |        2 | {WINE,HELP,WINE,CHECKOUT}
+        103 |        3 | {HELP,WINE,CHECKOUT}
+        103 |        4 | {WINE,CHECKOUT}
+        104 |        1 | {BEER,CHECKOUT}
+        108 |        1 | {BEER,WINE,CHECKOUT}
+        108 |        2 | {WINE,CHECKOUT}
+(14 rows)
+</pre> With overlap turned off, the result would be: <pre class="result">
+ session_id | match_id |             page_path             
+------------+----------+-----------------------------------
+        100 |        1 | {LANDING,WINE,CHECKOUT}
+        102 |        1 | {LANDING,WINE,CHECKOUT}
+        102 |        2 | {LANDING,HELP,WINE,CHECKOUT}
+        103 |        1 | {LANDING,WINE,HELP,WINE,CHECKOUT}
+        104 |        1 | {BEER,CHECKOUT}
+        108 |        1 | {BEER,WINE,CHECKOUT}
+(6 rows)
+</pre></li>
+</ol>
+<p><a class="anchor" id="note"></a></p><dl class="section note"><dt>Note</dt><dd>Please note some current limitations of the path algorithm.<ul>
 <li>Window functions cannot currently be used in the parameter 'aggregate_func'. Instead, output the pattern matches and write a SQL query with a window function over the output tuples.</li>
-<li>Overlapping pattern matches are not supported. That is, a given row can only belong to one pattern match (non-overlapping).</li>
 <li>A given row can only match one symbol. If a row matches multiple symbols, the symbol that comes <em>first</em> in the symbol definition list will take precedence.</li>
 <li>Maximum number of symbols that can be defined is 35.</li>
 <li>The columns 'match_id' and 'symbol' are generated by the path algorithm. If your input data coincidentally has columns named 'match_id' or 'symbol', the system-generated column names will be changed to "__madlib_path_match_id__" and "__madlib_path_symbol__".</li>
@@ -415,7 +465,7 @@ LIMIT 10;
 <!-- start footer part -->
 <div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
   <ul>
-    <li class="footer">Generated on Thu Apr 7 2016 14:24:11 for MADlib by
+    <li class="footer">Generated on Tue Sep 20 2016 11:27:01 for MADlib by
     <a href="http://www.doxygen.org/index.html">
     <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.10 </li>
   </ul>
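Editorial illustration (not part of the commit): the difference between the two result tables in the 'overlapping_patterns' example above can be reproduced with a small Python sketch. It assumes each row is reduced to one symbol character ('n' for nobuy, 'b' for buy) and that an overlapping search tries a match anchored at every row in turn. That reading is inferred from the results shown in the diff and is not a description of MADlib's internals. The function name `find_matches` and the sample session are hypothetical, chosen to be consistent with the output listed for session 102.

```python
import re


def find_matches(pages, overlapping=False):
    """Return the page sub-sequences matching the pattern (nobuy)+(buy)."""
    # Encode each row as a single symbol character (assumed encoding).
    symbols = "".join("b" if p == "CHECKOUT" else "n" for p in pages)
    pattern = re.compile(r"n+b")
    if overlapping:
        # Try a match anchored at every position, even inside earlier matches.
        spans = []
        for start in range(len(symbols)):
            m = pattern.match(symbols, start)
            if m:
                spans.append(m.span())
    else:
        # finditer resumes after the end of the previous match.
        spans = [m.span() for m in pattern.finditer(symbols)]
    return [pages[s:e] for s, e in spans]


if __name__ == "__main__":
    session = ["LANDING", "WINE", "CHECKOUT",
               "LANDING", "HELP", "WINE", "CHECKOUT"]
    print(len(find_matches(session)))                    # prints 2
    print(len(find_matches(session, overlapping=True)))  # prints 5
```

With overlap off, each row belongs to at most one match. With overlap on, every suffix of a match that still satisfies the pattern is reported as its own match, which is why the example's row count grows from 6 to 14.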

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/bed9253d/docs/latest/group__grp__pca.html
----------------------------------------------------------------------
diff --git a/docs/latest/group__grp__pca.html b/docs/latest/group__grp__pca.html
index 80e7174..c0db3ed 100644
--- a/docs/latest/group__grp__pca.html
+++ b/docs/latest/group__grp__pca.html
@@ -47,7 +47,7 @@
   <td id="projectlogo"><a href="http://madlib.net"><img alt="Logo" src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
   <td style="padding-left: 0.5em;">
    <div id="projectname">
-   <span id="projectnumber">1.9</span>
+   <span id="projectnumber">1.9.1</span>
    </div>
    <div id="projectbrief">User Documentation for MADlib</div>
   </td>
@@ -128,7 +128,7 @@ Modules</h2></td></tr>
 <!-- start footer part -->
 <div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
   <ul>
-    <li class="footer">Generated on Thu Apr 7 2016 14:24:10 for MADlib by
+    <li class="footer">Generated on Tue Sep 20 2016 11:27:01 for MADlib by
     <a href="http://www.doxygen.org/index.html">
     <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.10 </li>
   </ul>

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/bed9253d/docs/latest/group__grp__pca__project.html
----------------------------------------------------------------------
diff --git a/docs/latest/group__grp__pca__project.html b/docs/latest/group__grp__pca__project.html
index 8d2fbf0..7e37313 100644
--- a/docs/latest/group__grp__pca__project.html
+++ b/docs/latest/group__grp__pca__project.html
@@ -47,7 +47,7 @@
   <td id="projectlogo"><a href="http://madlib.net"><img alt="Logo" src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
   <td style="padding-left: 0.5em;">
    <div id="projectname">
-   <span id="projectnumber">1.9</span>
+   <span id="projectnumber">1.9.1</span>
    </div>
    <div id="projectbrief">User Documentation for MADlib</div>
   </td>
@@ -145,7 +145,7 @@ madlib.pca_sparse_project( source_table,
 </pre></dd></dl>
 <dl class="section user"><dt>Arguments</dt><dd><dl class="arglist">
 <dt>source_table </dt>
-<dd><p class="startdd">TEXT. Source table name. Identical to <a class="el" href="pca_8sql__in.html#a31abf88e67a446a4f789764aa2c61e85">pca_train</a>, the input data matrix should have <img class="formulaInl" alt="$ N $" src="form_218.png"/> rows and <img class="formulaInl" alt="$ M $" src="form_174.png"/> columns, where <img class="formulaInl" alt="$ N $" src="form_218.png"/> is the number of data points, and <img class="formulaInl" alt="$ M $" src="form_174.png"/> is the number of features for each data point.</p>
+<dd><p class="startdd">TEXT. Source table name. Identical to <a class="el" href="pca_8sql__in.html#a31abf88e67a446a4f789764aa2c61e85">pca_train</a>, the input data matrix should have <img class="formulaInl" alt="$ N $" src="form_219.png"/> rows and <img class="formulaInl" alt="$ M $" src="form_174.png"/> columns, where <img class="formulaInl" alt="$ N $" src="form_219.png"/> is the number of data points, and <img class="formulaInl" alt="$ M $" src="form_174.png"/> is the number of features for each data point.</p>
 <p>The input table for <em> pca_project </em> is expected to be in one of the two standard MADlib dense matrix formats, and the sparse input table for <em> pca_sparse_project </em> should be in the standard MADlib sparse matrix format. These formats are described in the documentation for <a class="el" href="pca_8sql__in.html#a31abf88e67a446a4f789764aa2c61e85">pca_train</a>.</p>
 <p class="enddd"></p>
 </dd>
@@ -260,19 +260,19 @@ SELECT * FROM result_summary_table;
 </ul>
 </dd></dl>
 <p><a class="anchor" id="background"></a></p><dl class="section user"><dt>Technical Background</dt><dd></dd></dl>
-<p>Given a table containing some principal components <img class="formulaInl" alt="$ \boldsymbol P $" src="form_229.png"/> and some input data <img class="formulaInl" alt="$ \boldsymbol X $" src="form_219.png"/>, the low-dimensional representation <img class="formulaInl" alt="$ {\boldsymbol X}' $" src="form_230.png"/> is computed as </p><p class="formulaDsp">
-<img class="formulaDsp" alt="\begin{align*} {\boldsymbol {\hat{X}}} &amp; = {\boldsymbol X} - \vec{e} \hat{x}^T \\ {\boldsymbol X}' &amp; = {\boldsymbol {\hat {X}}} {\boldsymbol P}. \end{align*}" src="form_231.png"/>
+<p>Given a table containing some principal components <img class="formulaInl" alt="$ \boldsymbol P $" src="form_230.png"/> and some input data <img class="formulaInl" alt="$ \boldsymbol X $" src="form_220.png"/>, the low-dimensional representation <img class="formulaInl" alt="$ {\boldsymbol X}' $" src="form_231.png"/> is computed as </p><p class="formulaDsp">
+<img class="formulaDsp" alt="\begin{align*} {\boldsymbol {\hat{X}}} &amp; = {\boldsymbol X} - \vec{e} \hat{x}^T \\ {\boldsymbol X}' &amp; = {\boldsymbol {\hat {X}}} {\boldsymbol P}. \end{align*}" src="form_232.png"/>
 </p>
-<p> where <img class="formulaInl" alt="$\hat{x} $" src="form_232.png"/> is the vector of column means of <img class="formulaInl" alt="$ \boldsymbol X $" src="form_219.png"/> and <img class="formulaInl" alt="$ \vec{e} $" src="form_224.png"/> is the vector of all ones. This step is equivalent to centering the data around the origin.</p>
-<p>The residual table <img class="formulaInl" alt="$ \boldsymbol R $" src="form_233.png"/> is a measure of how well the low-dimensional representation approximates the true input data, and is computed as </p><p class="formulaDsp">
-<img class="formulaDsp" alt="\[ {\boldsymbol R} = {\boldsymbol {\hat{X}}} - {\boldsymbol X}' {\boldsymbol P}^T. \]" src="form_234.png"/>
+<p> where <img class="formulaInl" alt="$\hat{x} $" src="form_233.png"/> is the vector of column means of <img class="formulaInl" alt="$ \boldsymbol X $" src="form_220.png"/> and <img class="formulaInl" alt="$ \vec{e} $" src="form_225.png"/> is the vector of all ones. This step is equivalent to centering the data around the origin.</p>
+<p>The residual table <img class="formulaInl" alt="$ \boldsymbol R $" src="form_234.png"/> is a measure of how well the low-dimensional representation approximates the true input data, and is computed as </p><p class="formulaDsp">
+<img class="formulaDsp" alt="\[ {\boldsymbol R} = {\boldsymbol {\hat{X}}} - {\boldsymbol X}' {\boldsymbol P}^T. \]" src="form_235.png"/>
 </p>
 <p> A residual matrix with entries mostly close to zero indicates a good representation.</p>
-<p>The residual norm <img class="formulaInl" alt="$ r $" src="form_235.png"/> is simply </p><p class="formulaDsp">
-<img class="formulaDsp" alt="\[ r = \|{\boldsymbol R}\|_F \]" src="form_236.png"/>
+<p>The residual norm <img class="formulaInl" alt="$ r $" src="form_236.png"/> is simply </p><p class="formulaDsp">
+<img class="formulaDsp" alt="\[ r = \|{\boldsymbol R}\|_F \]" src="form_237.png"/>
 </p>
-<p> where <img class="formulaInl" alt="$ \|\cdot\|_F $" src="form_237.png"/> is the Frobenius norm. The relative residual norm <img class="formulaInl" alt="$ r' $" src="form_238.png"/> is </p><p class="formulaDsp">
-<img class="formulaDsp" alt="\[ r' = \frac{ \|{\boldsymbol R}\|_F }{\|{\boldsymbol X}\|_F } \]" src="form_239.png"/>
+<p> where <img class="formulaInl" alt="$ \|\cdot\|_F $" src="form_238.png"/> is the Frobenius norm. The relative residual norm <img class="formulaInl" alt="$ r' $" src="form_239.png"/> is </p><p class="formulaDsp">
+<img class="formulaDsp" alt="\[ r' = \frac{ \|{\boldsymbol R}\|_F }{\|{\boldsymbol X}\|_F } \]" src="form_240.png"/>
 </p>
 <p><a class="anchor" id="related"></a></p><dl class="section user"><dt>Related Topics</dt><dd>File <a class="el" href="pca__project_8sql__in.html" title="Principal Component Analysis Projection. ">pca_project.sql_in</a> documenting the SQL functions</dd></dl>
 <p><a class="el" href="group__grp__pca__train.html">Principal Component Analysis</a> </p>
@@ -281,7 +281,7 @@ SELECT * FROM result_summary_table;
 <!-- start footer part -->
 <div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
   <ul>
-    <li class="footer">Generated on Thu Apr 7 2016 14:24:10 for MADlib by
+    <li class="footer">Generated on Tue Sep 20 2016 11:27:01 for MADlib by
     <a href="http://www.doxygen.org/index.html">
     <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.10 </li>
   </ul>
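
The projection and residual formulas in the Technical Background of the hunk above can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic, not MADlib's implementation: the helper names (center, matmul, project) are hypothetical, and P is assumed to be an M x K matrix whose columns are the principal components.

```python
import math


def center(X):
    """Subtract the column means from every row: X_hat = X - e * x_hat^T."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[v - m for v, m in zip(row, means)] for row in X]


def matmul(A, B):
    """Dense matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frobenius(A):
    """Frobenius norm: square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for row in A for v in row))


def project(X, P):
    """Return (X', R, r, r') for data X and principal components P.

    Assumption: P has one principal component per column (M x K).
    """
    X_hat = center(X)
    X_proj = matmul(X_hat, P)                      # X' = X_hat P
    back = matmul(X_proj, transpose(P))            # X' P^T
    R = [[a - b for a, b in zip(r1, r2)]           # R = X_hat - X' P^T
         for r1, r2 in zip(X_hat, back)]
    r = frobenius(R)                               # residual norm
    return X_proj, R, r, r / frobenius(X)          # relative residual norm


if __name__ == "__main__":
    # Points on a line: one component along (1, 1)/sqrt(2) captures all of
    # the centered data, so the residual norm should be close to zero.
    X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    s = 1.0 / math.sqrt(2.0)
    _, _, r, r_rel = project(X, [[s], [s]])
    print(r, r_rel)
```

Note that the relative residual norm divides by the norm of the original X, not the centered matrix, following the formula as written in the documentation.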

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/bed9253d/docs/latest/group__grp__pca__train.html
----------------------------------------------------------------------
diff --git a/docs/latest/group__grp__pca__train.html b/docs/latest/group__grp__pca__train.html
index 94d6ce6..7853e57 100644
--- a/docs/latest/group__grp__pca__train.html
+++ b/docs/latest/group__grp__pca__train.html
@@ -47,7 +47,7 @@
   <td id="projectlogo"><a href="http://madlib.net"><img alt="Logo" src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
   <td style="padding-left: 0.5em;">
    <div id="projectname">
-   <span id="projectnumber">1.9</span>
+   <span id="projectnumber">1.9.1</span>
    </div>
    <div id="projectbrief">User Documentation for MADlib</div>
   </td>
@@ -152,7 +152,7 @@ pca_sparse_train( source_table,
 </pre></dd></dl>
 <p><b>Arguments</b> </p><dl class="arglist">
 <dt>source_table </dt>
-<dd><p class="startdd">TEXT. Name of the input table containing the data for PCA training. The input data matrix should have <img class="formulaInl" alt="$ N $" src="form_218.png"/> rows and <img class="formulaInl" alt="$ M $" src="form_174.png"/> columns, where <img class="formulaInl" alt="$ N $" src="form_218.png"/> is the number of data points, and <img class="formulaInl" alt="$ M $" src="form_174.png"/> is the number of features for each data point.</p>
+<dd><p class="startdd">TEXT. Name of the input table containing the data for PCA training. The input data matrix should have <img class="formulaInl" alt="$ N $" src="form_219.png"/> rows and <img class="formulaInl" alt="$ M $" src="form_174.png"/> columns, where <img class="formulaInl" alt="$ N $" src="form_219.png"/> is the number of data points, and <img class="formulaInl" alt="$ M $" src="form_174.png"/> is the number of features for each data point.</p>
 <p>A dense input table is expected to be in one of the two standard MADlib dense matrix formats, and a sparse input table should be in the standard MADlib sparse matrix format.</p>
 <p>The two standard MADlib dense matrix formats are </p><pre>{TABLE|VIEW} <em>source_table</em> (
     <em>row_id</em> INTEGER,
@@ -307,14 +307,14 @@ SELECT * FROM result_table;
 <li>If both 'lanczos_iter' and proportion of variance (via the 'components_param' parameter) are defined, 'lanczos_iter' will take precedence in determining the number of principal components (i.e. the number of principal components will not be greater than 'lanczos_iter' even if the target proportion had not been reached).</li>
 </ul>
 <p><a class="anchor" id="background_pca"></a></p><dl class="section user"><dt>Technical Background</dt><dd></dd></dl>
-<p>The PCA implemented here uses an SVD implementation to recover the principal components (as opposed to directly computing the eigenvectors of the covariance matrix). Let <img class="formulaInl" alt="$ \boldsymbol X $" src="form_219.png"/> be the data matrix, and let <img class="formulaInl" alt="$ \hat{x} $" src="form_220.png"/> be a vector of the column averages of <img class="formulaInl" alt="$ \boldsymbol{X}$" src="form_221.png"/>. PCA computes the matrix <img class="formulaInl" alt="$ \hat{\boldsymbol X} $" src="form_222.png"/> as </p><p class="formulaDsp">
-<img class="formulaDsp" alt="\[ \hat{\boldsymbol X} = {\boldsymbol X} - \vec{e} \hat{x}^T \]" src="form_223.png"/>
+<p>The PCA implemented here uses an SVD implementation to recover the principal components (as opposed to directly computing the eigenvectors of the covariance matrix). Let <img class="formulaInl" alt="$ \boldsymbol X $" src="form_220.png"/> be the data matrix, and let <img class="formulaInl" alt="$ \hat{x} $" src="form_221.png"/> be a vector of the column averages of <img class="formulaInl" alt="$ \boldsymbol{X}$" src="form_222.png"/>. PCA computes the matrix <img class="formulaInl" alt="$ \hat{\boldsymbol X} $" src="form_223.png"/> as </p><p class="formulaDsp">
+<img class="formulaDsp" alt="\[ \hat{\boldsymbol X} = {\boldsymbol X} - \vec{e} \hat{x}^T \]" src="form_224.png"/>
 </p>
-<p> where <img class="formulaInl" alt="$ \vec{e} $" src="form_224.png"/> is the vector of all ones.</p>
+<p> where <img class="formulaInl" alt="$ \vec{e} $" src="form_225.png"/> is the vector of all ones.</p>
 <p>PCA then computes the SVD matrix factorization </p><p class="formulaDsp">
-<img class="formulaDsp" alt="\[ \hat{\boldsymbol X} = {\boldsymbol U}{\boldsymbol \Sigma}{\boldsymbol V}^T \]" src="form_225.png"/>
+<img class="formulaDsp" alt="\[ \hat{\boldsymbol X} = {\boldsymbol U}{\boldsymbol \Sigma}{\boldsymbol V}^T \]" src="form_226.png"/>
 </p>
-<p> where <img class="formulaInl" alt="$ {\boldsymbol \Sigma} $" src="form_226.png"/> is a diagonal matrix. The eigenvalues are recovered as the entries of <img class="formulaInl" alt="$ {\boldsymbol \Sigma}/\sqrt{N-1} $" src="form_546.png"/>, and the principal components are the rows of <img class="formulaInl" alt="$ {\boldsymbol V} $" src="form_228.png"/>. The reasoning behind using N − 1 instead of N to calculate the covariance is <a href="https://en.wikipedia.org/wiki/Bessel%27s_correction">Bessel's correction</a>.</p>
+<p> where <img class="formulaInl" alt="$ {\boldsymbol \Sigma} $" src="form_227.png"/> is a diagonal matrix. The eigenvalues are recovered as the entries of <img class="formulaInl" alt="$ {\boldsymbol \Sigma}/\sqrt{N-1} $" src="form_228.png"/>, and the principal components are the rows of <img class="formulaInl" alt="$ {\boldsymbol V} $" src="form_229.png"/>. The reasoning behind using N − 1 instead of N to calculate the covariance is <a href="https://en.wikipedia.org/wiki/Bessel%27s_correction">Bessel's correction</a>.</p>
 <p>It is important to note that the PCA implementation assumes that the user will use only the principal components that have non-zero eigenvalues. The SVD calculation is done with the Lanczos method, which does not guarantee correctness for singular vectors with zero-valued eigenvalues. Consequently, principal components with zero-valued eigenvalues are not guaranteed to be correct. Generally, this will not be a problem unless the user wants to use the principal components for the entire eigenspectrum.</p>
 <p><a class="anchor" id="literature"></a></p><dl class="section user"><dt>Literature</dt><dd></dd></dl>
 <p>[1] Principal Component Analysis. <a href="http://en.wikipedia.org/wiki/Principal_component_analysis">http://en.wikipedia.org/wiki/Principal_component_analysis</a></p>
@@ -327,7 +327,7 @@ SELECT * FROM result_table;
 <!-- start footer part -->
 <div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
   <ul>
-    <li class="footer">Generated on Thu Apr 7 2016 14:24:10 for MADlib by
+    <li class="footer">Generated on Tue Sep 20 2016 11:27:01 for MADlib by
     <a href="http://www.doxygen.org/index.html">
     <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.10 </li>
   </ul>
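
To make the centering and the N − 1 scaling in the Technical Background above concrete, here is a small plain-Python sketch. It does not use the Lanczos method that the documentation describes; it substitutes simple power iteration on the centered data to find only the leading component, and the function names (center, top_component) are hypothetical. It then checks that the squared scaled singular value, (sigma / sqrt(N − 1))^2, matches the sample variance of the data along that component.

```python
import math


def center(X):
    """Subtract the column means from every row: X_hat = X - e * x_hat^T."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[v - m for v, m in zip(row, means)] for row in X]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_component(X_hat, iters=200):
    """Leading right singular vector and singular value of X_hat.

    Uses power iteration on X_hat^T X_hat (a stand-in for Lanczos).
    Assumption: the uniform start vector is not orthogonal to the
    leading component; otherwise the iteration cannot find it.
    """
    m = len(X_hat[0])
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iters):
        xv = [dot(row, v) for row in X_hat]                    # X_hat v
        w = [sum(row[j] * s for row, s in zip(X_hat, xv))      # X_hat^T (X_hat v)
             for j in range(m)]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [c / norm for c in w]
    sigma = math.sqrt(sum(dot(row, v) ** 2 for row in X_hat))  # ||X_hat v||
    return v, sigma


if __name__ == "__main__":
    X = [[2.0, 0.0], [0.0, 1.0], [-2.0, 0.0], [0.0, -1.0]]
    n = len(X)
    X_hat = center(X)
    v, sigma = top_component(X_hat)

    scaled = sigma / math.sqrt(n - 1)          # entry of Sigma / sqrt(N - 1)
    scores = [dot(row, v) for row in X_hat]    # data projected onto v
    sample_var = sum(s * s for s in scores) / (n - 1)  # Bessel-corrected variance
    print(v, scaled ** 2, sample_var)
```

Power iteration is used here only because it fits in a few lines; it recovers one component at a time and converges slowly when the top two singular values are close.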