You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@madlib.apache.org by ri...@apache.org on 2017/05/16 20:29:44 UTC

[29/51] [partial] incubator-madlib-site git commit: Add v1.11 docs

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/b5b51c69/docs/v1.11/group__grp__clustered__errors.html
----------------------------------------------------------------------
diff --git a/docs/v1.11/group__grp__clustered__errors.html b/docs/v1.11/group__grp__clustered__errors.html
new file mode 100644
index 0000000..e223db6
--- /dev/null
+++ b/docs/v1.11/group__grp__clustered__errors.html
@@ -0,0 +1,399 @@
+<!-- HTML header for doxygen 1.8.4-->
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml">
+<head>
+<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
+<meta http-equiv="X-UA-Compatible" content="IE=9"/>
+<meta name="generator" content="Doxygen 1.8.13"/>
+<meta name="keywords" content="madlib,postgres,greenplum,machine learning,data mining,deep learning,ensemble methods,data science,market basket analysis,affinity analysis,pca,lda,regression,elastic net,huber white,proportional hazards,k-means,latent dirichlet allocation,bayes,support vector machines,svm"/>
+<title>MADlib: Clustered Variance</title>
+<link href="tabs.css" rel="stylesheet" type="text/css"/>
+<script type="text/javascript" src="jquery.js"></script>
+<script type="text/javascript" src="dynsections.js"></script>
+<link href="navtree.css" rel="stylesheet" type="text/css"/>
+<script type="text/javascript" src="resize.js"></script>
+<script type="text/javascript" src="navtreedata.js"></script>
+<script type="text/javascript" src="navtree.js"></script>
+<script type="text/javascript">
+  $(document).ready(initResizable);
+</script>
+<link href="search/search.css" rel="stylesheet" type="text/css"/>
+<script type="text/javascript" src="search/searchdata.js"></script>
+<script type="text/javascript" src="search/search.js"></script>
+<script type="text/javascript">
+  $(document).ready(function() { init_search(); });
+</script>
+<!-- hack in the navigation tree -->
+<script type="text/javascript" src="eigen_navtree_hacks.js"></script>
+<link href="doxygen.css" rel="stylesheet" type="text/css" />
+<link href="madlib_extra.css" rel="stylesheet" type="text/css"/>
+<!-- google analytics -->
+<script>
+  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+  ga('create', 'UA-45382226-1', 'madlib.incubator.apache.org');
+  ga('send', 'pageview');
+</script>
+</head>
+<body>
+<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
+<div id="titlearea">
+<table cellspacing="0" cellpadding="0">
+ <tbody>
+ <tr style="height: 56px;">
+  <td id="projectlogo"><a href="http://madlib.incubator.apache.org"><img alt="Logo" src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
+  <td style="padding-left: 0.5em;">
+   <div id="projectname">
+   <span id="projectnumber">1.11</span>
+   </div>
+   <div id="projectbrief">User Documentation for MADlib</div>
+  </td>
+   <td>        <div id="MSearchBox" class="MSearchBoxInactive">
+        <span class="left">
+          <img id="MSearchSelect" src="search/mag_sel.png"
+               onmouseover="return searchBox.OnSearchSelectShow()"
+               onmouseout="return searchBox.OnSearchSelectHide()"
+               alt=""/>
+          <input type="text" id="MSearchField" value="Search" accesskey="S"
+               onfocus="searchBox.OnSearchFieldFocus(true)" 
+               onblur="searchBox.OnSearchFieldFocus(false)" 
+               onkeyup="searchBox.OnSearchFieldChange(event)"/>
+          </span><span class="right">
+            <a id="MSearchClose" href="javascript:searchBox.CloseResultsWindow()"><img id="MSearchCloseImg" border="0" src="search/close.png" alt=""/></a>
+          </span>
+        </div>
+</td>
+ </tr>
+ </tbody>
+</table>
+</div>
+<!-- end header part -->
+<!-- Generated by Doxygen 1.8.13 -->
+<script type="text/javascript">
+var searchBox = new SearchBox("searchBox", "search",false,'Search');
+</script>
+</div><!-- top -->
+<div id="side-nav" class="ui-resizable side-nav-resizable">
+  <div id="nav-tree">
+    <div id="nav-tree-contents">
+      <div id="nav-sync" class="sync"></div>
+    </div>
+  </div>
+  <div id="splitbar" style="-moz-user-select:none;" 
+       class="ui-resizable-handle">
+  </div>
+</div>
+<script type="text/javascript">
+$(document).ready(function(){initNavTree('group__grp__clustered__errors.html','');});
+</script>
+<div id="doc-content">
+<!-- window showing the filter options -->
+<div id="MSearchSelectWindow"
+     onmouseover="return searchBox.OnSearchSelectShow()"
+     onmouseout="return searchBox.OnSearchSelectHide()"
+     onkeydown="return searchBox.OnSearchSelectKey(event)">
+</div>
+
+<!-- iframe showing the search results (closed by default) -->
+<div id="MSearchResultsWindow">
+<iframe src="javascript:void(0)" frameborder="0" 
+        name="MSearchResults" id="MSearchResults">
+</iframe>
+</div>
+
+<div class="header">
+  <div class="headertitle">
+<div class="title">Clustered Variance<div class="ingroups"><a class="el" href="group__grp__super.html">Supervised Learning</a> &raquo; <a class="el" href="group__grp__regml.html">Regression Models</a></div></div>  </div>
+</div><!--header-->
+<div class="contents">
+<div class="toc"><b>Contents</b> <ul>
+<li>
+<a href="#train_linregr">Clustered Variance Linear Regression Training Function</a> </li>
+<li>
+<a href="#train_logregr">Clustered Variance Logistic Regression Training Function</a> </li>
+<li>
+<a href="#train_mlogregr">Clustered Variance Multinomial Logistic Regression Training Function</a> </li>
+<li>
+<a href="#train_cox">Clustered Variance for Cox Proportional Hazards model</a> </li>
+<li>
+<a href="#examples">Examples</a> </li>
+<li>
+<a href="#notes">Notes</a> </li>
+<li>
+<a href="#background">Technical Background</a> </li>
+<li>
+<a href="#related">Related Topics</a> </li>
+</ul>
+</div><p>The Clustered Variance module adjusts standard errors for clustering. For example, replicating a dataset 100 times should not increase the precision of parameter estimates, but performing this procedure with the IID assumption will actually do this. Another example is in economics of education research, it is reasonable to expect that the error terms for children in the same class are not independent. Clustering standard errors can correct for this.</p>
+<p>The MADlib Clustered Variance module includes functions to calculate linear, logistic, and multinomial logistic regression problems.</p>
+<p><a class="anchor" id="train_linregr"></a></p><dl class="section user"><dt>Clustered Variance Linear Regression Training Function</dt><dd></dd></dl>
+<p>The clustered variance linear regression training function has the following syntax. </p><pre class="syntax">
+clustered_variance_linregr ( source_table,
+                             out_table,
+                             dependent_varname,
+                             independent_varname,
+                             clustervar,
+                             grouping_cols
+                           )
+</pre><p> <b>Arguments</b> </p><dl class="arglist">
+<dt>source_table </dt>
+<dd><p class="startdd">TEXT. The name of the table containing the input data.</p>
+<p class="enddd"></p>
+</dd>
+<dt>out_table </dt>
+<dd><p class="startdd">VARCHAR. Name of the generated table containing the output model. The output table contains the following columns. </p><table class="output">
+<tr>
+<th>coef </th><td>DOUBLE PRECISION[]. Vector of the coefficients of the regression.  </td></tr>
+<tr>
+<th>std_err </th><td>DOUBLE PRECISION[]. Vector of the standard error of the coefficients.  </td></tr>
+<tr>
+<th>t_stats </th><td>DOUBLE PRECISION[]. Vector of the t-stats of the coefficients.  </td></tr>
+<tr>
+<th>p_values </th><td>DOUBLE PRECISION[]. Vector of the p-values of the coefficients.  </td></tr>
+</table>
+<p>A summary table named &lt;out_table&gt;_summary is also created, which is the same as the summary table created by linregr_train function. Please refer to the documentation for linear regression for details.</p>
+<p></p>
+<p class="enddd"></p>
+</dd>
+<dt>dependent_varname </dt>
+<dd>TEXT. An expression to evaluate for the dependent variable. </dd>
+<dt>independent_varname </dt>
+<dd>TEXT. An Expression to evalue for the independent variables. </dd>
+<dt>clustervar </dt>
+<dd>TEXT. A comma-separated list of the columns to use as cluster variables. </dd>
+<dt>grouping_cols (optional) </dt>
+<dd>TEXT, default: NULL. <em>Not currently implemented. Any non-NULL value is ignored.</em> An expression list used to group the input dataset into discrete groups, running one regression per group. Similar to the SQL GROUP BY clause. When this value is null, no grouping is used and a single result model is generated. </dd>
+</dl>
+<p><a class="anchor" id="train_logregr"></a></p><dl class="section user"><dt>Clustered Variance Logistic Regression Training Function</dt><dd></dd></dl>
+<p>The clustered variance logistic regression training function has the following syntax. </p><pre class="syntax">
+clustered_variance_logregr( source_table,
+                            out_table,
+                            dependent_varname,
+                            independent_varname,
+                            clustervar,
+                            grouping_cols,
+                            max_iter,
+                            optimizer,
+                            tolerance,
+                            verbose_mode
+                          )
+</pre><p> <b>Arguments</b> </p><dl class="arglist">
+<dt>source_table </dt>
+<dd>TEXT. The name of the table containing the input data. </dd>
+<dt>out_table </dt>
+<dd><p class="startdd">VARCHAR. Name of the generated table containing the output model. The output table has the following columns: </p><table class="output">
+<tr>
+<th>coef </th><td>Vector of the coefficients of the regression.  </td></tr>
+<tr>
+<th>std_err </th><td>Vector of the standard error of the coefficients.  </td></tr>
+<tr>
+<th>z_stats </th><td>Vector of the z-stats of the coefficients.  </td></tr>
+<tr>
+<th>p_values </th><td>Vector of the p-values of the coefficients.  </td></tr>
+</table>
+<p>A summary table named &lt;out_table&gt;_summary is also created, which is the same as the summary table created by logregr_train function. Please refer to the documentation for logistic regression for details.</p>
+<p class="enddd"></p>
+</dd>
+<dt>dependent_varname </dt>
+<dd>TEXT. An expression to evaluate for the dependent variable. </dd>
+<dt>independent_varname </dt>
+<dd>TEXT. An expression to evaluate for the independent variable. </dd>
+<dt>clustervar </dt>
+<dd>TEXT. A comma-separated list of columns to use as cluster variables. </dd>
+<dt>grouping_cols (optional) </dt>
+<dd>TEXT, default: NULL. <em>Not yet implemented. Any non-NULL values are ignored.</em> An expression list used to group the input dataset into discrete groups, running one regression per group. Similar to the SQL GROUP BY clause. When this value is NULL, no grouping is used and a single result model is generated. </dd>
+<dt>max_iter (optional) </dt>
+<dd>INTEGER, default: 20. The maximum number of iterations that are allowed. </dd>
+<dt>optimizer (optional) </dt>
+<dd>TEXT, default: 'irls'. The name of the optimizer to use: <ul>
+<li>
+'newton' or 'irls': Iteratively reweighted least squares </li>
+<li>
+'cg': conjugate gradient </li>
+<li>
+'igd': incremental gradient descent. </li>
+</ul>
+</dd>
+<dt>tolerance (optional) </dt>
+<dd>FLOAT8, default: 0.0001 The difference between log-likelihood values in successive iterations that should indicate convergence. A zero disables the convergence criterion, so that execution stops after <em>n</em> Iterations have completed. </dd>
+<dt>verbose_mode (optional) </dt>
+<dd>BOOLEAN, default FALSE. Provides verbose_mode output of the results of training. </dd>
+</dl>
+<p><a class="anchor" id="train_mlogregr"></a></p><dl class="section user"><dt>Clustered Variance Multinomial Logistic Regression Training Function</dt><dd></dd></dl>
+<pre class="syntax">
+clustered_variance_mlogregr( source_table,
+                             out_table,
+                             dependent_varname,
+                             independent_varname,
+                             cluster_varname,
+                             ref_category,
+                             grouping_cols,
+                             optimizer_params,
+                             verbose_mode
+                           )
+</pre><p> <b>Arguments</b> </p><dl class="arglist">
+<dt>source_table </dt>
+<dd>TEXT. The name of the table containing the input data. </dd>
+<dt>out_table </dt>
+<dd><p class="startdd">TEXT. The name of the table where the regression model will be stored. The output table has the following columns: </p><table class="output">
+<tr>
+<th>category </th><td>The category.  </td></tr>
+<tr>
+<th>ref_category </th><td>The refererence category used for modeling.  </td></tr>
+<tr>
+<th>coef </th><td>Vector of the coefficients of the regression.  </td></tr>
+<tr>
+<th>std_err </th><td>Vector of the standard error of the coefficients.  </td></tr>
+<tr>
+<th>z_stats </th><td>Vector of the z-stats of the coefficients.  </td></tr>
+<tr>
+<th>p_values </th><td>Vector of the p-values of the coefficients.  </td></tr>
+</table>
+<p class="enddd">A summary table named &lt;out_table&gt;_summary is also created, which is the same as the summary table created by mlogregr_train function. Please refer to the documentation for multinomial logistic regression for details.  </p>
+</dd>
+<dt>dependent_varname </dt>
+<dd>TEXT. An expression to evaluate for the dependent variable. </dd>
+<dt>independent_varname </dt>
+<dd>TEXT. An expression to evaluate for the independent variable. </dd>
+<dt>cluster_varname </dt>
+<dd>TEXT. A comma-separated list of columns to use as cluster variables. </dd>
+<dt>ref_category (optional) </dt>
+<dd>INTEGER. Reference category in the range [0, num_category). </dd>
+<dt>groupingvarng_cols (optional) </dt>
+<dd>TEXT, default: NULL. <em>Not yet implemented. Any non-NULL values are ignored.</em> A comma-separated list of columns to use as grouping variables. </dd>
+<dt>optimizer_params (optional) </dt>
+<dd>TEXT, default: NULL, which uses the default values of optimizer parameters: max_iter=20, optimizer='newton', tolerance=1e-4. It should be a string that contains pairs of 'key=value' separated by commas. </dd>
+<dt>verbose_mode (optional) </dt>
+<dd>BOOLEAN, default FALSE. If TRUE, detailed information is printed when computing logistic regression. </dd>
+</dl>
+<p><a class="anchor" id="train_cox"></a></p><dl class="section user"><dt>Clustered Variance for Cox Proportional Hazards model</dt><dd></dd></dl>
+<p>The clustered robust variance estimator function for the Cox Proportional Hazards model has the following syntax. </p><pre class="syntax">
+clustered_variance_coxph(model_table, output_table, clustervar)
+</pre><p><b>Arguments</b> </p><dl class="arglist">
+<dt>model_table </dt>
+<dd>TEXT. The name of the model table, which is exactaly the same as the 'output_table' parameter of <a class="el" href="cox__prop__hazards_8sql__in.html#a737450bbfe0f10204b0074a9d45b0cef" title="Compute cox-regression coefficients and diagnostic statistics. ">coxph_train()</a> function. </dd>
+<dt>output_table </dt>
+<dd>TEXT. The name of the table where the output is saved. It has the following columns: <table class="output">
+<tr>
+<th>coef </th><td>FLOAT8[]. Vector of the coefficients.  </td></tr>
+<tr>
+<th>loglikelihood </th><td>FLOAT8. Log-likelihood value of the MLE estimate.  </td></tr>
+<tr>
+<th>std_err </th><td>FLOAT8[]. Vector of the standard error of the coefficients.  </td></tr>
+<tr>
+<th>clustervar </th><td>TEXT. A comma-separated list of columns to use as cluster variables.  </td></tr>
+<tr>
+<th>clustered_se </th><td>FLOAT8[]. Vector of the robust standard errors of the coefficients.  </td></tr>
+<tr>
+<th>clustered_z </th><td>FLOAT8[]. Vector of the robust z-stats of the coefficients.  </td></tr>
+<tr>
+<th>clustered_p </th><td>FLOAT8[]. Vector of the robust p-values of the coefficients.  </td></tr>
+<tr>
+<th>hessian </th><td>FLOAT8[]. The Hessian matrix.  </td></tr>
+</table>
+</dd>
+<dt>clustervar </dt>
+<dd>TEXT. A comma-separated list of columns to use as cluster variables. </dd>
+</dl>
+<p><a class="anchor" id="examples"></a></p><dl class="section user"><dt>Examples</dt><dd></dd></dl>
+<ol type="1">
+<li>View online help for the clustered variance linear regression function. <pre class="example">
+SELECT madlib.clustered_variance_linregr();
+</pre></li>
+<li>Run the linear regression function and view the results. <pre class="example">
+DROP TABLE IF EXISTS out_table;
+SELECT madlib.clustered_variance_linregr( 'abalone',
+                                          'out_table',
+                                          'rings',
+                                          'ARRAY[1, diameter, length, width]',
+                                          'sex',
+                                          NULL
+                                        );
+SELECT * FROM out_table;
+</pre></li>
+<li>View online help for the clustered variance logistic regression function. <pre class="example">
+SELECT madlib.clustered_variance_logregr();
+</pre></li>
+<li>Run the logistic regression function and view the results. <pre class="example">
+DROP TABLE IF EXISTS out_table;
+SELECT madlib.clustered_variance_logregr( 'abalone',
+                                          'out_table',
+                                          'rings &lt; 10',
+                                          'ARRAY[1, diameter, length, width]',
+                                          'sex'
+                                        );
+SELECT * FROM out_table;
+</pre></li>
+<li>View online help for the clustered variance multinomial logistic regression function. <pre class="example">
+SELECT madlib.clustered_variance_mlogregr();
+</pre></li>
+<li>Run the multinomial logistic regression and view the results. <pre class="example">
+DROP TABLE IF EXISTS out_table;
+SELECT madlib.clustered_variance_mlogregr( 'abalone',
+                                           'out_table',
+                                           'CASE WHEN rings &lt; 10 THEN 1 ELSE 0 END',
+                                           'ARRAY[1, diameter, length, width]',
+                                           'sex',
+                                           0
+                                         );
+SELECT * FROM out_table;
+</pre></li>
+<li>Run the Cox Proportional Hazards regression and compute the clustered robust estimator. <pre class="example">
+DROP TABLE IF EXISTS lung_cl_out;
+DROP TABLE IF EXISTS lung_out;
+DROP TABLE IF EXISTS lung_out_summary;
+SELECT madlib.coxph_train('lung',
+                          'lung_out',
+                          'time',
+                          'array[age, "ph.ecog"]',
+                          'TRUE',
+                          NULL,
+                          NULL);
+SELECT madlib.clustered_variance_coxph('lung_out',
+                                       'lung_cl_out',
+                                       '"ph.karno"');
+SELECT * FROM lung_cl_out;
+</pre></li>
+</ol>
+<p><a class="anchor" id="notes"></a></p><dl class="section user"><dt>Notes</dt><dd></dd></dl>
+<ul>
+<li>Note that we need to manually include an intercept term in the independent variable expression. The NULL value of <em>groupingvar</em> means that there is no grouping in the calculation.</li>
+</ul>
+<p><a class="anchor" id="background"></a></p><dl class="section user"><dt>Technical Background</dt><dd></dd></dl>
+<p>Assume that the data can be separated into <img class="formulaInl" alt="$m$" src="form_314.png"/> clusters. Usually this can be done by grouping the data table according to one or multiple columns.</p>
+<p>The estimator has a similar form to the usual sandwich estimator </p><p class="formulaDsp">
+<img class="formulaDsp" alt="\[ S(\vec{c}) = B(\vec{c}) M(\vec{c}) B(\vec{c}) \]" src="form_315.png"/>
+</p>
+<p>The bread part is the same as Huber-White sandwich estimator </p><p class="formulaDsp">
+<img class="formulaDsp" alt="\begin{eqnarray} B(\vec{c}) &amp; = &amp; \left(-\sum_{i=1}^{n} H(y_i, \vec{x}_i, \vec{c})\right)^{-1}\\ &amp; = &amp; \left(-\sum_{i=1}^{n}\frac{\partial^2 l(y_i, \vec{x}_i, \vec{c})}{\partial c_\alpha \partial c_\beta}\right)^{-1} \end{eqnarray}" src="form_316.png"/>
+</p>
+<p> where <img class="formulaInl" alt="$H$" src="form_317.png"/> is the hessian matrix, which is the second derivative of the target function </p><p class="formulaDsp">
+<img class="formulaDsp" alt="\[ L(\vec{c}) = \sum_{i=1}^n l(y_i, \vec{x}_i, \vec{c})\ . \]" src="form_318.png"/>
+</p>
+<p>The meat part is different </p><p class="formulaDsp">
+<img class="formulaDsp" alt="\[ M(\vec{c}) = \bf{A}^T\bf{A} \]" src="form_319.png"/>
+</p>
+<p> where the <img class="formulaInl" alt="$m$" src="form_314.png"/>-th row of <img class="formulaInl" alt="$\bf{A}$" src="form_320.png"/> is </p><p class="formulaDsp">
+<img class="formulaDsp" alt="\[ A_m = \sum_{i\in G_m}\frac{\partial l(y_i,\vec{x}_i,\vec{c})}{\partial \vec{c}} \]" src="form_321.png"/>
+</p>
+<p> where <img class="formulaInl" alt="$G_m$" src="form_322.png"/> is the set of rows that belong to the same cluster.</p>
+<p>We can compute the quantities of <img class="formulaInl" alt="$B$" src="form_207.png"/> and <img class="formulaInl" alt="$A$" src="form_42.png"/> for each cluster during one scan through the data table in an aggregate function. Then sum over all clusters to the full <img class="formulaInl" alt="$B$" src="form_207.png"/> and <img class="formulaInl" alt="$A$" src="form_42.png"/> in the outside of the aggregate function. At last, the matrix mulplitications are done in a separate function on the master node.</p>
+<p>When multinomial logistic regression is computed before the multinomial clustered variance calculation, it uses a default reference category of zero and the regression coefficients are included in the output table. The regression coefficients in the output are in the same order as multinomial logistic regression function, which is described below. For a problem with <img class="formulaInl" alt="$ K $" src="form_118.png"/> dependent variables <img class="formulaInl" alt="$ (1, ..., K) $" src="form_119.png"/> and <img class="formulaInl" alt="$ J $" src="form_120.png"/> categories <img class="formulaInl" alt="$ (0, ..., J-1) $" src="form_121.png"/>, let <img class="formulaInl" alt="$ {m_{k,j}} $" src="form_122.png"/> denote the coefficient for dependent variable <img class="formulaInl" alt="$ k $" src="form_98.png"/> and category <img class="formulaInl" alt="$ j $" src="form_123.png"/>. The output is <img class="formulaInl" alt="$ {m_{k_1, j_0}, m_{k_1, j_1} \ldots m_{k_1, j_{J-1}},
  m_{k_2, j_0}, m_{k_2, j_1} \ldots m_{k_K, j_{J-1}}} $" src="form_323.png"/>. The order is NOT CONSISTENT with the multinomial regression marginal effect calculation with function <em>marginal_mlogregr</em>. This is deliberate because the interfaces of all multinomial regressions (robust, clustered, ...) will be moved to match that used in marginal.</p>
+<p><a class="anchor" id="literature"></a></p><dl class="section user"><dt>Literature</dt><dd></dd></dl>
+<p>[1] Standard, Robust, and Clustered Standard Errors Computed in R, <a href="http://diffuseprior.wordpress.com/2012/06/15/standard-robust-and-clustered-standard-errors-computed-in-r/">http://diffuseprior.wordpress.com/2012/06/15/standard-robust-and-clustered-standard-errors-computed-in-r/</a></p>
+<p><a class="anchor" id="related"></a></p><dl class="section user"><dt>Related Topics</dt><dd>File <a class="el" href="clustered__variance_8sql__in.html">clustered_variance.sql_in</a> documenting the clustered variance SQL functions.</dd></dl>
+<p>File <a class="el" href="clustered__variance__coxph_8sql__in.html" title="SQL functions for clustered robust cox proportional hazards regression. ">clustered_variance_coxph.sql_in</a> documenting the clustered variance for Cox proportional hazards SQL functions.</p>
+</div><!-- contents -->
+</div><!-- doc-content -->
+<!-- start footer part -->
+<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
+  <ul>
+    <li class="footer">Generated on Tue May 16 2017 13:24:38 for MADlib by
+    <a href="http://www.doxygen.org/index.html">
+    <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.13 </li>
+  </ul>
+</div>
+</body>
+</html>

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/b5b51c69/docs/v1.11/group__grp__clustering.html
----------------------------------------------------------------------
diff --git a/docs/v1.11/group__grp__clustering.html b/docs/v1.11/group__grp__clustering.html
new file mode 100644
index 0000000..971f803
--- /dev/null
+++ b/docs/v1.11/group__grp__clustering.html
@@ -0,0 +1,133 @@
+<!-- HTML header for doxygen 1.8.4-->
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml">
+<head>
+<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
+<meta http-equiv="X-UA-Compatible" content="IE=9"/>
+<meta name="generator" content="Doxygen 1.8.13"/>
+<meta name="keywords" content="madlib,postgres,greenplum,machine learning,data mining,deep learning,ensemble methods,data science,market basket analysis,affinity analysis,pca,lda,regression,elastic net,huber white,proportional hazards,k-means,latent dirichlet allocation,bayes,support vector machines,svm"/>
+<title>MADlib: Clustering</title>
+<link href="tabs.css" rel="stylesheet" type="text/css"/>
+<script type="text/javascript" src="jquery.js"></script>
+<script type="text/javascript" src="dynsections.js"></script>
+<link href="navtree.css" rel="stylesheet" type="text/css"/>
+<script type="text/javascript" src="resize.js"></script>
+<script type="text/javascript" src="navtreedata.js"></script>
+<script type="text/javascript" src="navtree.js"></script>
+<script type="text/javascript">
+  $(document).ready(initResizable);
+</script>
+<link href="search/search.css" rel="stylesheet" type="text/css"/>
+<script type="text/javascript" src="search/searchdata.js"></script>
+<script type="text/javascript" src="search/search.js"></script>
+<script type="text/javascript">
+  $(document).ready(function() { init_search(); });
+</script>
+<!-- hack in the navigation tree -->
+<script type="text/javascript" src="eigen_navtree_hacks.js"></script>
+<link href="doxygen.css" rel="stylesheet" type="text/css" />
+<link href="madlib_extra.css" rel="stylesheet" type="text/css"/>
+<!-- google analytics -->
+<script>
+  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+  ga('create', 'UA-45382226-1', 'madlib.incubator.apache.org');
+  ga('send', 'pageview');
+</script>
+</head>
+<body>
+<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
+<div id="titlearea">
+<table cellspacing="0" cellpadding="0">
+ <tbody>
+ <tr style="height: 56px;">
+  <td id="projectlogo"><a href="http://madlib.incubator.apache.org"><img alt="Logo" src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
+  <td style="padding-left: 0.5em;">
+   <div id="projectname">
+   <span id="projectnumber">1.11</span>
+   </div>
+   <div id="projectbrief">User Documentation for MADlib</div>
+  </td>
+   <td>        <div id="MSearchBox" class="MSearchBoxInactive">
+        <span class="left">
+          <img id="MSearchSelect" src="search/mag_sel.png"
+               onmouseover="return searchBox.OnSearchSelectShow()"
+               onmouseout="return searchBox.OnSearchSelectHide()"
+               alt=""/>
+          <input type="text" id="MSearchField" value="Search" accesskey="S"
+               onfocus="searchBox.OnSearchFieldFocus(true)" 
+               onblur="searchBox.OnSearchFieldFocus(false)" 
+               onkeyup="searchBox.OnSearchFieldChange(event)"/>
+          </span><span class="right">
+            <a id="MSearchClose" href="javascript:searchBox.CloseResultsWindow()"><img id="MSearchCloseImg" border="0" src="search/close.png" alt=""/></a>
+          </span>
+        </div>
+</td>
+ </tr>
+ </tbody>
+</table>
+</div>
+<!-- end header part -->
+<!-- Generated by Doxygen 1.8.13 -->
+<script type="text/javascript">
+var searchBox = new SearchBox("searchBox", "search",false,'Search');
+</script>
+</div><!-- top -->
+<div id="side-nav" class="ui-resizable side-nav-resizable">
+  <div id="nav-tree">
+    <div id="nav-tree-contents">
+      <div id="nav-sync" class="sync"></div>
+    </div>
+  </div>
+  <div id="splitbar" style="-moz-user-select:none;" 
+       class="ui-resizable-handle">
+  </div>
+</div>
+<script type="text/javascript">
+$(document).ready(function(){initNavTree('group__grp__clustering.html','');});
+</script>
+<div id="doc-content">
+<!-- window showing the filter options -->
+<div id="MSearchSelectWindow"
+     onmouseover="return searchBox.OnSearchSelectShow()"
+     onmouseout="return searchBox.OnSearchSelectHide()"
+     onkeydown="return searchBox.OnSearchSelectKey(event)">
+</div>
+
+<!-- iframe showing the search results (closed by default) -->
+<div id="MSearchResultsWindow">
+<iframe src="javascript:void(0)" frameborder="0" 
+        name="MSearchResults" id="MSearchResults">
+</iframe>
+</div>
+
+<div class="header">
+  <div class="summary">
+<a href="#groups">Modules</a>  </div>
+  <div class="headertitle">
+<div class="title">Clustering<div class="ingroups"><a class="el" href="group__grp__unsupervised.html">Unsupervised Learning</a></div></div>  </div>
+</div><!--header-->
+<div class="contents">
+<a name="details" id="details"></a><h2 class="groupheader">Detailed Description</h2>
+<p>A collection of methods for clustering data </p>
+<table class="memberdecls">
+<tr class="heading"><td colspan="2"><h2 class="groupheader"><a name="groups"></a>
+Modules</h2></td></tr>
+<tr class="memitem:group__grp__kmeans"><td class="memItemLeft" align="right" valign="top">&#160;</td><td class="memItemRight" valign="bottom"><a class="el" href="group__grp__kmeans.html">k-Means Clustering</a></td></tr>
+<tr class="memdesc:group__grp__kmeans"><td class="mdescLeft">&#160;</td><td class="mdescRight">Partitions a set of observations into clusters by finding centroids that minimize the sum of observations' distances from their closest centroid. <br /></td></tr>
+<tr class="separator:"><td class="memSeparator" colspan="2">&#160;</td></tr>
+</table>
+</div><!-- contents -->
+</div><!-- doc-content -->
+<!-- start footer part -->
+<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
+  <ul>
+    <li class="footer">Generated on Tue May 16 2017 13:24:38 for MADlib by
+    <a href="http://www.doxygen.org/index.html">
+    <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.13 </li>
+  </ul>
+</div>
+</body>
+</html>

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/b5b51c69/docs/v1.11/group__grp__clustering.js
----------------------------------------------------------------------
diff --git a/docs/v1.11/group__grp__clustering.js b/docs/v1.11/group__grp__clustering.js
new file mode 100644
index 0000000..61858da
--- /dev/null
+++ b/docs/v1.11/group__grp__clustering.js
@@ -0,0 +1,4 @@
+var group__grp__clustering =
+[
+    [ "k-Means Clustering", "group__grp__kmeans.html", null ]
+];
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/b5b51c69/docs/v1.11/group__grp__correlation.html
----------------------------------------------------------------------
diff --git a/docs/v1.11/group__grp__correlation.html b/docs/v1.11/group__grp__correlation.html
new file mode 100644
index 0000000..e5ab148
--- /dev/null
+++ b/docs/v1.11/group__grp__correlation.html
@@ -0,0 +1,261 @@
+<!-- HTML header for doxygen 1.8.4-->
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml">
+<head>
+<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
+<meta http-equiv="X-UA-Compatible" content="IE=9"/>
+<meta name="generator" content="Doxygen 1.8.13"/>
+<meta name="keywords" content="madlib,postgres,greenplum,machine learning,data mining,deep learning,ensemble methods,data science,market basket analysis,affinity analysis,pca,lda,regression,elastic net,huber white,proportional hazards,k-means,latent dirichlet allocation,bayes,support vector machines,svm"/>
+<title>MADlib: Pearson&#39;s Correlation</title>
+<link href="tabs.css" rel="stylesheet" type="text/css"/>
+<script type="text/javascript" src="jquery.js"></script>
+<script type="text/javascript" src="dynsections.js"></script>
+<link href="navtree.css" rel="stylesheet" type="text/css"/>
+<script type="text/javascript" src="resize.js"></script>
+<script type="text/javascript" src="navtreedata.js"></script>
+<script type="text/javascript" src="navtree.js"></script>
+<script type="text/javascript">
+  $(document).ready(initResizable);
+</script>
+<link href="search/search.css" rel="stylesheet" type="text/css"/>
+<script type="text/javascript" src="search/searchdata.js"></script>
+<script type="text/javascript" src="search/search.js"></script>
+<script type="text/javascript">
+  $(document).ready(function() { init_search(); });
+</script>
+<!-- hack in the navigation tree -->
+<script type="text/javascript" src="eigen_navtree_hacks.js"></script>
+<link href="doxygen.css" rel="stylesheet" type="text/css" />
+<link href="madlib_extra.css" rel="stylesheet" type="text/css"/>
+<!-- google analytics -->
+<script>
+  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+  ga('create', 'UA-45382226-1', 'madlib.incubator.apache.org');
+  ga('send', 'pageview');
+</script>
+</head>
+<body>
+<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
+<div id="titlearea">
+<table cellspacing="0" cellpadding="0">
+ <tbody>
+ <tr style="height: 56px;">
+  <td id="projectlogo"><a href="http://madlib.incubator.apache.org"><img alt="Logo" src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
+  <td style="padding-left: 0.5em;">
+   <div id="projectname">
+   <span id="projectnumber">1.11</span>
+   </div>
+   <div id="projectbrief">User Documentation for MADlib</div>
+  </td>
+   <td>        <div id="MSearchBox" class="MSearchBoxInactive">
+        <span class="left">
+          <img id="MSearchSelect" src="search/mag_sel.png"
+               onmouseover="return searchBox.OnSearchSelectShow()"
+               onmouseout="return searchBox.OnSearchSelectHide()"
+               alt=""/>
+          <input type="text" id="MSearchField" value="Search" accesskey="S"
+               onfocus="searchBox.OnSearchFieldFocus(true)" 
+               onblur="searchBox.OnSearchFieldFocus(false)" 
+               onkeyup="searchBox.OnSearchFieldChange(event)"/>
+          </span><span class="right">
+            <a id="MSearchClose" href="javascript:searchBox.CloseResultsWindow()"><img id="MSearchCloseImg" border="0" src="search/close.png" alt=""/></a>
+          </span>
+        </div>
+</td>
+ </tr>
+ </tbody>
+</table>
+</div>
+<!-- end header part -->
+<!-- Generated by Doxygen 1.8.13 -->
+<script type="text/javascript">
+var searchBox = new SearchBox("searchBox", "search",false,'Search');
+</script>
+</div><!-- top -->
+<div id="side-nav" class="ui-resizable side-nav-resizable">
+  <div id="nav-tree">
+    <div id="nav-tree-contents">
+      <div id="nav-sync" class="sync"></div>
+    </div>
+  </div>
+  <div id="splitbar" style="-moz-user-select:none;" 
+       class="ui-resizable-handle">
+  </div>
+</div>
+<script type="text/javascript">
+$(document).ready(function(){initNavTree('group__grp__correlation.html','');});
+</script>
+<div id="doc-content">
+<!-- window showing the filter options -->
+<div id="MSearchSelectWindow"
+     onmouseover="return searchBox.OnSearchSelectShow()"
+     onmouseout="return searchBox.OnSearchSelectHide()"
+     onkeydown="return searchBox.OnSearchSelectKey(event)">
+</div>
+
+<!-- iframe showing the search results (closed by default) -->
+<div id="MSearchResultsWindow">
+<iframe src="javascript:void(0)" frameborder="0" 
+        name="MSearchResults" id="MSearchResults">
+</iframe>
+</div>
+
+<div class="header">
+  <div class="headertitle">
+<div class="title">Pearson's Correlation<div class="ingroups"><a class="el" href="group__grp__stats.html">Statistics</a> &raquo; <a class="el" href="group__grp__desc__stats.html">Descriptive Statistics</a></div></div>  </div>
+</div><!--header-->
+<div class="contents">
+<div class="toc"><b>Contents</b> <ul>
+<li>
+<a href="#usage">Correlation Function</a> </li>
+<li>
+<a href="#examples">Examples</a> </li>
+<li>
+<a href="#seealso">See Also</a> </li>
+</ul>
+</div><p>A correlation function is the degree and direction of association of two variables&mdash;how well one random variable can be predicted from the other. The coefficient of correlation varies from -1 to 1. A coefficient of 1 implies perfect correlation, 0 means no correlation, and -1 means perfect anti-correlation.</p>
+<p>This function provides a cross-correlation matrix for all pairs of numeric columns in a <em>source_table</em>. A correlation matrix describes correlation among <img class="formulaInl" alt="$ M $" src="form_175.png"/> variables. It is a square symmetrical <img class="formulaInl" alt="$ M $" src="form_175.png"/>x <img class="formulaInl" alt="$M $" src="form_380.png"/> matrix with the <img class="formulaInl" alt="$ (ij) $" src="form_381.png"/>th element equal to the correlation coefficient between the <img class="formulaInl" alt="$i$" src="form_129.png"/>th and the <img class="formulaInl" alt="$j$" src="form_130.png"/>th variable. The diagonal elements (correlations of variables with themselves) are always equal to 1.0.</p>
+<p><a class="anchor" id="usage"></a></p><dl class="section user"><dt>Correlation Function</dt><dd></dd></dl>
+<p>The correlation function has the following syntax: </p><pre class="syntax">
+correlation( source_table,
+             output_table,
+             target_cols,
+             verbose
+           )
+</pre><p>The covariance function, with a similar syntax, can be used to compute the covariance between features. </p><pre class="syntax">
+covariance( source_table,
+             output_table,
+             target_cols,
+             verbose
+           )
+</pre><dl class="arglist">
+<dt>source_table </dt>
+<dd><p class="startdd">TEXT. The name of the data containing the input data.</p>
+<p class="enddd"></p>
+</dd>
+<dt>output_table </dt>
+<dd><p class="startdd">TEXT. The name of the table where the cross-correlation matrix will be saved. The output is a table with N+2 columns and N rows, where N is the number of target columns. It contains the following columns. </p><table class="output">
+<tr>
+<th>column_position </th><td>The first column is a sequential counter indicating the position of the variable in the '<em>output_table</em>'.  </td></tr>
+<tr>
+<th>variable </th><td>The second column contains the row-header for the variables.  </td></tr>
+<tr>
+<th>&lt;...&gt; </th><td>The remainder of the table is the NxN correlation matrix for the pairs of numeric 'source_table' columns.  </td></tr>
+</table>
+<p>The output table is arranged as a lower-triangular matrix with the upper triangle set to NULL and the diagonal elements set to 1.0. To obtain the result from the '<em>output_table</em>' in this matrix format ensure to order the elements using the '<em>column_position</em>', as shown in the example below. </p><pre class="example">
+SELECT * FROM output_table ORDER BY column_position;
+</pre><p>In addition to output table, a summary table named &lt;output_table&gt;_summary is also created at the same time, which has the following columns: </p><table class="output">
+<tr>
+<th>method</th><td>'correlation' </td></tr>
+<tr>
+<th>source_table</th><td>VARCHAR. The data source table name. </td></tr>
+<tr>
+<th>output_table</th><td>VARCHAR. The output table name. </td></tr>
+<tr>
+<th>column_names</th><td>VARCHAR. Column names used for correlation computation, comma-separated string. </td></tr>
+<tr>
+<th>mean_vector</th><td>FLOAT8[]. Vector where each is the mean of a column. </td></tr>
+<tr>
+<th>total_rows_processed </th><td>BIGINT. Total numbers of rows processed.  </td></tr>
+<tr>
+<th>total_rows_skipped </th><td>BIGINT. Total numbers of rows skipped due to missing values.  </td></tr>
+</table>
+<p class="enddd"></p>
+</dd>
+<dt>target_cols (optional) </dt>
+<dd><p class="startdd">TEXT, default: '*'. A comma-separated list of the columns to correlate. If NULL or <code>'*'</code>, results are produced for all numeric columns.</p>
+<p class="enddd"></p>
+</dd>
+<dt>verbose (optional) </dt>
+<dd><p class="startdd">BOOLEAN, default: FALSE. Print verbose debugging information if TRUE.</p>
+<p class="enddd"></p>
+</dd>
+</dl>
+<p><a class="anchor" id="examples"></a></p><dl class="section user"><dt>Examples</dt><dd></dd></dl>
+<ol type="1">
+<li>View online help for the correlation function. <pre class="example">
+SELECT madlib.correlation();
+</pre></li>
+<li>Create an input data set. <pre class="example">
+DROP TABLE IF EXISTS example_data;
+CREATE TABLE example_data(
+    id SERIAL, outlook TEXT,
+    temperature FLOAT8, humidity FLOAT8,
+    windy TEXT, class TEXT);
+INSERT INTO example_data VALUES
+(1, 'sunny', 85, 85, 'false', 'Dont Play'),
+(2, 'sunny', 80, 90, 'true', 'Dont Play'),
+(3, 'overcast', 83, 78, 'false', 'Play'),
+(4, 'rain', 70, 96, 'false', 'Play'),
+(5, 'rain', 68, 80, 'false', 'Play'),
+(6, 'rain', 65, 70, 'true', 'Dont Play'),
+(7, 'overcast', 64, 65, 'true', 'Play'),
+(8, 'sunny', 72, 95, 'false', 'Dont Play'),
+(9, 'sunny', 69, 70, 'false', 'Play'),
+(10, 'rain', 75, 80, 'false', 'Play'),
+(11, 'sunny', 75, 70, 'true', 'Play'),
+(12, 'overcast', 72, 90, 'true', 'Play'),
+(13, 'overcast', 81, 75, 'false', 'Play'),
+(14, 'rain', 71, 80, 'true', 'Dont Play'),
+(15, NULL, 100, 100, 'true', NULL),
+(16, NULL, 110, 100, 'true', NULL);
+</pre></li>
+<li>Run the <a class="el" href="correlation_8sql__in.html#ada17a10ea8a6c4580e7413c86ae5345e">correlation()</a> function on the data set. <pre class="example">
+-- Correlate all numeric columns
+SELECT madlib.correlation( 'example_data',
+                           'example_data_output'
+                         );
+-- Setting target_cols to NULL or '*' also correlates all numeric columns
+SELECT madlib.correlation( 'example_data',
+                           'example_data_output',
+                           '*'
+                         );
+-- Correlate only the temperature and humidity columns
+SELECT madlib.correlation( 'example_data',
+                           'example_data_output',
+                           'temperature, humidity'
+                         );
+</pre></li>
+<li>View the correlation matrix. <pre class="example">
+SELECT * FROM example_data_output ORDER BY column_position;
+</pre> Result: <pre class="result">
+ column_position |  variable   |    temperature    | humidity
+-----------------+-------------+-------------------+----------
+               1 | temperature |               1.0 |
+               2 | humidity    | 0.616876934548786 |      1.0
+(2 rows)
+</pre></li>
+<li>Compute the covariance of features in the data set. <pre class="example">
+SELECT madlib.covariance( 'example_data',
+                          'cov_output'
+                         );
+</pre></li>
+<li>View the covariance matrix. <pre class="example">
+SELECT * FROM cov_output ORDER BY column_position;
+</pre> Result: <pre class="result">
+ column_position |  variable   |    temperature    | humidity
+-----------------+-------------+-------------------+----------
+               1 | temperature |      146.25       |
+               2 | humidity    |      82.125       | 121.1875
+(2 rows)
+</pre></li>
+</ol>
+<dl class="section user"><dt>Notes</dt><dd>Current implementation ignores a row that contains NULL entirely. This means any correlation in such a row (with NULLs) does not contribute to the final answer.</dd></dl>
+<p><a class="anchor" id="related"></a></p><dl class="section user"><dt>Related Topics</dt><dd></dd></dl>
+<p>File <a class="el" href="correlation_8sql__in.html" title="SQL functions for correlation computation. ">correlation.sql_in</a> documenting the SQL functions</p>
+<p><a class="el" href="group__grp__summary.html">Summary</a> for general descriptive statistics for a table </p>
+</div><!-- contents -->
+</div><!-- doc-content -->
+<!-- start footer part -->
+<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
+  <ul>
+    <li class="footer">Generated on Tue May 16 2017 13:24:38 for MADlib by
+    <a href="http://www.doxygen.org/index.html">
+    <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.13 </li>
+  </ul>
+</div>
+</body>
+</html>

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/b5b51c69/docs/v1.11/group__grp__countmin.html
----------------------------------------------------------------------
diff --git a/docs/v1.11/group__grp__countmin.html b/docs/v1.11/group__grp__countmin.html
new file mode 100644
index 0000000..b196a9a
--- /dev/null
+++ b/docs/v1.11/group__grp__countmin.html
@@ -0,0 +1,258 @@
+<!-- HTML header for doxygen 1.8.4-->
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml">
+<head>
+<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
+<meta http-equiv="X-UA-Compatible" content="IE=9"/>
+<meta name="generator" content="Doxygen 1.8.13"/>
+<meta name="keywords" content="madlib,postgres,greenplum,machine learning,data mining,deep learning,ensemble methods,data science,market basket analysis,affinity analysis,pca,lda,regression,elastic net,huber white,proportional hazards,k-means,latent dirichlet allocation,bayes,support vector machines,svm"/>
+<title>MADlib: CountMin (Cormode-Muthukrishnan)</title>
+<link href="tabs.css" rel="stylesheet" type="text/css"/>
+<script type="text/javascript" src="jquery.js"></script>
+<script type="text/javascript" src="dynsections.js"></script>
+<link href="navtree.css" rel="stylesheet" type="text/css"/>
+<script type="text/javascript" src="resize.js"></script>
+<script type="text/javascript" src="navtreedata.js"></script>
+<script type="text/javascript" src="navtree.js"></script>
+<script type="text/javascript">
+  $(document).ready(initResizable);
+</script>
+<link href="search/search.css" rel="stylesheet" type="text/css"/>
+<script type="text/javascript" src="search/searchdata.js"></script>
+<script type="text/javascript" src="search/search.js"></script>
+<script type="text/javascript">
+  $(document).ready(function() { init_search(); });
+</script>
+<!-- hack in the navigation tree -->
+<script type="text/javascript" src="eigen_navtree_hacks.js"></script>
+<link href="doxygen.css" rel="stylesheet" type="text/css" />
+<link href="madlib_extra.css" rel="stylesheet" type="text/css"/>
+<!-- google analytics -->
+<script>
+  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+  ga('create', 'UA-45382226-1', 'madlib.incubator.apache.org');
+  ga('send', 'pageview');
+</script>
+</head>
+<body>
+<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
+<div id="titlearea">
+<table cellspacing="0" cellpadding="0">
+ <tbody>
+ <tr style="height: 56px;">
+  <td id="projectlogo"><a href="http://madlib.incubator.apache.org"><img alt="Logo" src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
+  <td style="padding-left: 0.5em;">
+   <div id="projectname">
+   <span id="projectnumber">1.11</span>
+   </div>
+   <div id="projectbrief">User Documentation for MADlib</div>
+  </td>
+   <td>        <div id="MSearchBox" class="MSearchBoxInactive">
+        <span class="left">
+          <img id="MSearchSelect" src="search/mag_sel.png"
+               onmouseover="return searchBox.OnSearchSelectShow()"
+               onmouseout="return searchBox.OnSearchSelectHide()"
+               alt=""/>
+          <input type="text" id="MSearchField" value="Search" accesskey="S"
+               onfocus="searchBox.OnSearchFieldFocus(true)" 
+               onblur="searchBox.OnSearchFieldFocus(false)" 
+               onkeyup="searchBox.OnSearchFieldChange(event)"/>
+          </span><span class="right">
+            <a id="MSearchClose" href="javascript:searchBox.CloseResultsWindow()"><img id="MSearchCloseImg" border="0" src="search/close.png" alt=""/></a>
+          </span>
+        </div>
+</td>
+ </tr>
+ </tbody>
+</table>
+</div>
+<!-- end header part -->
+<!-- Generated by Doxygen 1.8.13 -->
+<script type="text/javascript">
+var searchBox = new SearchBox("searchBox", "search",false,'Search');
+</script>
+</div><!-- top -->
+<div id="side-nav" class="ui-resizable side-nav-resizable">
+  <div id="nav-tree">
+    <div id="nav-tree-contents">
+      <div id="nav-sync" class="sync"></div>
+    </div>
+  </div>
+  <div id="splitbar" style="-moz-user-select:none;" 
+       class="ui-resizable-handle">
+  </div>
+</div>
+<script type="text/javascript">
+$(document).ready(function(){initNavTree('group__grp__countmin.html','');});
+</script>
+<div id="doc-content">
+<!-- window showing the filter options -->
+<div id="MSearchSelectWindow"
+     onmouseover="return searchBox.OnSearchSelectShow()"
+     onmouseout="return searchBox.OnSearchSelectHide()"
+     onkeydown="return searchBox.OnSearchSelectKey(event)">
+</div>
+
+<!-- iframe showing the search results (closed by default) -->
+<div id="MSearchResultsWindow">
+<iframe src="javascript:void(0)" frameborder="0" 
+        name="MSearchResults" id="MSearchResults">
+</iframe>
+</div>
+
+<div class="header">
+  <div class="headertitle">
+<div class="title">CountMin (Cormode-Muthukrishnan)<div class="ingroups"><a class="el" href="group__grp__early__stage.html">Early Stage Development</a> &raquo; <a class="el" href="group__grp__sketches.html">Cardinality Estimators</a></div></div>  </div>
+</div><!--header-->
+<div class="contents">
+<div class="toc"><b>Contents</b> <ul>
+<li>
+<a href="#syntax">Syntax</a> </li>
+<li>
+<a href="#examples">Examples</a> </li>
+<li>
+<a href="#literature">Literature</a> </li>
+<li>
+<a href="#related">Related Topics</a> </li>
+</ul>
+</div><dl class="section warning"><dt>Warning</dt><dd><em> This MADlib method is still in early stage development. There may be some issues that will be addressed in a future version. Interface and implementation is subject to change. </em></dd></dl>
+<p>This module implements Cormode-Muthukrishnan <em>CountMin</em> sketches on integer values, implemented as a user-defined aggregate. It also provides scalar functions over the sketches to produce approximate counts, order statistics, and histograms.</p>
+<p><a class="anchor" id="syntax"></a></p><dl class="section user"><dt>Syntax</dt><dd><ul>
+<li>Get a sketch of a selected column specified by <em>col_name</em>. <pre class="syntax">
+cmsketch( col_name )
+</pre></li>
+<li>Get the number of rows where <em>col_name = p</em>, computed from the sketch obtained from <code>cmsketch</code>. <pre class="syntax">
+cmsketch_count( cmsketch,
+                p
+              )
+</pre></li>
+<li>Get the number of rows where <em>col_name</em> is between <em>m</em> and <em>n</em> inclusive. <pre class="syntax">
+cmsketch_rangecount( cmsketch,
+                     m,
+                     n
+                   )
+</pre></li>
+<li>Get the <em>k</em>th percentile of <em>col_name</em> where <em>count</em> specifies number of rows. <em>k</em> should be an integer between 1 to 99. <pre class="syntax">
+cmsketch_centile( cmsketch,
+                  k,
+                  count
+                )
+</pre></li>
+<li>Get the median of col_name where <em>count</em> specifies number of rows. This is equivalent to <code><a class="el" href="sketch_8sql__in.html#a2f2ab2fe3244515f5f73d49690e73b39">cmsketch_centile</a>(<em>cmsketch</em>,50,<em>count</em>)</code>. <pre class="syntax">
+cmsketch_median( cmsketch,
+                 count
+               )
+</pre></li>
+<li>Get an n-bucket histogram for values between min and max for the column where each bucket has approximately the same width. The output is a text string containing triples {lo, hi, count} representing the buckets; counts are approximate. <pre class="syntax">
+cmsketch_width_histogram( cmsketch,
+                          min,
+                          max,
+                          n
+                        )
+</pre></li>
+<li>Get an n-bucket histogram for the column where each bucket has approximately the same count. The output is a text string containing triples {lo, hi, count} representing the buckets; counts are approximate. Note that an equi-depth histogram is equivalent to a spanning set of equi-spaced centiles. <pre class="syntax">
+cmsketch_depth_histogram( cmsketch,
+                          n
+                        )
+</pre></li>
+</ul>
+</dd></dl>
+<p><a class="anchor" id="examples"></a></p><dl class="section user"><dt>Examples</dt><dd></dd></dl>
+<ol type="1">
+<li>Generate some data. <pre class="example">
+CREATE TABLE data(class INT, a1 INT);
+INSERT INTO data SELECT 1,1 FROM generate_series(1,10000);
+INSERT INTO data SELECT 1,2 FROM generate_series(1,15000);
+INSERT INTO data SELECT 1,3 FROM generate_series(1,10000);
+INSERT INTO data SELECT 2,5 FROM generate_series(1,1000);
+INSERT INTO data SELECT 2,6 FROM generate_series(1,1000);
+</pre></li>
+<li>Count number of rows where a1 = 2 in each class. <pre class="example">
+SELECT class,
+       cmsketch_count(
+                       cmsketch( a1 ),
+                       2
+                      )
+FROM data GROUP BY data.class;
+</pre> Result: <pre class="result">
+ class | cmsketch_count
+&#160;------+----------------
+     2 |              0
+     1 |          15000
+(2 rows)
+</pre></li>
+<li>Count number of rows where a1 is between 3 and 6. <pre class="example">
+SELECT class,
+       cmsketch_rangecount(
+                            cmsketch(a1),
+                            3,
+                            6
+                          )
+FROM data GROUP BY data.class;
+</pre> Result: <pre class="result">
+ class | cmsketch_rangecount
+&#160;------+---------------------
+     2 |                2000
+     1 |               10000
+(2 rows)
+</pre></li>
+<li>Compute the 90th percentile of all of a1. <pre class="example">
+SELECT cmsketch_centile(
+                         cmsketch( a1 ),
+                         90,
+                         count(*)
+                       )
+FROM data;
+</pre> Result: <pre class="result">
+ cmsketch_centile
+&#160;-----------------
+                3
+(1 row)
+</pre></li>
+<li>Produce an equi-width histogram with 2 bins between 0 and 10. <pre class="example">
+SELECT cmsketch_width_histogram(
+                                 cmsketch( a1 ),
+                                 0,
+                                 10,
+                                 2
+                               )
+FROM data;
+</pre> Result: <pre class="result">
+      cmsketch_width_histogram
+&#160;-----------------------------------
+ [[0L, 4L, 35000], [5L, 10L, 2000]]
+(1 row)
+</pre></li>
+<li>Produce an equi-depth histogram of a1 with 2 bins of approximately equal depth. <pre class="example">
+SELECT cmsketch_depth_histogram(
+                                 cmsketch( a1 ),
+                                 2
+                               )
+FROM data;
+</pre> Result: <pre class="result">
+                       cmsketch_depth_histogram
+&#160;----------------------------------------------------------------------
+ [[-9223372036854775807L, 1, 10000], [2, 9223372036854775807L, 27000]]
+(1 row)
+</pre></li>
+</ol>
+<p><a class="anchor" id="literature"></a></p><dl class="section user"><dt>Literature</dt><dd></dd></dl>
+<p>[1] G. Cormode and S. Muthukrishnan. An improved data stream summary: The count-min sketch and its applications. LATIN 2004, J. Algorithm 55(1): 58-75 (2005) . <a href="http://dimacs.rutgers.edu/~graham/pubs/html/CormodeMuthukrishnan04CMLatin.html">http://dimacs.rutgers.edu/~graham/pubs/html/CormodeMuthukrishnan04CMLatin.html</a></p>
+<p>[2] G. Cormode. Encyclopedia entry on 'Count-Min Sketch'. In L. Liu and M. T. Ozsu, editors, Encyclopedia of Database Systems, pages 511-516. Springer, 2009. <a href="http://dimacs.rutgers.edu/~graham/pubs/html/Cormode09b.html">http://dimacs.rutgers.edu/~graham/pubs/html/Cormode09b.html</a></p>
+<p><a class="anchor" id="related"></a>File <a class="el" href="sketch_8sql__in.html" title="SQL functions for sketch-based approximations of descriptive statistics. ">sketch.sql_in</a> documenting the SQL functions.</p>
+<p>Module grp_quantile for a different implementation of quantile function. </p>
+</div><!-- contents -->
+</div><!-- doc-content -->
+<!-- start footer part -->
+<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
+  <ul>
+    <li class="footer">Generated on Tue May 16 2017 13:24:39 for MADlib by
+    <a href="http://www.doxygen.org/index.html">
+    <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.13 </li>
+  </ul>
+</div>
+</body>
+</html>

http://git-wip-us.apache.org/repos/asf/incubator-madlib-site/blob/b5b51c69/docs/v1.11/group__grp__cox__prop__hazards.html
----------------------------------------------------------------------
diff --git a/docs/v1.11/group__grp__cox__prop__hazards.html b/docs/v1.11/group__grp__cox__prop__hazards.html
new file mode 100644
index 0000000..60f31d9
--- /dev/null
+++ b/docs/v1.11/group__grp__cox__prop__hazards.html
@@ -0,0 +1,469 @@
+<!-- HTML header for doxygen 1.8.4-->
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml">
+<head>
+<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
+<meta http-equiv="X-UA-Compatible" content="IE=9"/>
+<meta name="generator" content="Doxygen 1.8.13"/>
+<meta name="keywords" content="madlib,postgres,greenplum,machine learning,data mining,deep learning,ensemble methods,data science,market basket analysis,affinity analysis,pca,lda,regression,elastic net,huber white,proportional hazards,k-means,latent dirichlet allocation,bayes,support vector machines,svm"/>
+<title>MADlib: Cox-Proportional Hazards Regression</title>
+<link href="tabs.css" rel="stylesheet" type="text/css"/>
+<script type="text/javascript" src="jquery.js"></script>
+<script type="text/javascript" src="dynsections.js"></script>
+<link href="navtree.css" rel="stylesheet" type="text/css"/>
+<script type="text/javascript" src="resize.js"></script>
+<script type="text/javascript" src="navtreedata.js"></script>
+<script type="text/javascript" src="navtree.js"></script>
+<script type="text/javascript">
+  $(document).ready(initResizable);
+</script>
+<link href="search/search.css" rel="stylesheet" type="text/css"/>
+<script type="text/javascript" src="search/searchdata.js"></script>
+<script type="text/javascript" src="search/search.js"></script>
+<script type="text/javascript">
+  $(document).ready(function() { init_search(); });
+</script>
+<!-- hack in the navigation tree -->
+<script type="text/javascript" src="eigen_navtree_hacks.js"></script>
+<link href="doxygen.css" rel="stylesheet" type="text/css" />
+<link href="madlib_extra.css" rel="stylesheet" type="text/css"/>
+<!-- google analytics -->
+<script>
+  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+  ga('create', 'UA-45382226-1', 'madlib.incubator.apache.org');
+  ga('send', 'pageview');
+</script>
+</head>
+<body>
+<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
+<div id="titlearea">
+<table cellspacing="0" cellpadding="0">
+ <tbody>
+ <tr style="height: 56px;">
+  <td id="projectlogo"><a href="http://madlib.incubator.apache.org"><img alt="Logo" src="madlib.png" height="50" style="padding-left:0.5em;" border="0"/ ></a></td>
+  <td style="padding-left: 0.5em;">
+   <div id="projectname">
+   <span id="projectnumber">1.11</span>
+   </div>
+   <div id="projectbrief">User Documentation for MADlib</div>
+  </td>
+   <td>        <div id="MSearchBox" class="MSearchBoxInactive">
+        <span class="left">
+          <img id="MSearchSelect" src="search/mag_sel.png"
+               onmouseover="return searchBox.OnSearchSelectShow()"
+               onmouseout="return searchBox.OnSearchSelectHide()"
+               alt=""/>
+          <input type="text" id="MSearchField" value="Search" accesskey="S"
+               onfocus="searchBox.OnSearchFieldFocus(true)" 
+               onblur="searchBox.OnSearchFieldFocus(false)" 
+               onkeyup="searchBox.OnSearchFieldChange(event)"/>
+          </span><span class="right">
+            <a id="MSearchClose" href="javascript:searchBox.CloseResultsWindow()"><img id="MSearchCloseImg" border="0" src="search/close.png" alt=""/></a>
+          </span>
+        </div>
+</td>
+ </tr>
+ </tbody>
+</table>
+</div>
+<!-- end header part -->
+<!-- Generated by Doxygen 1.8.13 -->
+<script type="text/javascript">
+var searchBox = new SearchBox("searchBox", "search",false,'Search');
+</script>
+</div><!-- top -->
+<div id="side-nav" class="ui-resizable side-nav-resizable">
+  <div id="nav-tree">
+    <div id="nav-tree-contents">
+      <div id="nav-sync" class="sync"></div>
+    </div>
+  </div>
+  <div id="splitbar" style="-moz-user-select:none;" 
+       class="ui-resizable-handle">
+  </div>
+</div>
+<script type="text/javascript">
+$(document).ready(function(){initNavTree('group__grp__cox__prop__hazards.html','');});
+</script>
+<div id="doc-content">
+<!-- window showing the filter options -->
+<div id="MSearchSelectWindow"
+     onmouseover="return searchBox.OnSearchSelectShow()"
+     onmouseout="return searchBox.OnSearchSelectHide()"
+     onkeydown="return searchBox.OnSearchSelectKey(event)">
+</div>
+
+<!-- iframe showing the search results (closed by default) -->
+<div id="MSearchResultsWindow">
+<iframe src="javascript:void(0)" frameborder="0" 
+        name="MSearchResults" id="MSearchResults">
+</iframe>
+</div>
+
+<div class="header">
+  <div class="headertitle">
+<div class="title">Cox-Proportional Hazards Regression<div class="ingroups"><a class="el" href="group__grp__super.html">Supervised Learning</a> &raquo; <a class="el" href="group__grp__regml.html">Regression Models</a></div></div>  </div>
+</div><!--header-->
+<div class="contents">
+<div class="toc"><b>Contents</b> <ul>
+<li class="level1">
+<a href="#training">Training Function</a> </li>
+<li class="level1">
+<a href="#cox_zph">PHA Test Function</a> </li>
+<li class="level1">
+<a href="#predict">Prediction Function</a> </li>
+<li class="level1">
+<a href="#examples">Examples</a> </li>
+<li class="level1">
+<a href="#background">Technical Background</a> </li>
+<li class="level1">
+<a href="#related">Related Topics</a> </li>
+</ul>
+</div><p>Proportional-Hazard models enable the comparison of various survival models. These survival models are functions describing the probability of a one-item event (prototypically, this event is death) with respect to time. The interval of time before the occurrence of death can be called the survival time. Let T be a random variable representing the survival time, with a cumulative probability function P(t). Informally, P(t) is the probability that death has happened before time t.</p>
+<p><a class="anchor" id="training"></a></p><dl class="section user"><dt>Training Function</dt><dd></dd></dl>
+<p>Following is the syntax for the <a class="el" href="cox__prop__hazards_8sql__in.html#a737450bbfe0f10204b0074a9d45b0cef" title="Compute cox-regression coefficients and diagnostic statistics. ">coxph_train()</a> training function: </p><pre class="syntax">
+coxph_train( source_table,
+             output_table,
+             dependent_variable,
+             independent_variable,
+             right_censoring_status,
+             strata,
+             optimizer_params
+           )
+</pre><p> <b>Arguments</b> </p><dl class="arglist">
+<dt>source_table </dt>
+<dd>TEXT. The name of the table containing input data. </dd>
+<dt>output_table </dt>
+<dd><p class="startdd">TEXT. The name of the table where the output model is saved. The output is saved in the table named by the <em>output_table</em> argument. It has the following columns: </p><table class="output">
+<tr>
+<th>coef </th><td>FLOAT8[]. Vector of the coefficients.  </td></tr>
+<tr>
+<th>loglikelihood </th><td>FLOAT8. Log-likelihood value of the MLE estimate.  </td></tr>
+<tr>
+<th>std_err </th><td>FLOAT8[]. Vector of the standard error of the coefficients.  </td></tr>
+<tr>
+<th>stats </th><td>FLOAT8[]. Vector of the statistics of the coefficients.  </td></tr>
+<tr>
+<th>p_values </th><td>FLOAT8[]. Vector of the p-values of the coefficients.  </td></tr>
+<tr>
+<th>hessian </th><td>FLOAT8[]. The Hessian matrix computed using the final solution.  </td></tr>
+<tr>
+<th>num_iterations </th><td>INTEGER. The number of iterations performed by the optimizer.  </td></tr>
+</table>
+<p>Additionally, a summary output table is generated that contains a summary of the parameters used for building the Cox model. It is stored in a table named &lt;output_table&gt;_summary. It has the following columns: </p><table class="output">
+<tr>
+<th>source_table </th><td>The source table name.  </td></tr>
+<tr>
+<th>dependent_variable </th><td>The dependent variable name.  </td></tr>
+<tr>
+<th>independent_variable </th><td>The independent variable name.  </td></tr>
+<tr>
+<th>right_censoring_status </th><td>The right censoring status  </td></tr>
+<tr>
+<th>strata </th><td>The stratification columns  </td></tr>
+<tr>
+<th>num_processed </th><td>The number of rows that were actually used in the computation.  </td></tr>
+<tr>
+<th>num_missing_rows_skipped </th><td>The number of rows that were skipped in the computation due to NULL values in them.  </td></tr>
+</table>
+<p class="enddd"></p>
+</dd>
+<dt>dependent_variable </dt>
+<dd>TEXT. A string containing the name of a column that contains an array of numeric values, or a string expression in the format 'ARRAY[1, x1, x2, x3]', where <em>x1</em>, <em>x2</em> and <em>x3</em> are column names. Dependent variables refer to the time of death. There is no need to pre-sort the data. </dd>
+<dt>independent_variable </dt>
+<dd>TEXT. The name of the independent variable. </dd>
+<dt>right_censoring_status (optional) </dt>
+<dd>TEXT, default: TRUE for all observations. A string containing an expression that evaluates to the right-censoring status for the observation&mdash;TRUE if the observation is not censored and FALSE if the observation is censored. The string could contain the name of the column containing the right-censoring status, a fixed Boolean expression (i.e., 'true', 'false', '0', '1') that applies to all observations, or a Boolean expression such as 'column_name &lt; 10' that can be evaluated for each observation. </dd>
+<dt>strata (optional) </dt>
+<dd>VARCHAR, default: NULL, which does not do any stratifications. A string of comma-separated column names that are the strata ID variables used to do stratification. </dd>
+<dt>optimizer_params (optional) </dt>
+<dd><p class="startdd">VARCHAR, default: NULL, which uses the default values of optimizer parameters: max_iter=100, optimizer=newton, tolerance=1e-8, array_agg_size=10000000, sample_size=1000000. It should be a string that contains 'key=value' pairs separated by commas. The meanings of these parameters are:</p>
+<ul>
+<li>max_iter &mdash; The maximum number of iterations. The computation stops if the number of iterations exceeds this, which usually means that there is no convergence.</li>
+<li>optimizer &mdash; The optimization method. Right now, "newton" is the only one supported.</li>
+<li>tolerance &mdash; The stopping criteria. When the difference between the log-likelihoods of two consecutive iterations is smaller than this number, the computation has already converged and stops.</li>
+<li>array_agg_size &mdash; To speed up the computation, the original data table is cut into multiple pieces, and each pieces of the data is aggregated into one big row. In the process of computation, the whole big row is loaded into memory and thus speed up the computation. This parameter controls approximately how many numbers we want to put into one big row. Larger value of array_agg_size may speed up more, but the size of the big row cannot exceed 1GB due to the restriction of PostgreSQL databases.</li>
+<li>sample_size &mdash; To cut the data into approximate equal pieces, we first sample the data, and then find out the break points using this sampled data. A larger sample_size produces more accurate break points.  </li>
+</ul>
+</dd>
+</dl>
+<p><a class="anchor" id="cox_zph"></a></p><dl class="section user"><dt>Proportional Hazards Assumption Test Function</dt><dd></dd></dl>
+<p>The <a class="el" href="cox__prop__hazards_8sql__in.html#a682d95d5475ce33e47937067cadc2766" title="Test the proportional hazards assumption for a Cox regression model fit (coxph_train) ...">cox_zph()</a> function tests the proportional hazards assumption (PHA) of a Cox regression.</p>
+<p>Proportional-hazard models enable the comparison of various survival models. These PH models, however, assume that the hazard for a given individual is a fixed proportion of the hazard for any other individual, and the ratio of the hazards is constant across time. MADlib does not currently have support for performing any transformation of the time to compute the correlation.</p>
+<p>The <a class="el" href="cox__prop__hazards_8sql__in.html#a682d95d5475ce33e47937067cadc2766" title="Test the proportional hazards assumption for a Cox regression model fit (coxph_train) ...">cox_zph()</a> function is used to test this assumption by computing the correlation of the residual of the <a class="el" href="cox__prop__hazards_8sql__in.html#a737450bbfe0f10204b0074a9d45b0cef" title="Compute cox-regression coefficients and diagnostic statistics. ">coxph_train()</a> model with time.</p>
+<p>Following is the syntax for the <a class="el" href="cox__prop__hazards_8sql__in.html#a682d95d5475ce33e47937067cadc2766" title="Test the proportional hazards assumption for a Cox regression model fit (coxph_train) ...">cox_zph()</a> function: </p><pre class="syntax">
+cox_zph(cox_model_table, output_table)
+</pre><p> <b>Arguments</b> </p><dl class="arglist">
+<dt>cox_model_table </dt>
+<dd><p class="startdd">TEXT. The name of the table containing the Cox Proportional-Hazards model.</p>
+<p class="enddd"></p>
+</dd>
+<dt>output_table </dt>
+<dd>TEXT. The name of the table where the test statistics are saved. The output table is named by the <em>output_table</em> argument and has the following columns: <table class="output">
+<tr>
+<th>covariate </th><td>TEXT. The independent variables.  </td></tr>
+<tr>
+<th>rho </th><td>FLOAT8[]. Vector of the correlation coefficients between survival time and the scaled Schoenfeld residuals.  </td></tr>
+<tr>
+<th>chi_square </th><td>FLOAT8[]. Chi-square test statistic for the correlation analysis.  </td></tr>
+<tr>
+<th>p_value </th><td>FLOAT8[]. Two-side p-value for the chi-square statistic.  </td></tr>
+</table>
+</dd>
+</dl>
+<p>Additionally, the residual values are outputted to the table named <em>output_table</em>_residual. The table contains the following columns: </p><table class="output">
+<tr>
+<th>&lt;dep_column_name&gt; </th><td>FLOAT8. Time values (dependent variable) present in the original source table.   </td></tr>
+<tr>
+<th>residual </th><td>FLOAT8[]. Difference between the original covariate values and the expectation of the covariates obtained from the coxph_train model.  </td></tr>
+<tr>
+<th>scaled_residual </th><td>Residual values scaled by the variance of the coefficients.  </td></tr>
+</table>
+<p><a class="anchor" id="notes"></a></p><dl class="section user"><dt>Notes</dt><dd></dd></dl>
+<ul>
+<li>Table names can be optionally schema qualified (current_schemas() is used if a schema name is not provided) and table and column names should follow case-sensitivity and quoting rules per the database. For instance, 'mytable' and 'MyTable' both resolve to the same entity&mdash;'mytable'. If mixed-case or multi-byte characters are desired for entity names then the string should be double-quoted; in this case the input would be '"MyTable"'.</li>
+<li>The <a class="el" href="cox__prop__hazards_8sql__in.html#a3310cf98478b7c1e400e8fb1b3965d30">cox_prop_hazards_regr()</a> and <a class="el" href="cox__prop__hazards_8sql__in.html#ad778b289eb19ae0bb2b7e02a89bab3bc" title="Cox regression training function. ">cox_prop_hazards()</a> functions have been deprecated; <a class="el" href="cox__prop__hazards_8sql__in.html#a737450bbfe0f10204b0074a9d45b0cef" title="Compute cox-regression coefficients and diagnostic statistics. ">coxph_train()</a> should be used instead.</li>
+</ul>
+<p><a class="anchor" id="predict"></a></p><dl class="section user"><dt>Prediction Function</dt><dd>The prediction function is provided to calculate the linear predictionors, risk or the linear terms for the given prediction data. It has the following syntax: <pre class="syntax">
+coxph_predict(model_table,
+              source_table,
+              id_col_name,
+              output_table,
+              pred_type,
+              reference)
+</pre></dd></dl>
+<p><b>Arguments</b> </p><dl class="arglist">
+<dt>model_table </dt>
+<dd><p class="startdd">TEXT. Name of the table containing the cox model.</p>
+<p class="enddd"></p>
+</dd>
+<dt>source_table </dt>
+<dd><p class="startdd">TEXT. Name of the table containing the prediction data.</p>
+<p class="enddd"></p>
+</dd>
+<dt>id_col_name </dt>
+<dd><p class="startdd">TEXT. Name of the id column in the source table.</p>
+<p class="enddd"></p>
+</dd>
+<dt>output_table </dt>
+<dd><p class="startdd">TEXT. Name of the table to store the prediction results in. The output table is named by the <em>output_table</em> argument and has the following columns: </p><table class="output">
+<tr>
+<th>id </th><td>TEXT. The id column name from the source table.  </td></tr>
+<tr>
+<th>predicted_result </th><td>DOUBLE PRECISION. Result of prediction based of the value of the prediction type parameter.  </td></tr>
+</table>
+<p class="enddd"></p>
+</dd>
+<dt>pred_type </dt>
+<dd><p class="startdd">TEXT, OPTIONAL. Type of prediction. This can be one of 'linear_predictors', 'risk', or 'terms'. DEFAULT='linear_predictors'.</p><ul>
+<li>'linear_predictors' calculates the dot product of the independent variables and the coefficients.</li>
+<li>'risk' is the exponentiated value of the linear prediction.</li>
+<li>'terms' correspond to the linear terms obtained by multiplying the independent variables with their corresponding coefficients values (without further calculating the sum of these terms) </li>
+</ul>
+<p class="enddd"></p>
+</dd>
+<dt>reference </dt>
+<dd>TEXT, OPTIONAL. Reference level to use for centering predictions. Can be one of 'strata', 'overall'. DEFAULT='strata'. Note that R uses 'sample' instead of 'overall' when referring to the overall mean value of the covariates as being the reference level. </dd>
+</dl>
+<p><a class="anchor" id="examples"></a></p><dl class="section user"><dt>Examples</dt><dd></dd></dl>
+<ol type="1">
+<li>View online help for the proportional hazards training method. <pre class="example">
+SELECT madlib.coxph_train();
+</pre></li>
+<li>Create an input data set. <pre class="example">
+DROP TABLE IF EXISTS sample_data;
+CREATE TABLE sample_data (
+    id INTEGER NOT NULL,
+    grp DOUBLE PRECISION,
+    wbc DOUBLE PRECISION,
+    timedeath INTEGER,
+    status BOOLEAN
+);
+COPY sample_data FROM STDIN WITH DELIMITER '|';
+  0 |   0 | 1.45 |        35 | t
+  1 |   0 | 1.47 |        34 | t
+  3 |   0 |  2.2 |        32 | t
+  4 |   0 | 1.78 |        25 | t
+  5 |   0 | 2.57 |        23 | t
+  6 |   0 | 2.32 |        22 | t
+  7 |   0 | 2.01 |        20 | t
+  8 |   0 | 2.05 |        19 | t
+  9 |   0 | 2.16 |        17 | t
+ 10 |   0 |  3.6 |        16 | t
+ 11 |   1 |  2.3 |        15 | t
+ 12 |   0 | 2.88 |        13 | t
+ 13 |   1 |  1.5 |        12 | t
+ 14 |   0 |  2.6 |        11 | t
+ 15 |   0 |  2.7 |        10 | t
+ 16 |   0 |  2.8 |         9 | t
+ 17 |   1 | 2.32 |         8 | t
+ 18 |   0 | 4.43 |         7 | t
+ 19 |   0 | 2.31 |         6 | t
+ 20 |   1 | 3.49 |         5 | t
+ 21 |   1 | 2.42 |         4 | t
+ 22 |   1 | 4.01 |         3 | t
+ 23 |   1 | 4.91 |         2 | t
+ 24 |   1 |    5 |         1 | t
+\.
+</pre></li>
+<li>Run the Cox regression function. <pre class="example">
+SELECT madlib.coxph_train( 'sample_data',
+                           'sample_cox',
+                           'timedeath',
+                           'ARRAY[grp,wbc]',
+                           'status'
+                         );
+</pre></li>
+<li>View the results of the regression. <pre class="example">
+\x on
+SELECT * FROM sample_cox;
+</pre> Results: <pre class="result">
+-[ RECORD 1 ]--+----------------------------------------------------------------------------
+coef           | {2.54407073265254,1.67172094779487}
+loglikelihood  | -37.8532498733
+std_err        | {0.677180599294897,0.387195514577534}
+z_stats        | {3.7568570855419,4.31751114064138}
+p_values       | {0.000172060691513886,1.5779844638453e-05}
+hessian        | {{2.78043065745617,-2.25848560642414},{-2.25848560642414,8.50472838284472}}
+num_iterations | 5
+</pre></li>
+<li>Computing predictions using cox model. (This example uses the original data table to perform the prediction. Typically a different test dataset with the same features as the original training dataset would be used.) <pre class="example">
+\x off
+-- Display the linear predictors for the original dataset
+SELECT madlib.coxph_predict('sample_cox',
+                            'sample_data',
+                            'id',
+                            'sample_pred');
+</pre> <pre class="result">
+SELECT * FROM sample_pred;
+ id |  predicted_value
+----+--------------------
+  0 |  -2.97110918125034
+  4 |  -2.41944126847803
+  6 |   -1.5167119566688
+  8 |  -1.96807661257341
+ 10 |  0.623090856508638
+ 12 |  -0.58054822590367
+ 14 |  -1.04863009128623
+ 16 | -0.714285901727259
+ 18 |   2.01061924317838
+ 20 |   2.98327228490375
+ 22 |   3.85256717775708
+ 24 |     5.507570916074
+  1 |  -2.93767476229444
+  3 |  -1.71731847040418
+  5 |  -1.09878171972008
+  7 |  -2.03494545048521
+  9 |  -1.78418730831598
+ 15 | -0.881457996506747
+ 19 |  -1.53342916614675
+ 11 |  0.993924357027849
+ 13 | -0.343452401208048
+ 17 |   1.02735877598375
+ 21 |   1.19453087076323
+ 23 |   5.35711603077246
+(24 rows)
+</pre> <pre class="example">
+-- Display the relative risk for the original dataset
+SELECT madlib.coxph_predict('sample_cox',
+                            'sample_data',
+                            'id',
+                            'sample_pred',
+                            'risk');
+</pre> <pre class="result">
+ id |  predicted_value
+ ----+--------------------
+  1 | 0.0529887971503509
+  3 |  0.179546963459175
+  5 |   0.33327686110022
+  7 |  0.130687611255372
+  9 |  0.167933483703554
+ 15 |  0.414178600294289
+ 19 |  0.215794402223054
+ 11 |   2.70181658768287
+ 13 |  0.709317242984782
+ 17 |   2.79367735395696
+ 21 |   3.30200833843654
+ 23 |   212.112338046551
+  0 | 0.0512464372091503
+  4 | 0.0889713146524469
+  6 |  0.219432204682557
+  8 |  0.139725343898993
+ 10 |   1.86468261037506
+ 12 |  0.559591499901241
+ 14 |  0.350417460388247
+ 16 |  0.489541567796517
+ 18 |   7.46794038691975
+ 20 |   19.7523463121038
+ 22 |   47.1138577624204
+ 24 |   246.551504798816
+(24 rows)
+</pre></li>
+<li>Run the test for Proportional Hazards assumption to obtain correlation between residuals and time. <pre class="example">
+SELECT madlib.cox_zph( 'sample_cox',
+                       'sample_zph_output'
+                     );
+</pre></li>
+<li>View results of the PHA test. <pre class="example">
+SELECT * FROM sample_zph_output;
+</pre> Results: <pre class="result">
+-[ RECORD 1 ]-----------------------------------------
+covariate  | ARRAY[grp,wbc]
+rho        | {0.00237308357328641,0.0375600568840431}
+chi_square | {0.000100675718191977,0.0232317400546175}
+p_value    | {0.991994376850758,0.878855984657948}
+</pre></li>
+</ol>
+<p><a class="anchor" id="background"></a></p><dl class="section user"><dt>Technical Background</dt><dd></dd></dl>
+<p>Generally, proportional-hazard models start with a list of <img class="formulaInl" alt="$ \boldsymbol n $" src="form_382.png"/> observations, each with <img class="formulaInl" alt="$ \boldsymbol m $" src="form_383.png"/> covariates and a time of death. From this <img class="formulaInl" alt="$ \boldsymbol n \times m $" src="form_384.png"/> matrix, we would like to derive the correlation between the covariates and the hazard function. This amounts to finding the parameters <img class="formulaInl" alt="$ \boldsymbol \beta $" src="form_385.png"/> that best fit the model described below.</p>
+<p>Let us define:</p><ul>
+<li><img class="formulaInl" alt="$ \boldsymbol t \in \mathbf R^{m} $" src="form_386.png"/> denote the vector of observed dependent variables, with <img class="formulaInl" alt="$ n $" src="form_11.png"/> rows.</li>
+<li><img class="formulaInl" alt="$ X \in \mathbf R^{m} $" src="form_387.png"/> denote the design matrix with <img class="formulaInl" alt="$ m $" src="form_293.png"/> columns and <img class="formulaInl" alt="$ n $" src="form_11.png"/> rows, containing all observed vectors of independent variables <img class="formulaInl" alt="$ \boldsymbol x_i $" src="form_100.png"/> as rows.</li>
+<li><img class="formulaInl" alt="$ R(t_i) $" src="form_388.png"/> denote the set of observations still alive at time <img class="formulaInl" alt="$ t_i $" src="form_389.png"/></li>
+</ul>
+<p>Note that this model <b>does not</b> include a <b>constant term</b>, and the data cannot contain a column of 1s.</p>
+<p>By definition, </p><p class="formulaDsp">
+<img class="formulaDsp" alt="\[ P[T_k = t_i | \boldsymbol R(t_i)] = \frac{e^{\beta^T x_k} }{ \sum_{j \in R(t_i)} e^{\beta^T x_j}}. \,. \]" src="form_390.png"/>
+</p>
+<p>The <b>partial likelihood </b>function can now be generated as the product of conditional probabilities: </p><p class="formulaDsp">
+<img class="formulaDsp" alt="\[ \mathcal L = \prod_{i = 1}^n \left( \frac{e^{\beta^T x_i}}{ \sum_{j \in R(t_i)} e^{\beta^T x_j}} \right). \]" src="form_391.png"/>
+</p>
+<p>The log-likelihood form of this equation is </p><p class="formulaDsp">
+<img class="formulaDsp" alt="\[ L = \sum_{i = 1}^n \left[ \beta^T x_i - \log\left(\sum_{j \in R(t_i)} e^{\beta^T x_j }\right) \right]. \]" src="form_392.png"/>
+</p>
+<p>Using this score function and Hessian matrix, the partial likelihood can be maximized using the <b> Newton-Raphson algorithm</b>. <b>Breslow's method</b> is used to resolved tied times of deaths. The time of death for two records are considered "equal" if they differ by less than 1.0e-6</p>
+<p>The inverse of the Hessian matrix, evaluated at the estimate of <img class="formulaInl" alt="$ \boldsymbol \beta $" src="form_385.png"/>, can be used as an <b>approximate variance-covariance matrix </b> for the estimate, and used to produce approximate <b>standard errors</b> for the regression coefficients.</p>
+<p class="formulaDsp">
+<img class="formulaDsp" alt="\[ \mathit{se}(c_i) = \left( (H)^{-1} \right)_{ii} \,. \]" src="form_393.png"/>
+</p>
+<p> The Wald z-statistic is </p><p class="formulaDsp">
+<img class="formulaDsp" alt="\[ z_i = \frac{c_i}{\mathit{se}(c_i)} \,. \]" src="form_110.png"/>
+</p>
+<p>The Wald <img class="formulaInl" alt="$ p $" src="form_111.png"/>-value for coefficient <img class="formulaInl" alt="$ i $" src="form_33.png"/> gives the probability (under the assumptions inherent in the Wald test) of seeing a value at least as extreme as the one observed, provided that the null hypothesis ( <img class="formulaInl" alt="$ c_i = 0 $" src="form_112.png"/>) is true. Letting <img class="formulaInl" alt="$ F $" src="form_113.png"/> denote the cumulative density function of a standard normal distribution, the Wald <img class="formulaInl" alt="$ p $" src="form_111.png"/>-value for coefficient <img class="formulaInl" alt="$ i $" src="form_33.png"/> is therefore </p><p class="formulaDsp">
+<img class="formulaDsp" alt="\[ p_i = \Pr(|Z| \geq |z_i|) = 2 \cdot (1 - F( |z_i| )) \]" src="form_114.png"/>
+</p>
+<p> where <img class="formulaInl" alt="$ Z $" src="form_115.png"/> is a standard normally distributed random variable.</p>
+<p>The condition number is computed as <img class="formulaInl" alt="$ \kappa(H) $" src="form_394.png"/> during the iteration immediately <em>preceding</em> convergence (i.e., <img class="formulaInl" alt="$ A $" src="form_14.png"/> is computed using the coefficients of the previous iteration). A large condition number (say, more than 1000) indicates the presence of significant multicollinearity.</p>
+<p><a class="anchor" id="Literature"></a></p><dl class="section user"><dt>Literature</dt><dd></dd></dl>
+<p>A somewhat random selection of nice write-ups, with valuable pointers into further literature:</p>
+<p>[1] John Fox: Cox Proportional-Hazards Regression for Survival Data, Appendix to An R and S-PLUS companion to Applied Regression Feb 2012, <a href="http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-cox-regression.pdf">http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-cox-regression.pdf</a></p>
+<p>[2] Stephen J Walters: What is a Cox model? <a href="http://www.medicine.ox.ac.uk/bandolier/painres/download/whatis/cox_model.pdf">http://www.medicine.ox.ac.uk/bandolier/painres/download/whatis/cox_model.pdf</a></p>
+<p><a class="anchor" id="notes"></a></p><dl class="section user"><dt>Notes</dt><dd></dd></dl>
+<p>If number of ties in the source table is very large, a memory allocation error may be raised. The limitation is about <img class="formulaInl" alt="$(10^8 / m)$" src="form_395.png"/>, where <img class="formulaInl" alt="$m$" src="form_314.png"/> is number of featrues. For instance, if there are 100 featrues, the number of ties should be fewer than 1 million.</p>
+<p><a class="anchor" id="related"></a></p><dl class="section user"><dt>Related Topics</dt><dd></dd></dl>
+<p>File <a class="el" href="cox__prop__hazards_8sql__in.html" title="SQL functions for cox proportional hazards. ">cox_prop_hazards.sql_in</a> documenting the functions</p>
+</div><!-- contents -->
+</div><!-- doc-content -->
+<!-- start footer part -->
+<div id="nav-path" class="navpath"><!-- id is needed for treeview function! -->
+  <ul>
+    <li class="footer">Generated on Tue May 16 2017 13:24:38 for MADlib by
+    <a href="http://www.doxygen.org/index.html">
+    <img class="footer" src="doxygen.png" alt="doxygen"/></a> 1.8.13 </li>
+  </ul>
+</div>
+</body>
+</html>