Posted to commits@madlib.apache.org by ri...@apache.org on 2017/11/23 00:01:34 UTC

madlib git commit: Docs: Update docs for HITS and linear regr

Repository: madlib
Updated Branches:
  refs/heads/master c2a874db7 -> daf67f81b


Docs: Update docs for HITS and linear regr


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/daf67f81
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/daf67f81
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/daf67f81

Branch: refs/heads/master
Commit: daf67f81b608396d8e3c04a9bf9890449a0a5b3c
Parents: c2a874d
Author: Frank McQuillan <fm...@pivotal.io>
Authored: Wed Nov 22 16:00:37 2017 -0800
Committer: Rahul Iyer <ri...@apache.org>
Committed: Wed Nov 22 16:00:37 2017 -0800

----------------------------------------------------------------------
 src/ports/postgres/modules/graph/hits.sql_in    | 29 ++++++++++----------
 .../postgres/modules/regress/linear.sql_in      | 11 ++++----
 2 files changed, 20 insertions(+), 20 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/madlib/blob/daf67f81/src/ports/postgres/modules/graph/hits.sql_in
----------------------------------------------------------------------
diff --git a/src/ports/postgres/modules/graph/hits.sql_in b/src/ports/postgres/modules/graph/hits.sql_in
index d06bbb8..bf4b414 100644
--- a/src/ports/postgres/modules/graph/hits.sql_in
+++ b/src/ports/postgres/modules/graph/hits.sql_in
@@ -46,7 +46,7 @@ graph.
 Given a graph, the HITS (Hyperlink-Induced Topic Search) algorithm outputs the 
 authority score and hub score of every vertex, where authority estimates the
 value of the content of the page and hub estimates the value of its links to
-other pages. This algorithm was developed by Jon Kleinberg to rate web pages.
+other pages. This algorithm was originally developed to rate web pages [1].
 
 @anchor hits
 @par HITS
@@ -91,10 +91,9 @@ this string argument:
     a row for every vertex from 'vertex_table' with the following columns:
     - vertex_id : The id of a vertex. Will use the input parameter 'vertex_id' 
                   for column naming.
-    - authority : The vertex's authority score.
-    - hub : The vertex's hub score.
-    - grouping_cols : Grouping column (if any) values associated with the vertex_id.
-
+    - authority : The vertex authority score.
+    - hub : The vertex hub score.
+    - grouping_cols : Grouping column values (if any) associated with the vertex_id.
 </dd>
 
 A summary table is also created that contains information 
@@ -103,18 +102,19 @@ It is named by adding the suffix '_summary' to the 'out_table'
 parameter.
 
 <dt>max_iter (optional) </dt>
-<dd>INTEGER, default: 100. The maximum number of iterations allowed. An 
+<dd>INTEGER, default: 100. The maximum number of iterations allowed. Each 
     iteration consists of both authority and hub phases.</dd>
 
 <dt>threshold (optional) </dt>
 <dd>FLOAT8, default: (1/number of vertices * 1000).
+    Threshold must be set to a value between 0 and 1, inclusive
+    of end points. 
     If the difference between two consecutive iterations of authority AND two
     consecutive iterations of hub is smaller than 'threshold', then the
-    computation stops. If you set the threshold to zero, then you will force the
+    computation stops. That is, both authority and hub value differences 
+    must be below the specified threshold for the algorithm to stop.
+    If you set the threshold to 0, then you will force the
     algorithm to run for the full number of iterations specified in 'max_iter'.
-    Threshold needs to be set to a value between 0 and 1. Note that both
-    authority and hub value difference must be below threshold for the
-    algorithm to stop.
 </dd>
 
 <dt>grouping_cols (optional)</dt>
@@ -130,9 +130,8 @@ parameter.
 @anchor notes
 @par Notes
 
-1. The HITS algorithm is based on Kleinberg's paper [1].
-2. This algorithm supports multigraph and each duplicated edge is considered
-   for counting when calculating authority and hub scores.
+This algorithm supports multigraph and each duplicated edge is considered
+for counting when calculating authority and hub scores.
 
 @anchor examples
 @examp
@@ -370,7 +369,9 @@ SELECT * FROM hits_out_summary order by user_id;
 @anchor literature
 @par Literature
 
-[1] HITS algorithm https://www.cs.cornell.edu/home/kleinber/auth.pdf
+[1] Kleinberg, Jon M., "Authoritative Sources in a Hyperlinked 
+Environment", Journal of the ACM, Sept. 1999.  
+https://www.cs.cornell.edu/home/kleinber/auth.pdf
 */
 
 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.hits(
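
The stopping rule documented for 'threshold' in the hits.sql_in hunk above can be illustrated with a small standalone sketch. This is not MADlib's implementation. The function name hits_sketch, the L2 normalization, and the use of the summed absolute change as the "difference" are assumptions made for illustration. The default threshold follows one reading of the documented "(1/number of vertices * 1000)" as 1 / (n * 1000).

```python
import math


def _normalize(scores):
    """Scale scores to unit L2 norm (assumed normalization, for illustration)."""
    norm = math.sqrt(sum(s * s for s in scores.values()))
    if norm == 0.0:
        return dict(scores)
    return {v: s / norm for v, s in scores.items()}


def hits_sketch(edges, max_iter=100, threshold=None):
    """Toy HITS over a list of (src, dest) edges.

    Duplicate edges are counted each time they appear, mirroring the
    multigraph note in the documentation. Returns (authority, hub, iterations).
    """
    vertices = sorted({v for edge in edges for v in edge})
    if threshold is None:
        # Assumed reading of the documented default.
        threshold = 1.0 / (len(vertices) * 1000)
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1, inclusive")

    auth = {v: 1.0 for v in vertices}
    hub = {v: 1.0 for v in vertices}

    for iteration in range(1, max_iter + 1):
        # Authority phase: sum the hub scores of vertices pointing at each vertex.
        new_auth = {v: 0.0 for v in vertices}
        for src, dest in edges:
            new_auth[dest] += hub[src]
        new_auth = _normalize(new_auth)

        # Hub phase: sum the updated authority scores of the vertices pointed to.
        new_hub = {v: 0.0 for v in vertices}
        for src, dest in edges:
            new_hub[src] += new_auth[dest]
        new_hub = _normalize(new_hub)

        auth_diff = sum(abs(new_auth[v] - auth[v]) for v in vertices)
        hub_diff = sum(abs(new_hub[v] - hub[v]) for v in vertices)
        auth, hub = new_auth, new_hub

        # Both differences must fall below the threshold. With threshold 0
        # the strict comparison can never succeed, so all max_iter
        # iterations are run.
        if auth_diff < threshold and hub_diff < threshold:
            return auth, hub, iteration

    return auth, hub, max_iter
```

For example, hits_sketch([(1, 2), (1, 3), (2, 3)], max_iter=5, threshold=0) always runs five iterations, while the same call with the default threshold stops as soon as both the authority and hub changes are small enough.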

http://git-wip-us.apache.org/repos/asf/madlib/blob/daf67f81/src/ports/postgres/modules/regress/linear.sql_in
----------------------------------------------------------------------
diff --git a/src/ports/postgres/modules/regress/linear.sql_in b/src/ports/postgres/modules/regress/linear.sql_in
index 6572652..e1484db 100644
--- a/src/ports/postgres/modules/regress/linear.sql_in
+++ b/src/ports/postgres/modules/regress/linear.sql_in
@@ -189,7 +189,6 @@ linregr_predict(coef, col_ind)
 <dl class="arglist">
 <dt>coef</dt>
 <dd>FLOAT8[]. Vector of the coefficients of regression from training.</dd>
-
 <dt>col_ind</dt>
 <dd>FLOAT8[]. An array containing the independent variable column names,
 as was used for the training. </dd>
@@ -326,14 +325,14 @@ variance_covariance      | {{1226330302.62852,-300921.595596804,551696673.397849
 <pre class="example">
 \\x OFF
 SELECT houses.*,
-       madlib.linregr_predict( ARRAY[1,tax,bath,size],
-                               m.coef
+       madlib.linregr_predict( m.coef,
+                               ARRAY[1,tax,bath,size]
                              ) as predict,
         price -
-          madlib.linregr_predict( ARRAY[1,tax,bath,size],
-                                  m.coef
+          madlib.linregr_predict( m.coef,
+                                  ARRAY[1,tax,bath,size]
                                 ) as residual
-FROM houses, houses_linregr m;
+FROM houses, houses_linregr m ORDER BY id;
 </pre>
 Result:
 <pre class="result">