Posted to commits@madlib.apache.org by nj...@apache.org on 2019/07/24 22:47:14 UTC

[madlib] 02/02: Assoc Rules: Minor updates to user docs for new params

This is an automated email from the ASF dual-hosted git repository.

njayaram pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/madlib.git

commit ed202b1020643cde011a0472ebbe84ed7d6b63a0
Author: Frank McQuillan <fm...@pivotal.io>
AuthorDate: Wed Jul 24 13:37:47 2019 -0700

    Assoc Rules: Minor updates to user docs for new params
---
 .../modules/assoc_rules/assoc_rules.sql_in         | 23 +++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)
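
The combinatorial-explosion note and the new max_LHS_size / max_RHS_size parameters in the diff below can be made concrete by counting candidate rules. This is a minimal sketch that assumes the standard Apriori rule-generation step. In that step, each frequent itemset of size k is split into a non-empty LHS and a non-empty RHS. The sketch does not reproduce MADlib's implementation, and `candidate_rules` is a hypothetical helper. The counts are also upper bounds, taken before the confidence threshold prunes any rules.

```python
from math import comb


def candidate_rules(k, max_lhs=None, max_rhs=None):
    """Count candidate rules X -> Y from ONE itemset of size k.

    Assumption (standard Apriori rule generation, not MADlib's code):
    X and Y are non-empty, disjoint, and together cover the itemset.
    None means "no limit", mirroring the NULL default of the parameters.
    """
    max_lhs = k - 1 if max_lhs is None else max_lhs
    max_rhs = k - 1 if max_rhs is None else max_rhs
    total = 0
    for rhs_size in range(1, k):      # RHS gets rhs_size items
        lhs_size = k - rhs_size       # LHS gets the remaining items
        if rhs_size <= max_rhs and lhs_size <= max_lhs:
            total += comb(k, rhs_size)  # ways to choose the RHS items
    return total


# Unlimited rules grow as 2^k - 2; capping the RHS at 1 leaves only k.
for k in (2, 5, 10, 20):
    print(k, candidate_rules(k), candidate_rules(k, max_rhs=1))
```

For example, a single 10-item itemset yields 1022 candidate rules with no limits but only 10 with the RHS capped at one item. This is why lowering max_itemset_size, max_LHS_size or max_RHS_size can shorten run times so sharply.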

diff --git a/src/ports/postgres/modules/assoc_rules/assoc_rules.sql_in b/src/ports/postgres/modules/assoc_rules/assoc_rules.sql_in
index 321b4fa..a7f639b 100644
--- a/src/ports/postgres/modules/assoc_rules/assoc_rules.sql_in
+++ b/src/ports/postgres/modules/assoc_rules/assoc_rules.sql_in
@@ -163,9 +163,12 @@ meets minimum confidence requirements.
 
 @note Beware of combinatorial explosion.  The Apriori algorithm can potentially
 generate a huge number of rules, even for fairly simple data sets, resulting
-in run-times that are unreasonably long.  To avoid this, it is recommended
+in run times that are unreasonably long.  To avoid this, it is recommended
 to cap the maximum itemset size to a small number to start with, then
-increase it gradually.  <em>Support</em> and <em>confidence</em> values are
+increase it gradually.  Similarly, <em>max_LHS_size</em> and <em>max_RHS_size</em>
+limit the number of items on the LHS and RHS of the rules
+and can significantly reduce run times.
+<em>Support</em> and <em>confidence</em> values are
 parameters that can also be used to control rule generation.
 
 @anchor syntax
@@ -280,17 +283,16 @@ This generates all association rules that satisfy the specified minimum
   <dt>max_LHS_size (optional)</dt>
   <dd>INTEGER, default: NULL. Determines the maximum size of the left hand side
   of the rule. Must be 1 or more.
-  This parameter can be used to reduce run time for data sets where itemset size is large,
-  which is a common situation. If your query is not returning or is running too long,
-  try using a lower value for this parameter.</dd>
+  This parameter can be used to reduce run time.</dd>
 
 
   <dt>max_RHS_size (optional)</dt>
   <dd>INTEGER, default: NULL. Determines the maximum size of the right hand side
   of the rule. Must be 1 or more.
-  This parameter can be used to reduce run time for data sets where itemset size is large,
-  which is a common situation. If your query is not returning or is running too long,
-  try using a lower value for this parameter.</dd>
+  This parameter can be used to reduce run time.  For example, setting it to 1
+  can significantly reduce run time if that makes sense for your use case.
+  (The <em>apriori</em> algorithm in the R package <em>arules</em> [2] only
+  supports rules with a single item on the RHS.)</dd>
 </dl>
 
 
@@ -462,13 +464,16 @@ Result:
 
 The association rules function always creates a table named \c assoc_rules.
 Make a copy of this table before running the function again if you would
-like to keep multiple association rule tables.
+like to keep multiple association rule tables.  This behavior will be improved
+in a later release.
 
 @anchor literature
 @literature
 
 [1] https://en.wikipedia.org/wiki/Apriori_algorithm
 
+[2] https://cran.r-project.org/web/packages/arules/arules.pdf
+
 @anchor related
 @par Related Topics