Posted to issues@madlib.apache.org by "Frank McQuillan (JIRA)" <ji...@apache.org> on 2018/12/11 18:58:00 UTC

[jira] [Created] (MADLIB-1288) Set max itemset size to 10 by default in assoc rules

Frank McQuillan created MADLIB-1288:
---------------------------------------

             Summary: Set max itemset size to 10 by default in assoc rules
                 Key: MADLIB-1288
                 URL: https://issues.apache.org/jira/browse/MADLIB-1288
             Project: Apache MADlib
          Issue Type: Improvement
          Components: Module: Association Rules
            Reporter: Frank McQuillan
             Fix For: v2.0


Story

As a data scientist,
I want the max itemset size to default to 10,
so that assoc rules does not run for a long time.

Details

We have had some complaints about how long assoc rules takes to run. This could be due to the implementation or to wrong parameter settings by the user, but it may also be due to a combinatorial explosion in the number of generated rules.

The R param `maxlen` (package arules) defaults to 10:
https://cran.r-project.org/web/packages/arules/arules.pdf
(see page 10, "apriori - mining associations with apriori").
It corresponds to the MADlib param `max_itemset_size`:
http://madlib.apache.org/docs/latest/group__grp__assoc__rules.html

The arules documentation gives the reason for this default:

"If the minimum support is chosen too low for the dataset,
then the algorithm will try to create an extremely large set of itemsets/rules. This will result in
very long run time and eventually the process will run out of memory. To prevent this, the default
maximal length of itemsets/rules is restricted to 10 items (via the parameter element `maxlen=10`)..."
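
To give a rough sense of the combinatorial explosion described above, the following Python sketch counts how many non-empty itemsets exist over a given number of distinct items, with and without a size cap. This is only an upper bound on the candidates: a real apriori run prunes by minimum support, so actual counts depend on the data. The function name `count_itemsets` and the choice of 30 items are illustrative assumptions, not part of MADlib or arules.

```python
from math import comb  # requires Python 3.8+


def count_itemsets(n_items, max_size):
    """Count the non-empty itemsets of size <= max_size over n_items distinct items.

    Hypothetical helper for illustration only. It gives an upper bound on the
    candidates a frequent-itemset miner could visit, ignoring support pruning.
    """
    limit = min(max_size, n_items)
    return sum(comb(n_items, k) for k in range(1, limit + 1))


n = 30  # assumed number of distinct items, chosen only for illustration
uncapped = count_itemsets(n, n)   # every non-empty subset: 2**n - 1
capped = count_itemsets(n, 10)    # itemsets of at most 10 items

print("uncapped:", uncapped)
print("capped at 10:", capped)
```

The gap between the two counts widens quickly as the number of distinct items grows, which is why a default cap helps when minimum support is set too low.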

Acceptance

1) Set the `max_itemset_size` parameter to 100 and run on a data set that creates rules with more than 10 items.
2) Set `max_itemset_size` to `NULL` and re-run; confirm that the default max rule size limit is respected.
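
The two acceptance steps can be illustrated with a toy model in Python. This is a sketch under stated assumptions, not the MADlib implementation: `frequent_itemsets` is a hypothetical brute-force enumerator, `None` stands in for SQL `NULL`, and frequent itemsets stand in for rules. The data set is constructed so that 12 items always occur together.

```python
from itertools import combinations

DEFAULT_MAX_ITEMSET_SIZE = 10  # the default proposed in this issue


def frequent_itemsets(transactions, min_support, max_itemset_size=None):
    """Brute-force frequent itemsets (toy model, not MADlib code).

    max_itemset_size=None plays the role of SQL NULL and falls back
    to DEFAULT_MAX_ITEMSET_SIZE.
    """
    if max_itemset_size is None:
        max_itemset_size = DEFAULT_MAX_ITEMSET_SIZE
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = []
    for k in range(1, min(max_itemset_size, len(items)) + 1):
        for candidate in combinations(items, k):
            members = set(candidate)
            # Support = fraction of transactions containing every item.
            support = sum(1 for t in transactions if members <= t) / n
            if support >= min_support:
                result.append(candidate)
    return result


# 12 items that always occur together, so itemsets of up to 12 items are frequent.
transactions = [set(range(12)) for _ in range(3)]

# Step 1: explicit cap of 100 allows itemsets with more than 10 items.
longest_explicit = max(len(s) for s in frequent_itemsets(transactions, 0.5, 100))
# Step 2: None (NULL) falls back to the default cap of 10.
longest_default = max(len(s) for s in frequent_itemsets(transactions, 0.5, None))

print(longest_explicit, longest_default)  # prints "12 10"
```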



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)