Posted to issues@madlib.apache.org by "Frank McQuillan (JIRA)" <ji...@apache.org> on 2019/04/09 23:42:00 UTC

[jira] [Updated] (MADLIB-1288) Set max itemset size to 10 by default in assoc rules

     [ https://issues.apache.org/jira/browse/MADLIB-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frank McQuillan updated MADLIB-1288:
------------------------------------
    Priority: Minor  (was: Major)

> Set max itemset size to 10 by default in assoc rules
> ----------------------------------------------------
>
>                 Key: MADLIB-1288
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1288
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Module: Association Rules
>            Reporter: Frank McQuillan
>            Priority: Minor
>             Fix For: v1.16
>
>
> Story
> As a data scientist,
> I want the maximum itemset size to default to 10,
> so that assoc rules does not run for an excessively long time.
> Details
> We have had some complaints about how long assoc rules takes to run. This could be due to the implementation or to wrong parameter settings by the user, but it may also be due to a combinatorial explosion in the number of generated rules.
> The R parameter `maxlen` defaults to 10:
> https://cran.r-project.org/web/packages/arules/arules.pdf
> see page 10, "apriori - mining associations with apriori", which says:
> "If the minimum support is chosen too low for the dataset,
> then the algorithm will try to create an extremely large set of itemsets/rules. This will result in
> very long run time and eventually the process will run out of memory. To prevent this, the default
> maximal length of itemsets/rules is restricted to 10 items (via the parameter element `maxlen=10`)..."
> The R `maxlen` parameter is the same as the MADlib parameter `max_itemset_size`:
> http://madlib.apache.org/docs/latest/group__grp__assoc__rules.html
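To give a rough sense of the combinatorial explosion described in the Details above, here is a minimal Python sketch. It is not MADlib or arules code: the function name `count_candidate_itemsets` and the 30-item example are assumptions made for illustration. It counts every possible non-empty itemset up to a size cap, which is only an upper bound, since a real Apriori run prunes most candidates using the minimum support threshold.

```python
from math import comb


def count_candidate_itemsets(n_items, max_size=None):
    """Count non-empty itemsets of size <= max_size over n_items distinct items.

    Hypothetical helper for illustration only. The result is an upper bound on
    what an Apriori-style search could enumerate; support-based pruning
    normally removes most of these candidates.
    """
    # No cap (or a cap larger than the item count) means all sizes are allowed.
    if max_size is None or max_size > n_items:
        max_size = n_items
    # Sum the binomial coefficients C(n_items, k) for k = 1..max_size.
    return sum(comb(n_items, k) for k in range(1, max_size + 1))


# Illustrative numbers: 30 distinct items, with and without a cap of 10.
uncapped = count_candidate_itemsets(30)      # every non-empty subset
capped = count_candidate_itemsets(30, 10)    # subsets of at most 10 items

print(f"uncapped: {uncapped:,}")
print(f"capped at 10: {capped:,}")
```

With no cap the count grows as 2^n - 1 in the number of distinct items, so a size cap bounds the worst case even when the minimum support is set too low.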
> Interface
> Stays the same.  The allowed values for max_itemset_size are:
> * any integer 2 or more
> * if not specified, set to 10 (default)
> * can also accept `ALL` as an input, which means generate itemsets of all sizes - this is the current behavior in 1.15.1
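The allowed values listed above could be resolved along the following lines. This is a hedged Python sketch of the proposed rules, not the MADlib implementation: the helper name `resolve_max_itemset_size`, the use of `None` to stand for "not specified" and for "no cap", and the case-insensitive matching of `ALL` are all assumptions.

```python
DEFAULT_MAX_ITEMSET_SIZE = 10  # default proposed in this issue


def resolve_max_itemset_size(value=None):
    """Map a user-supplied max_itemset_size to an effective cap.

    Hypothetical helper for illustration only.
    Returns an int >= 2, or None meaning "no cap" (itemsets of all sizes).
    """
    # Not specified: fall back to the proposed default of 10.
    if value is None:
        return DEFAULT_MAX_ITEMSET_SIZE
    # 'ALL' requests itemsets of all sizes (case-insensitivity is an assumption).
    if isinstance(value, str):
        if value.strip().upper() == "ALL":
            return None
        raise ValueError(f"unsupported max_itemset_size: {value!r}")
    # Reject non-integers; bool is excluded because it is a subclass of int.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"max_itemset_size must be an integer, got {value!r}")
    # An itemset cap below 2 cannot produce any rule.
    if value < 2:
        raise ValueError("max_itemset_size must be 2 or more")
    return value


print(resolve_max_itemset_size())        # default cap
print(resolve_max_itemset_size(100))     # explicit cap
print(resolve_max_itemset_size("ALL"))   # no cap
```

Returning `None` for `ALL` keeps "no cap" distinct from any integer value, so callers cannot confuse it with a very large limit.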
> Acceptance
> 1) Set the `max_itemset_size` parameter to 100 and run on a data set that creates rules with more than 10 items.
> 2) Set `max_itemset_size` to `NULL` and re-run; confirm that the default maximum rule size limit of 10 is respected.


