Posted to issues@madlib.apache.org by "Frank McQuillan (JIRA)" <ji...@apache.org> on 2019/04/25 19:15:01 UTC

[jira] [Resolved] (MADLIB-1288) Set max itemset size to 10 by default in assoc rules

     [ https://issues.apache.org/jira/browse/MADLIB-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frank McQuillan resolved MADLIB-1288.
-------------------------------------
    Resolution: Fixed

LGTM, see PR for tests

> Set max itemset size to 10 by default in assoc rules
> ----------------------------------------------------
>
>                 Key: MADLIB-1288
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1288
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Module: Association Rules
>            Reporter: Frank McQuillan
>            Priority: Minor
>             Fix For: v1.16
>
>
> Story
> As a data scientist,
> I want the max itemset size to default to 10,
> so that assoc rules does not run for a long time.
> Details
> We have had some complaints about how long assoc rules takes to run. This could be due to the implementation or to wrong parameter settings by the user, but it may also be due to a combinatorial explosion in the number of generated rules.
> The R param `maxlen` defaults to 10:
> https://cran.r-project.org/web/packages/arules/arules.pdf
> see page 10 "apriori - mining associations with apriori":
> "If the minimum support is chosen too low for the dataset,
> then the algorithm will try to create an extremely large set of itemsets/rules. This will result in
> very long run time and eventually the process will run out of memory. To prevent this, the default
> maximal length of itemsets/rules is restricted to 10 items (via the parameter element `maxlen=10`)..."
> `maxlen` is the same as the MADlib param `max_itemset_size`:
> http://madlib.apache.org/docs/latest/group__grp__assoc__rules.html
> Interface
> Stays the same.  The allowed values for max_itemset_size are:
> * any number 2 or more
> * if not specified set to 10 (default)
> * if user wants all itemsets they can specify a big number like 1000 or 10000 or whatever
> Acceptance
> 1) Set `max_itemset_size` parameter to 100 and run a data set that creates rules with more than 10 items.
> 2) Set `max_itemset_size` to `NULL` and re-run, confirm that the default max rule size limit of 10 is respected.
> 3) Set `max_itemset_size` parameter to 10 and check it creates the same rules as #2 above.
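
The three acceptance steps above can be sketched outside the database as a small Python model. This is only an illustration under stated assumptions: `resolve_max_itemset_size` and `frequent_itemsets` are hypothetical helpers written for this note, not MADlib functions, the brute-force counting stands in for the real apriori implementation, and it checks itemsets rather than generated rules. `None` plays the role of SQL `NULL`.

```python
from itertools import combinations

DEFAULT_MAX_ITEMSET_SIZE = 10  # default proposed in this issue


def resolve_max_itemset_size(value=None):
    """Hypothetical helper: map None (SQL NULL) to the default of 10
    and reject values below 2, as described under Interface."""
    if value is None:
        return DEFAULT_MAX_ITEMSET_SIZE
    if value < 2:
        raise ValueError("max_itemset_size must be 2 or more")
    return value


def frequent_itemsets(transactions, min_count, max_itemset_size=None):
    """Brute-force stand-in for apriori (an assumption, not MADlib's code):
    count every itemset up to the size cap, keep those seen min_count times."""
    cap = resolve_max_itemset_size(max_itemset_size)
    counts = {}
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, min(cap, len(items)) + 1):
            for itemset in combinations(items, size):
                counts[itemset] = counts.get(itemset, 0) + 1
    return {itemset for itemset, n in counts.items() if n >= min_count}


# Toy data: two identical 12-item transactions, so itemsets of up to
# 12 items are frequent when the cap allows them.
transactions = [list(range(12)), list(range(12))]

uncapped = frequent_itemsets(transactions, 2, 100)   # step 1: large cap
default = frequent_itemsets(transactions, 2, None)   # step 2: NULL -> default
explicit = frequent_itemsets(transactions, 2, 10)    # step 3: explicit 10
```

In this toy model the large cap lets itemsets with more than 10 items through, while the `None` and explicit `10` runs produce identical results. Even with only 12 distinct items the uncapped run counts every non-empty subset, which hints at how quickly the search space grows on real data.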



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)