Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:03:11 UTC

[jira] [Updated] (SPARK-20324) Control itemSets length in PrefixSpan

     [ https://issues.apache.org/jira/browse/SPARK-20324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-20324:
---------------------------------
    Labels: bulk-closed  (was: )

> Control itemSets length in PrefixSpan
> -------------------------------------
>
>                 Key: SPARK-20324
>                 URL: https://issues.apache.org/jira/browse/SPARK-20324
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 2.1.0
>            Reporter: Cyril de Vogelaere
>            Priority: Minor
>              Labels: bulk-closed
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> The idea behind this improvement would be to allow better control over the size of itemSets in solution patterns.
> For example, suppose you have a huge dataset of products bought together, with one sequence per client, and you want to find items frequently bought in pairs in order to create interesting promotions for your clients or boost certain sales.
> In the current implementation, all solutions have to be computed before the user can sort through them and select only the interesting ones.
> What I'm proposing here is the addition of two parameters:
> First, a maxItemPerItemset parameter, which would limit the maximum number of items per itemset to a given size X. This could significantly reduce the search space and speed up the search for these specific solutions.
> Second, a companion minItemPerItemset parameter, which would set the minimum number of items per itemset, discarding solutions that do not satisfy this constraint. Although this would not reduce the search space, it should still allow interested users to reduce the number of solutions collected by the driver.
> If this proposed improvement seems interesting to the community, I will implement it along with tests to guarantee the correctness of its implementation.
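To make the proposal concrete, here is a minimal Python sketch of how the two constraints could plug into a pattern-growth search. It is a naive, in-memory enumerator over a toy database, not Spark's distributed PrefixSpan. The functions `mine_patterns` and `contains` and the parameters `max_items_per_itemset` / `min_items_per_itemset` are hypothetical stand-ins for the proposed maxItemPerItemset / minItemPerItemset, and `max_pattern_length` only mirrors the idea of PrefixSpan's existing maxPatternLength. Following the proposal, the maximum is enforced while growing patterns (pruning the search), while the minimum is applied only as a filter on what gets reported.

```python
from typing import Dict, FrozenSet, List, Optional, Sequence, Tuple

Itemset = Tuple[int, ...]      # items kept sorted inside each itemset
Pattern = Tuple[Itemset, ...]  # an ordered sequence of itemsets


def contains(seq: Sequence[FrozenSet[int]], pattern: Pattern) -> bool:
    """True if each itemset of `pattern` is a subset of a distinct itemset of
    `seq`, in order. Greedy earliest matching is enough for this test."""
    pos = 0
    for itemset in pattern:
        while pos < len(seq) and not set(itemset) <= seq[pos]:
            pos += 1
        if pos == len(seq):
            return False
        pos += 1
    return True


def mine_patterns(
    db: List[List[List[int]]],
    min_count: int,
    max_items_per_itemset: Optional[int] = None,  # hypothetical maxItemPerItemset
    min_items_per_itemset: int = 1,               # hypothetical minItemPerItemset
    max_pattern_length: int = 10,                 # total items in a pattern
) -> Dict[Pattern, int]:
    """Naive pattern growth (not Spark's PrefixSpan): returns
    {pattern: support count} for frequent patterns meeting the constraints."""
    if min_count < 1:
        raise ValueError("min_count must be at least 1")
    seqs = [[frozenset(s) for s in seq] for seq in db]
    items = sorted({i for seq in seqs for s in seq for i in s})
    results: Dict[Pattern, int] = {}

    def support(pattern: Pattern) -> int:
        return sum(contains(seq, pattern) for seq in seqs)

    def grow(pattern: Pattern, count: int) -> None:
        # Minimum itemset size: only filters the reported patterns.
        if all(len(s) >= min_items_per_itemset for s in pattern):
            results[pattern] = count
        if sum(len(s) for s in pattern) >= max_pattern_length:
            return
        # Sequence extension: append a new single-item itemset.
        candidates = [pattern + ((i,),) for i in items]
        # Itemset extension: add a larger item to the last itemset, but only
        # while it is below the maximum. Skipping this branch prunes every
        # pattern that would grow from it, shrinking the search space.
        last = pattern[-1]
        if max_items_per_itemset is None or len(last) < max_items_per_itemset:
            candidates += [pattern[:-1] + (last + (i,),) for i in items if i > last[-1]]
        for cand in candidates:
            c = support(cand)
            if c >= min_count:  # support never grows under extension
                grow(cand, c)

    for i in items:
        c = support(((i,),))
        if c >= min_count:
            grow(((i,),), c)
    return results


# Toy database: one sequence of baskets per client.
db = [
    [[1, 2], [3]],
    [[1, 2], [3, 4]],
    [[1, 2, 3]],
    [[1, 2, 3]],
]

# "Items bought in pairs": exactly two items per itemset.
print(mine_patterns(db, min_count=2, max_items_per_itemset=2, min_items_per_itemset=2))
# {((1, 2),): 4, ((1, 3),): 2, ((2, 3),): 2}
```

With both bounds set to 2, only itemsets of exactly two items are reported, which matches the "items bought in pairs" use case. Dropping `max_items_per_itemset` lets the search also explore, and report, the three-item itemset `(1, 2, 3)`.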



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
