Posted to issues@spark.apache.org by "Caique Rodrigues Marques (JIRA)" <ji...@apache.org> on 2015/12/03 05:07:11 UTC

[jira] [Commented] (SPARK-8855) Python API for Association Rules

    [ https://issues.apache.org/jira/browse/SPARK-8855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15037211#comment-15037211 ] 

Caique Rodrigues Marques commented on SPARK-8855:
-------------------------------------------------

I am working on this, but I have a question.

The issue description says that the relevant method is "FPGrowthModel.generateAssociationRules()". However, it is not clear whether the wrapper for the association rules should live in "FPGrowthModelWrapper.scala", and that is the problem.

My idea is the following:
1) In the fpm.py file, a class "AssociationRules" with one method and a class:
1.1) Method train(data, minConfidence), which generates the association rules for the given data with the specified minConfidence (default 0.6). This method calls "trainAssociationRules" from PythonMLLibAPI with the parameters data and minConfidence, and then returns an FPGrowthModel.
1.2) Class Rule, a namedtuple that represents an (antecedent, consequent) tuple.

2) Still in fpm.py, a new method called generateAssociationRules will be added to the class FPGrowthModel. It will map the rules returned by the method "getAssociationRules" of FPGrowthModelWrapper to the namedtuple.
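
To make the Python side of this idea concrete, here is a minimal sketch of the Rule namedtuple and the mapping step. It runs without Spark: FakeModelWrapper is a hypothetical stand-in for the JVM-side FPGrowthModelWrapper, and generate_association_rules is an illustrative name, not an existing PySpark function.

```python
from collections import namedtuple

# Hypothetical Python-side record for one association rule.
Rule = namedtuple("Rule", ["antecedent", "consequent"])


class FakeModelWrapper(object):
    """Stand-in for the JVM-side FPGrowthModelWrapper (an assumption, not the real class)."""

    def __init__(self, pairs):
        self._pairs = pairs

    def getAssociationRules(self):
        # The real wrapper would hand back an RDD of (antecedent, consequent)
        # pairs; a plain list is used here so the sketch runs without Spark.
        return list(self._pairs)


def generate_association_rules(wrapper):
    """Map each (antecedent, consequent) pair from the wrapper to a Rule."""
    return [Rule(list(a), list(c)) for a, c in wrapper.getAssociationRules()]


wrapper = FakeModelWrapper([(["a", "b"], ["c"]), (["a"], ["b"])])
rules = generate_association_rules(wrapper)
```

With a real model, the list comprehension would presumably become a map over the RDD returned through the wrapper.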

Now my question: how can I make trainAssociationRules return an FPGrowthModel, so that the wrapper only has to map each rule it receives to its antecedent/consequent? I could not get the method trainAssociationRules to return an FPGrowthModel. The wrapper for association rules belongs in FPGrowthModelWrapper, right? Is something wrong with this idea?

For illustration, I am thinking of something like this in PythonMLLibAPI and in FPGrowthModelWrapper, respectively:
{code:none}
//  PythonMLLibAPI.scala
def trainAssociationRules(
    data: JavaRDD[FPGrowth.FreqItemset[Any]],
    minConfidence: Double): [return type] = {

  val model = new FPGrowthModel(data.rdd)
    .generateAssociationRules(minConfidence)

  new FPGrowthModelWrapper(model) // will fail
}
-----------------------------------------------------------------------
//  FPGrowthModelWrapper.scala
def getAssociationRules: [return type] = {
  SerDe.fromTuple2RDD(rule.map(x => (x.javaAntecedent, x.javaConsequent)))
}
{code}
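
To show what the rule-generation step is expected to produce, here is a small pure-Python sketch that derives rules from frequent itemsets and filters them by minConfidence. It is not the MLlib implementation: generate_rules and the dict-based input are assumptions for illustration, and it only emits single-item consequents as a simplification. Note that its result is a collection of rules rather than a model, which seems to be the type mismatch behind the "will fail" line above.

```python
def generate_rules(freq_itemsets, min_confidence=0.6):
    """Derive association rules from frequent itemsets.

    freq_itemsets maps frozenset(items) -> frequency count (hypothetical input format).
    Returns (antecedent, consequent, confidence) tuples with single-item consequents.
    """
    rules = []
    for itemset, freq in freq_itemsets.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and a consequent
        for item in itemset:
            antecedent = itemset - {item}
            antecedent_freq = freq_itemsets.get(antecedent)
            if not antecedent_freq:
                continue  # antecedent is not itself a known frequent itemset
            # confidence(X => Y) = freq(X union Y) / freq(X)
            confidence = float(freq) / antecedent_freq
            if confidence >= min_confidence:
                rules.append((sorted(antecedent), [item], confidence))
    return rules


freq = {
    frozenset(["a"]): 4,
    frozenset(["b"]): 3,
    frozenset(["a", "b"]): 3,
}
strict_rules = generate_rules(freq, min_confidence=0.8)
default_rules = generate_rules(freq)
```

Here {b} => {a} has confidence 3/3 and {a} => {b} has confidence 3/4, so only the first survives a threshold of 0.8, while both pass the default of 0.6.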

Any suggestions?

> Python API for Association Rules
> --------------------------------
>
>                 Key: SPARK-8855
>                 URL: https://issues.apache.org/jira/browse/SPARK-8855
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Feynman Liang
>            Priority: Minor
>
> A simple Python wrapper and doctests need to be written for Association Rules. The relevant method is {{FPGrowthModel.generateAssociationRules}}. The code will likely live in {{fpm.py}}.


