Posted to dev@spark.apache.org by caiquermarques95 <ca...@gmail.com> on 2015/12/02 13:51:05 UTC

Python API for Association Rules

Hello everyone!
I'm working on the Python API for association rules
(https://issues.apache.org/jira/browse/SPARK-8855), but I have a question.

The issue description says that an important method is, of course,
"*FPGrowthModel.generateAssociationRules()*". However, it is not clear
whether the wrapper for the association rules should go in
"*FPGrowthModelWrapper.scala*", and this is the problem.

My idea is the following:
1) In fpm.py, a class "AssociationRules" containing one method and one
class:
1.1) A method train(data, minConfidence) that generates the association
rules for the given data with the specified minConfidence (default 0.6).
This method will call "trainAssociationRules" in *PythonMLLibAPI* with the
parameters data and minConfidence, and will then return an FPGrowthModel.
1.2) A class Rule, a namedtuple representing an (antecedent, consequent)
tuple.

2) Also in fpm.py, a new method called generateAssociationRules will be
added to the class FPGrowthModel. It will map the rules obtained by calling
the method "getAssociationRules" of FPGrowthModelWrapper to the Rule
namedtuple.
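To make items 1.1 and 1.2 concrete, here is a minimal local sketch in plain Python (no Spark involved) of what generating rules with a minimum confidence computes, where confidence(X => Y) = freq(X ∪ Y) / freq(X). It assumes frequent itemsets arrive as (items, freq) pairs, only builds rules with a single-item consequent for simplicity, and adds a confidence field to the proposed Rule namedtuple as an assumption; generate_rules is a hypothetical helper, not part of MLlib.

```python
from collections import namedtuple

# The proposed Rule namedtuple, extended (as an assumption for this sketch)
# with a confidence field.
Rule = namedtuple("Rule", ["antecedent", "consequent", "confidence"])


def generate_rules(freq_itemsets, min_confidence=0.6):
    """Hypothetical helper: derive rules X => {y} from frequent itemsets.

    freq_itemsets is an iterable of (items, freq) pairs. Only rules whose
    consequent is a single item are produced, and the result is sorted by
    descending confidence.
    """
    # Index frequencies by frozenset so antecedents can be looked up directly.
    freq = {frozenset(items): f for items, f in freq_itemsets}
    rules = []
    for itemset, itemset_freq in freq.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for item in itemset:
            antecedent = itemset - {item}
            antecedent_freq = freq.get(antecedent)
            if not antecedent_freq:
                continue  # antecedent missing from the input; skip it
            confidence = itemset_freq / antecedent_freq
            if confidence >= min_confidence:
                rules.append(Rule(sorted(antecedent), [item], confidence))
    rules.sort(key=lambda rule: rule.confidence, reverse=True)
    return rules


itemsets = [(["a"], 4), (["b"], 3), (["a", "b"], 3)]
for rule in generate_rules(itemsets):
    print(rule)
```

With these itemsets and the default threshold of 0.6, both b => a (confidence 3/3 = 1.0) and a => b (confidence 3/4 = 0.75) are kept; with min_confidence=0.8 only b => a survives.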

Now my question: how do I make trainAssociationRules return an FPGrowthModel,
so that the wrapper just maps the rules it receives to antecedent/consequent?
I could not make trainAssociationRules return an FPGrowthModel. The wrapper
for association rules belongs in FPGrowthModelWrapper, right?
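On the Python side, the mapping described in item 2 amounts to turning each (antecedent, consequent) pair into a Rule. This sketch assumes the JVM wrapper returns each rule as a pair of item sequences, which is what the proposed getAssociationRules is meant to produce; to_rule is a hypothetical name, and a plain list stands in for the RDD so the example runs without Spark.

```python
from collections import namedtuple

# Rule exactly as proposed in item 1.2: an (antecedent, consequent) pair.
Rule = namedtuple("Rule", ["antecedent", "consequent"])


def to_rule(pair):
    """Hypothetical mapper from one (antecedent, consequent) pair to a Rule.

    Each side may arrive as any sequence of items (for example, a Python
    list converted from a java.util.List), so both are normalized to lists.
    """
    antecedent, consequent = pair
    return Rule(list(antecedent), list(consequent))


# Stand-in for the pairs the JVM wrapper is meant to hand back. With a real
# RDD of such pairs, the same function could be applied via rdd.map(to_rule).
pairs = [(["b"], ["a"]), (("a",), ("b",))]
rules = [to_rule(pair) for pair in pairs]
print(rules)
```

Keeping the conversion in a single small function means the Python FPGrowthModel method only has to call it on whatever collection the wrapper returns.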

For illustration, I am thinking of something like this in *PythonMLLibAPI*:

def trainAssociationRules(
    data: JavaRDD[FPGrowth.FreqItemset[Any]],
    minConfidence: Double): [return type] = {

  val model = new FPGrowthModel(data.rdd)
    .generateAssociationRules(minConfidence)

  new FPGrowthModelWrapper(model)
}

And in FPGrowthModelWrapper, something like:

def getAssociationRules: [return type] = {
  SerDe.fromTuple2RDD(rule.map(x => (x.javaAntecedent, x.javaConsequent)))
}

I know this will fail, but what is wrong with my idea?
Any suggestions?

Thanks for the help and the tips.
Caique.




--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Python-API-for-Association-Rules-tp15419.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

Re: Python API for Association Rules

Posted by Caique Marques <ca...@gmail.com>.
Hi Joseph,
Sorry for my mistake; I will comment on the JIRA.

Thanks.
Caique.

2015-12-02 19:12 GMT-02:00 Joseph Bradley <jo...@databricks.com>:

> If you're working on a feature, please comment on the JIRA first (to avoid
> conflicts / duplicate work). Could you please copy what you wrote to the
> JIRA to discuss there?
> Thanks,
> Joseph

Re: Python API for Association Rules

Posted by Joseph Bradley <jo...@databricks.com>.
If you're working on a feature, please comment on the JIRA first (to avoid
conflicts / duplicate work). Could you please copy what you wrote to the
JIRA to discuss there?
Thanks,
Joseph
