Posted to reviews@spark.apache.org by hhbyyh <gi...@git.apache.org> on 2016/06/14 07:23:55 UTC

[GitHub] spark pull request #13656: [SPARK-15938]Adding "support" property to MLlib A...

GitHub user hhbyyh opened a pull request:

    https://github.com/apache/spark/pull/13656

    [SPARK-15938]Adding "support" property to MLlib Association Rule

    ## What changes were proposed in this pull request?
    jira: https://issues.apache.org/jira/browse/SPARK-15938
    
    Support indicates how frequently an itemset appears in the database. Besides confidence, support is another critical property of an association rule.
    References: 
    https://en.wikipedia.org/wiki/Association_rule_learning
    http://www.philippe-fournier-viger.com/spmf/index.php?link=documentation.php#allassociationrules
    https://www-users.cs.umn.edu/~kumar/dmbook/ch6.pdf
    Support can be either the number of occurrences or the fraction of the dataset. I chose to use the count because:
    1. API compatibility: currently neither FPGrowthModel nor AssociationRules has information about the size of the dataset, and I'd like to avoid breaking several public APIs.
    2. It also follows the API of SPMF: http://www.philippe-fournier-viger.com/spmf/index.php?link=documentation.php#allassociationrules
    As next steps, we could add a constraint such as minSupport, as other libraries do. FPGrowthModel should also contain the size of the dataset.
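To make the count/fraction distinction concrete, here is a minimal plain-Scala sketch. It assumes no Spark at all: an in-memory `Seq` of hypothetical transactions stands in for the input RDD, and `SupportSketch` with its methods are illustrative names only. It computes a rule's support both as a count and as a fraction of the dataset, plus confidence as `freqUnion / freqAntecedent`, the same formula as the `confidence` method in the diff quoted in this thread.

```scala
// Minimal sketch (plain Scala, no Spark): support as a count vs. a fraction,
// and confidence, for a single rule antecedent => consequent.
object SupportSketch {
  type Itemset = Set[String]

  /** Number of transactions that contain every item of `items`. */
  def count(transactions: Seq[Itemset], items: Itemset): Long =
    transactions.count(t => items.subsetOf(t)).toLong

  /** Support as a fraction of all transactions, in [0.0, 1.0]. */
  def supportFraction(transactions: Seq[Itemset], items: Itemset): Double =
    count(transactions, items).toDouble / transactions.size

  /** Confidence = freq(antecedent ++ consequent) / freq(antecedent). */
  def confidence(transactions: Seq[Itemset], antecedent: Itemset, consequent: Itemset): Double =
    count(transactions, antecedent ++ consequent).toDouble / count(transactions, antecedent)

  def main(args: Array[String]): Unit = {
    // Hypothetical toy transactions.
    val transactions: Seq[Itemset] = Seq(
      Set("bread", "milk"),
      Set("bread", "butter"),
      Set("bread", "milk", "butter"),
      Set("milk")
    )
    val antecedent: Itemset = Set("bread")
    val consequent: Itemset = Set("milk")
    val union = antecedent ++ consequent

    println(s"support (count)    = ${count(transactions, union)}")
    println(s"support (fraction) = ${supportFraction(transactions, union)}")
    println(s"confidence         = ${confidence(transactions, antecedent, consequent)}")
  }
}
```

With these four transactions, the rule {bread} => {milk} has a support count of 2, a support fraction of 0.5 and a confidence of 2/3. Only the fraction needs the total number of transactions, which (per point 1 above) FPGrowthModel does not currently carry.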
    
    
    ## How was this patch tested?
    Existing unit tests.
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hhbyyh/spark supportAsso

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13656.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13656
    
----
commit 60efd0520a3af52995c2d6b1a2abaeebe658bb32
Author: Yuhao Yang <yu...@intel.com>
Date:   2016-06-14T06:27:21Z

    add support for association rule

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13656: [SPARK-15938]Adding "support" property to MLlib Associat...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/13656
  
    Hm, I suppose the problem is that you're returning a count here though. See also SPARK-15930 which is related, and concerns tracking the total size of the input. 



[GitHub] spark pull request #13656: [SPARK-15938]Adding "support" property to MLlib A...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13656#discussion_r66920272
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/fpm/AssociationRules.scala ---
    @@ -120,6 +120,13 @@ object AssociationRules {
         @Since("1.5.0")
         def confidence: Double = freqUnion.toDouble / freqAntecedent
     
    +    /**
    +     * Returns the support of the rule. Current implementation would return the number of
    +     * co-occurrence of antecedent and consequent.
    +     */
    +    @Since("2.1.0")
    +    def support: Double = freqUnion.toDouble
    +
    --- End diff --
    
    This is intentionally typed as Double. In the future, it could be a fraction value (< 1.0).



[GitHub] spark pull request #13656: [SPARK-15938]Adding "support" property to MLlib A...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13656#discussion_r66928786
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/fpm/AssociationRules.scala ---
    @@ -120,6 +120,13 @@ object AssociationRules {
         @Since("1.5.0")
         def confidence: Double = freqUnion.toDouble / freqAntecedent
     
    +    /**
    +     * Returns the support of the rule. Current implementation would return the number of
    +     * co-occurrence of antecedent and consequent.
    +     */
    +    @Since("2.1.0")
    +    def support: Double = freqUnion.toDouble
    +
    --- End diff --
    
    Dunno, that seems like a mistake to me. It should be a `Long` if it's a count, and should expose alternative factory methods to accept input of different types if needed. Overloading one argument seems like a hack and I'd prefer not to extend it (or fix it).
    
    See SPARK-15930 which concerns adding the input size just for this reason, I assume. We haven't released 2.0, and so could in theory still put in a change to the constructor. I agree, we might however have to deprecate the existing one, add a new one, and still deal with calls to the old constructor, which would mean it's not possible to compute values that are a fraction of the whole data set. This in turn may argue for clearly separating inputs/outputs that are counts vs percentages.
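One way to read the suggestion of clearly separating counts from fractions is sketched below. `SketchRule` is a hypothetical stand-in, not Spark's `AssociationRules.Rule`: the count is a `Long`, the fraction is a separate accessor, and an auxiliary constructor models the legacy call path that has no dataset size, in which case the fraction is simply unavailable.

```scala
// Hypothetical sketch, NOT Spark's AssociationRules.Rule: counts and fractions
// are exposed through separate, explicitly typed accessors.
class SketchRule(
    val antecedent: Seq[String],
    val consequent: Seq[String],
    val freqUnion: Long,
    val freqAntecedent: Long,
    val numTransactions: Option[Long]) {

  // Legacy-style constructor: no dataset size is known, so no fraction can be derived.
  def this(antecedent: Seq[String], consequent: Seq[String], freqUnion: Long, freqAntecedent: Long) =
    this(antecedent, consequent, freqUnion, freqAntecedent, None)

  /** Number of transactions containing antecedent and consequent together. */
  def supportCount: Long = freqUnion

  /** Fraction of all transactions containing both; None when the dataset size is unknown. */
  def supportFraction: Option[Double] = numTransactions.map(n => freqUnion.toDouble / n)

  /** Same formula as the existing confidence: freqUnion / freqAntecedent. */
  def confidence: Double = freqUnion.toDouble / freqAntecedent
}

object SketchRuleDemo {
  def main(args: Array[String]): Unit = {
    val legacy = new SketchRule(Seq("bread"), Seq("milk"), 2L, 3L)
    val sized = new SketchRule(Seq("bread"), Seq("milk"), 2L, 3L, Some(4L))
    println(s"legacy: count=${legacy.supportCount}, fraction=${legacy.supportFraction}")
    println(s"sized:  count=${sized.supportCount}, fraction=${sized.supportFraction}")
  }
}
```

Returning `Option[Double]` makes the "old constructor cannot compute a fraction" limitation visible in the type, rather than silently returning a count inside a `Double`.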



[GitHub] spark pull request #13656: [SPARK-15938]Adding "support" property to MLlib A...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13656#discussion_r68091611
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/fpm/AssociationRules.scala ---
    @@ -120,6 +120,13 @@ object AssociationRules {
         @Since("1.5.0")
         def confidence: Double = freqUnion.toDouble / freqAntecedent
     
    +    /**
    +     * Returns the support of the rule. Current implementation would return the number of
    +     * co-occurrence of antecedent and consequent.
    +     */
    +    @Since("2.1.0")
    +    def support: Double = freqUnion.toDouble
    +
    --- End diff --
    
    `support` should be a fraction to be consistent with the semantic of `minSupport` in FPGrowth and PrefixSpan. There should be a compatible way to add `support`. `Rule` is not a case class and its constructor is package private. So this should be easy to add. Another approach is to add total number of records in the model, so people can calculate the support easily.
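The second option mentioned here, recording the total number of records in the model, could look roughly like the sketch below. `SketchFreqItemset` and `SketchModel` are hypothetical names, not Spark classes; the point is only that once a `numTransactions` value is available, fractional support (and a filter in the spirit of `minSupport`) is a single division.

```scala
// Hypothetical sketch of "add the total number of records to the model":
// these are illustrative classes, not Spark's FPGrowthModel or FreqItemset.
final case class SketchFreqItemset(items: Set[String], freq: Long)

final class SketchModel(val freqItemsets: Seq[SketchFreqItemset], val numTransactions: Long) {
  require(numTransactions > 0, "numTransactions must be positive")

  /** Support as a fraction in [0.0, 1.0], the same scale as a minSupport threshold. */
  def support(itemset: SketchFreqItemset): Double =
    itemset.freq.toDouble / numTransactions

  /** Itemsets whose fractional support is at least `minSupport`. */
  def withSupportAtLeast(minSupport: Double): Seq[SketchFreqItemset] =
    freqItemsets.filter(fi => support(fi) >= minSupport)
}

object SketchModelDemo {
  def main(args: Array[String]): Unit = {
    val model = new SketchModel(
      Seq(
        SketchFreqItemset(Set("bread"), 3L),
        SketchFreqItemset(Set("bread", "milk"), 2L),
        SketchFreqItemset(Set("butter"), 1L)),
      numTransactions = 4L)
    model.freqItemsets.foreach(fi => println(s"${fi.items} -> ${model.support(fi)}"))
    println(model.withSupportAtLeast(0.5).map(_.items))
  }
}
```

This keeps every stored frequency a count and confines the fraction to one derived method, so no existing field has to change meaning.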



[GitHub] spark issue #13656: [SPARK-15938]Adding "support" property to MLlib Associat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13656
  
    **[Test build #61130 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61130/consoleFull)** for PR 13656 at commit [`8b16676`](https://github.com/apache/spark/commit/8b166761024c1b5bed9f90aa8f550eb2103b9b64).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #13656: [SPARK-15938]Adding "support" property to MLlib Associat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13656
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request #13656: [SPARK-15938]Adding "support" property to MLlib A...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13656#discussion_r66924843
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/fpm/AssociationRules.scala ---
    @@ -120,6 +120,13 @@ object AssociationRules {
         @Since("1.5.0")
         def confidence: Double = freqUnion.toDouble / freqAntecedent
     
    +    /**
    +     * Returns the support of the rule. Current implementation would return the number of
    +     * co-occurrence of antecedent and consequent.
    +     */
    +    @Since("2.1.0")
    +    def support: Double = freqUnion.toDouble
    +
    --- End diff --
    
    Ah right, we use support as a fraction. Well, then best to be consistent and return it as a fraction of the data set size. I can't imagine having a method sometimes return a value with one type of semantics and sometimes another. Just make two methods.
    
    freqUnion however appears to be a count only, and is even explicitly called a 'frequency'. 



[GitHub] spark issue #13656: [SPARK-15938]Adding "support" property to MLlib Associat...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on the issue:

    https://github.com/apache/spark/pull/13656
  
    Yes, I've linked the two issues and explained the fraction/count choice in the description.



[GitHub] spark issue #13656: [SPARK-15938]Adding "support" property to MLlib Associat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13656
  
    **[Test build #61131 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61131/consoleFull)** for PR 13656 at commit [`ed384c7`](https://github.com/apache/spark/commit/ed384c7f81c65725a64180b0e7da5267d5173913).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #13656: [SPARK-15938]Adding "support" property to MLlib Associat...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on the issue:

    https://github.com/apache/spark/pull/13656
  
    I made a quick change to demo what it looks like if we pass the data size along to FPGrowthModel and AssociationRules.



[GitHub] spark issue #13656: [SPARK-15938]Adding "support" property to MLlib Associat...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on the issue:

    https://github.com/apache/spark/pull/13656
  
    @srowen I'm also working on ml.fpm, where it's easier to include more information in the model and rules. I would suggest:
    1. Use support as a count, and avoid any API break;
    2. Keep mllib.fpm as it is; I'll close the PR.
    




[GitHub] spark pull request #13656: [SPARK-15938]Adding "support" property to MLlib A...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13656#discussion_r66921083
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/fpm/AssociationRules.scala ---
    @@ -120,6 +120,13 @@ object AssociationRules {
         @Since("1.5.0")
         def confidence: Double = freqUnion.toDouble / freqAntecedent
     
    +    /**
    +     * Returns the support of the rule. Current implementation would return the number of
    +     * co-occurrence of antecedent and consequent.
    +     */
    +    @Since("2.1.0")
    +    def support: Double = freqUnion.toDouble
    +
    --- End diff --
    
    I don't think the meaning of this should ever be overloaded. Support is a count.



[GitHub] spark issue #13656: [SPARK-15938]Adding "support" property to MLlib Associat...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/13656
  
    I'm not sure if this is something that would still be considered since we aren't doing new development for MLlib anymore. It might make more sense to work on https://issues.apache.org/jira/browse/SPARK-14503 and then implement this after.



[GitHub] spark issue #13656: [SPARK-15938]Adding "support" property to MLlib Associat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13656
  
    **[Test build #60475 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60475/consoleFull)** for PR 13656 at commit [`60efd05`](https://github.com/apache/spark/commit/60efd0520a3af52995c2d6b1a2abaeebe658bb32).



[GitHub] spark pull request #13656: [SPARK-15938]Adding "support" property to MLlib A...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13656#discussion_r66927506
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/fpm/AssociationRules.scala ---
    @@ -120,6 +120,13 @@ object AssociationRules {
         @Since("1.5.0")
         def confidence: Double = freqUnion.toDouble / freqAntecedent
     
    +    /**
    +     * Returns the support of the rule. Current implementation would return the number of
    +     * co-occurrence of antecedent and consequent.
    +     */
    +    @Since("2.1.0")
    +    def support: Double = freqUnion.toDouble
    +
    --- End diff --
    
    I suppose `freqUnion` was made a Double on purpose for the same reason (flexibility for the future).
    
    Making support a fraction now requires keeping the dataset size in `FPGrowthModel` and `AssociationRule`, which would introduce an API change. I thought we should avoid breaking the API between 2.0 and 2.1.
    
    




[GitHub] spark pull request #13656: [SPARK-15938]Adding "support" property to MLlib A...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13656#discussion_r66931502
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/fpm/AssociationRules.scala ---
    @@ -120,6 +120,13 @@ object AssociationRules {
         @Since("1.5.0")
         def confidence: Double = freqUnion.toDouble / freqAntecedent
     
    +    /**
    +     * Returns the support of the rule. Current implementation would return the number of
    +     * co-occurrence of antecedent and consequent.
    +     */
    +    @Since("2.1.0")
    +    def support: Double = freqUnion.toDouble
    +
    --- End diff --
    
    I find it hard to just deprecate the old constructor and still keep `support` as a fraction if no dataset size is passed in.



[GitHub] spark issue #13656: [SPARK-15938]Adding "support" property to MLlib Associat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13656
  
    Merged build finished. Test FAILed.



[GitHub] spark pull request #13656: [SPARK-15938]Adding "support" property to MLlib A...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13656#discussion_r66931976
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/fpm/AssociationRules.scala ---
    @@ -120,6 +120,13 @@ object AssociationRules {
         @Since("1.5.0")
         def confidence: Double = freqUnion.toDouble / freqAntecedent
     
    +    /**
    +     * Returns the support of the rule. Current implementation would return the number of
    +     * co-occurrence of antecedent and consequent.
    +     */
    +    @Since("2.1.0")
    +    def support: Double = freqUnion.toDouble
    +
    --- End diff --
    
    Yes, it's not possible to implement in that case. There's an argument for just adding the parameter and removing the old constructor for 2.0 in order to support this without the convolutions. I'd love to get a thumbs up from @jkbradley or @mengxr though.



[GitHub] spark issue #13656: [SPARK-15938]Adding "support" property to MLlib Associat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13656
  
    Merged build finished. Test FAILed.



[GitHub] spark issue #13656: [SPARK-15938]Adding "support" property to MLlib Associat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13656
  
    **[Test build #61130 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61130/consoleFull)** for PR 13656 at commit [`8b16676`](https://github.com/apache/spark/commit/8b166761024c1b5bed9f90aa8f550eb2103b9b64).



[GitHub] spark issue #13656: [SPARK-15938]Adding "support" property to MLlib Associat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13656
  
    **[Test build #61131 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61131/consoleFull)** for PR 13656 at commit [`ed384c7`](https://github.com/apache/spark/commit/ed384c7f81c65725a64180b0e7da5267d5173913).



[GitHub] spark issue #13656: [SPARK-15938]Adding "support" property to MLlib Associat...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13656
  
    **[Test build #60475 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60475/consoleFull)** for PR 13656 at commit [`60efd05`](https://github.com/apache/spark/commit/60efd0520a3af52995c2d6b1a2abaeebe658bb32).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #13656: [SPARK-15938]Adding "support" property to MLlib Associat...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on the issue:

    https://github.com/apache/spark/pull/13656
  
    Closing this; support will be added to ml.fpm instead: https://github.com/apache/spark/pull/17280



[GitHub] spark issue #13656: [SPARK-15938]Adding "support" property to MLlib Associat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13656
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61130/



[GitHub] spark issue #13656: [SPARK-15938]Adding "support" property to MLlib Associat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13656
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60475/



[GitHub] spark pull request #13656: [SPARK-15938]Adding "support" property to MLlib A...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh closed the pull request at:

    https://github.com/apache/spark/pull/13656



[GitHub] spark issue #13656: [SPARK-15938]Adding "support" property to MLlib Associat...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13656
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61131/



[GitHub] spark issue #13656: [SPARK-15938]Adding "support" property to MLlib Associat...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on the issue:

    https://github.com/apache/spark/pull/13656
  
    Thanks for the review @srowen 



[GitHub] spark pull request #13656: [SPARK-15938]Adding "support" property to MLlib A...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13656#discussion_r66924013
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/fpm/AssociationRules.scala ---
    @@ -120,6 +120,13 @@ object AssociationRules {
         @Since("1.5.0")
         def confidence: Double = freqUnion.toDouble / freqAntecedent
     
    +    /**
    +     * Returns the support of the rule. Current implementation would return the number of
    +     * co-occurrence of antecedent and consequent.
    +     */
    +    @Since("2.1.0")
    +    def support: Double = freqUnion.toDouble
    +
    --- End diff --
    
    Two major considerations:
    1. In most definitions and textbooks, `support` is a fraction value in [0.0, 1.0]. It's possible for us to align with that in the future.
    2. The current implementation of association rules actually allows both `freqUnion: Double` and `freqAntecedent: Double` to be fraction values in [0.0, 1.0], although both are counts now. I don't want to lose that flexibility and break the API in the future.
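Point 2 can be checked with a short Scala sketch: with confidence computed as in the quoted diff, `freqUnion / freqAntecedent`, dividing both frequencies by the dataset size n leaves the result unchanged, so confidence does not care whether the inputs are counts or fractions. `ConfidenceScaleSketch` and the value of n are hypothetical.

```scala
// Sketch: confidence as in the quoted diff, freqUnion / freqAntecedent, is
// unchanged when both frequencies are divided by the dataset size n.
object ConfidenceScaleSketch {
  def confidence(freqUnion: Double, freqAntecedent: Double): Double =
    freqUnion / freqAntecedent

  def main(args: Array[String]): Unit = {
    val n = 4.0 // hypothetical number of transactions
    val fromCounts = confidence(2.0, 3.0)
    val fromFractions = confidence(2.0 / n, 3.0 / n)
    println(s"confidence from counts:    $fromCounts")
    println(s"confidence from fractions: $fromFractions")
  }
}
```

Support is not scale-invariant in the same way: `freqUnion` reads as 2 when it is a count and 0.5 when it is a fraction of 4 transactions, which is the ambiguity raised earlier in the review about one `Double` carrying both meanings.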


