Posted to reviews@spark.apache.org by mengxr <gi...@git.apache.org> on 2015/02/19 18:47:15 UTC

[GitHub] spark pull request: [SPARK-5900][MLLIB] make PIC and FPGrowth Java...

GitHub user mengxr opened a pull request:

    https://github.com/apache/spark/pull/4695

    [SPARK-5900][MLLIB] make PIC and FPGrowth Java-friendly

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mengxr/spark SPARK-5900

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4695.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4695
    
----
commit 1b9db3d19e941600fca636540eb467ae2ffe29f1
Author: Xiangrui Meng <me...@databricks.com>
Date:   2015-02-19T17:46:30Z

    make PIC and FPGrowth Java-friendly

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5900][MLLIB] make PIC and FPGrowth Java...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/4695#issuecomment-75125808
  
    LGTM except for that one comment




[GitHub] spark pull request: [SPARK-5900][MLLIB] make PIC and FPGrowth Java...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/4695#issuecomment-75125659
  
    Yeah, after reading the code, I like the special classes since the field names make the code more legible.




[GitHub] spark pull request: [SPARK-5900][MLLIB] make PIC and FPGrowth Java...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4695#issuecomment-75134621
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27731/
    Test FAILed.




[GitHub] spark pull request: [SPARK-5900][MLLIB] make PIC and FPGrowth Java...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4695#issuecomment-75167597
  
      [Test build #27758 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27758/consoleFull) for   PR 4695 at commit [`865b5ca`](https://github.com/apache/spark/commit/865b5ca1134eb09233d27617e970aa3c60c83e95).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-5900][MLLIB] make PIC and FPGrowth Java...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4695#issuecomment-75134609
  
      [Test build #27731 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27731/consoleFull) for   PR 4695 at commit [`9c0e590`](https://github.com/apache/spark/commit/9c0e59029eb61690974dbf8c5eecf80270bb6c6d).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  class Assignment(val id: Long, val cluster: Int)`
      * `class FPGrowthModel[Item: ClassTag](val freqItemsets: RDD[FreqItemset[Item]]) extends Serializable`
      * `  class FreqItemset[Item](val items: Array[Item], val freq: Long) extends Serializable `





[GitHub] spark pull request: [SPARK-5900][MLLIB] make PIC and FPGrowth Java...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4695#issuecomment-75158360
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27742/
    Test FAILed.




[GitHub] spark pull request: [SPARK-5900][MLLIB] make PIC and FPGrowth Java...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4695#issuecomment-75175898
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27758/
    Test PASSed.




[GitHub] spark pull request: [SPARK-5900][MLLIB] make PIC and FPGrowth Java...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4695#issuecomment-75175889
  
      [Test build #27758 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27758/consoleFull) for   PR 4695 at commit [`865b5ca`](https://github.com/apache/spark/commit/865b5ca1134eb09233d27617e970aa3c60c83e95).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  class Assignment(val id: Long, val cluster: Int) extends Serializable`
      * `class FPGrowthModel[Item: ClassTag](val freqItemsets: RDD[FreqItemset[Item]]) extends Serializable`
      * `  class FreqItemset[Item](val items: Array[Item], val freq: Long) extends Serializable `





[GitHub] spark pull request: [SPARK-5900][MLLIB] make PIC and FPGrowth Java...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/4695#issuecomment-75178074
  
    Merged into master and branch-1.3.




[GitHub] spark pull request: [SPARK-5900][MLLIB] make PIC and FPGrowth Java...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/4695#issuecomment-75123777
  
    If we return a `JavaPairRDD`, the user code looks like the following:
    
    ~~~
    for (Tuple2<Long, Integer> assignment : assignments.collect()) {
      ... assignment._1() ...
      ... assignment._2() ...
    }
    ~~~
    
    With the current setup, this becomes:
    
    ~~~
    for (Assignment assignment: assignments.toJavaRDD().collect()) {
      ... assignment.id() ...
      ... assignment.cluster() ...
    }
    ~~~
    
    The latter is more readable to me. There is a cost on the user side if we force a special class for inputs, for example, `Rating` for `ALS` and `Document` for `LDA`. For return types, the cost is not that high. This is not a strong argument, though.
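
The readability point can be tried out without a Spark cluster. The following is only a minimal sketch under stated assumptions: `AssignmentSketch`, its nested `Assignment` class, and both grouping helpers are hypothetical names written for illustration, not part of MLlib. The nested class mirrors the shape of `class Assignment(val id: Long, val cluster: Int)` reported by the test builds, and `Map.Entry` stands in for a positional pair such as `Tuple2`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class AssignmentSketch {

  /** Hypothetical plain-Java stand-in for the Scala class Assignment(id, cluster). */
  static final class Assignment {
    private final long id;
    private final int cluster;

    Assignment(long id, int cluster) {
      this.id = id;
      this.cluster = cluster;
    }

    long id() { return id; }

    int cluster() { return cluster; }
  }

  /** Groups point ids by cluster using named accessors. */
  static Map<Integer, List<Long>> groupByCluster(List<Assignment> assignments) {
    Map<Integer, List<Long>> byCluster = new TreeMap<>();
    for (Assignment a : assignments) {
      byCluster.computeIfAbsent(a.cluster(), k -> new ArrayList<>()).add(a.id());
    }
    return byCluster;
  }

  /** Same grouping over positional pairs: the key is the id, the value is the cluster. */
  static Map<Integer, List<Long>> groupPairsByCluster(List<Map.Entry<Long, Integer>> pairs) {
    Map<Integer, List<Long>> byCluster = new TreeMap<>();
    for (Map.Entry<Long, Integer> p : pairs) {
      // The reader has to remember which position holds which field.
      byCluster.computeIfAbsent(p.getValue(), k -> new ArrayList<>()).add(p.getKey());
    }
    return byCluster;
  }

  public static void main(String[] args) {
    List<Assignment> named = Arrays.asList(
        new Assignment(0L, 1), new Assignment(1L, 0), new Assignment(2L, 1));
    List<Map.Entry<Long, Integer>> pairs = Arrays.asList(
        new SimpleEntry<>(0L, 1), new SimpleEntry<>(1L, 0), new SimpleEntry<>(2L, 1));

    // Both calls build the same grouping; only the call sites read differently.
    System.out.println(groupByCluster(named));
    System.out.println(groupPairsByCluster(pairs));
  }
}
```

Both helpers produce the same result, so the difference is purely at the call site: `a.cluster()` and `a.id()` document themselves, while `p.getValue()` and `p.getKey()` rely on the reader knowing the pair's layout.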




[GitHub] spark pull request: [SPARK-5900][MLLIB] make PIC and FPGrowth Java...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4695#issuecomment-75158350
  
      [Test build #27742 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27742/consoleFull) for   PR 4695 at commit [`cffa96e`](https://github.com/apache/spark/commit/cffa96e1d0447d1d083fa08fc43ace141c8921f4).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  class Assignment(val id: Long, val cluster: Int)`
      * `class FPGrowthModel[Item: ClassTag](val freqItemsets: RDD[FreqItemset[Item]]) extends Serializable`
      * `  class FreqItemset[Item](val items: Array[Item], val freq: Long) extends Serializable `





[GitHub] spark pull request: [SPARK-5900][MLLIB] make PIC and FPGrowth Java...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4695#issuecomment-75126769
  
      [Test build #27731 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27731/consoleFull) for   PR 4695 at commit [`9c0e590`](https://github.com/apache/spark/commit/9c0e59029eb61690974dbf8c5eecf80270bb6c6d).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-5900][MLLIB] make PIC and FPGrowth Java...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/4695#issuecomment-75167973
  
    I missed those issues before!  LGTM pending tests




[GitHub] spark pull request: [SPARK-5900][MLLIB] make PIC and FPGrowth Java...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4695#discussion_r25020128
  
    --- Diff: docs/mllib-frequent-pattern-mining.md ---
    @@ -74,11 +74,12 @@ Calling `FPGrowth.run` with transactions returns an
     that stores the frequent itemsets with their frequencies.
     
     {% highlight java %}
    -import java.util.Arrays;
     import java.util.List;
     
     import scala.Tuple2;
    --- End diff --
    
    no longer needed?




[GitHub] spark pull request: [SPARK-5900][MLLIB] make PIC and FPGrowth Java...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/4695




[GitHub] spark pull request: [SPARK-5900][MLLIB] make PIC and FPGrowth Java...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4695#issuecomment-75101083
  
      [Test build #27726 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27726/consoleFull) for   PR 4695 at commit [`1b9db3d`](https://github.com/apache/spark/commit/1b9db3d19e941600fca636540eb467ae2ffe29f1).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-5900][MLLIB] make PIC and FPGrowth Java...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/4695#issuecomment-75121366
  
    For PIC, the names Assignment / id / cluster sound good to me.  Those would be applicable to other clustering methods if ever needed.  "Assignment" is a little generic, but "ClusteringAssignment" seems too verbose to me.
    * Just wondering, why go for a new type rather than returning a JavaPairRDD via javaAssignments()?  (This seems analogous to the choice in LDA of whether to provide a Document type or take a JavaPairRDD.)
    
    The FPGrowth names and setup sound good to me.





[GitHub] spark pull request: [SPARK-5900][MLLIB] make PIC and FPGrowth Java...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4695#issuecomment-75108954
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27726/
    Test FAILed.




[GitHub] spark pull request: [SPARK-5900][MLLIB] make PIC and FPGrowth Java...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4695#issuecomment-75108945
  
      [Test build #27726 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27726/consoleFull) for   PR 4695 at commit [`1b9db3d`](https://github.com/apache/spark/commit/1b9db3d19e941600fca636540eb467ae2ffe29f1).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `public class JavaFPGrowthExample `
      * `public class JavaPowerIterationClusteringExample `
      * `  class Assignment(val id: Long, val cluster: Int)`
      * `class FPGrowthModel[Item: ClassTag](val freqItemsets: RDD[FreqItemset[Item]]) extends Serializable`
      * `  class FreqItemset[Item](val items: Array[Item], val freq: Long) extends Serializable `





[GitHub] spark pull request: [SPARK-5900][MLLIB] make PIC and FPGrowth Java...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4695#issuecomment-75148777
  
      [Test build #27742 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27742/consoleFull) for   PR 4695 at commit [`cffa96e`](https://github.com/apache/spark/commit/cffa96e1d0447d1d083fa08fc43ace141c8921f4).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org