Posted to reviews@spark.apache.org by srowen <gi...@git.apache.org> on 2014/07/31 12:33:26 UTC

[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

GitHub user srowen opened a pull request:

    https://github.com/apache/spark/pull/1687

    SPARK-2768 [MLLIB] Add product, user recommend method to MatrixFactorizationModel

    Right now, `MatrixFactorizationModel` can only predict a score for one or more `(user,product)` tuples. As a comment in the file notes, it would be more useful to expose a recommend method that computes the top N scoring products for a user (or vice versa: the top users for a product).
    
    (This also corrects some long lines in the Java ALS test suite.)
    
    As you can see, it's a little messy to access the class from Java. Should there be a Java-friendly wrapper for it? With a pointer to where that should go, I could add one.
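To make the proposed recommend operation concrete: scoring one user against every product and keeping the N best comes down to a dot product plus a bounded top-N selection. The following is a minimal local sketch under stated assumptions, not the code in this PR (which works over the model's feature RDDs). Product IDs are assumed to be row indices of an in-memory array, and `TopNRecommendSketch`, `Scored` and `recommend` are hypothetical names.

~~~java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Local, in-memory sketch of top-N recommendation from factor vectors.
// Assumption: product IDs are the row indices of productFeatures. This is
// not MLlib's distributed implementation.
class TopNRecommendSketch {

  // A product ID paired with its predicted score.
  static final class Scored {
    final int id;
    final double score;

    Scored(int id, double score) {
      this.id = id;
      this.score = score;
    }
  }

  static final Comparator<Scored> BY_SCORE = Comparator.comparingDouble((Scored s) -> s.score);

  // Predicted preference: dot product of the user and product feature vectors.
  static double dot(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  // Returns up to num products sorted by descending score (fewer if there
  // are fewer than num products).
  static List<Scored> recommend(double[] userVector, double[][] productFeatures, int num) {
    // Min-heap on score: its head is the weakest of the current top candidates.
    PriorityQueue<Scored> heap = new PriorityQueue<>(BY_SCORE);
    for (int id = 0; id < productFeatures.length; id++) {
      heap.offer(new Scored(id, dot(userVector, productFeatures[id])));
      if (heap.size() > num) {
        heap.poll(); // drop the lowest-scoring candidate
      }
    }
    List<Scored> result = new ArrayList<>(heap);
    result.sort(BY_SCORE.reversed());
    return result;
  }

  public static void main(String[] args) {
    double[] user = {1.0, 0.5};
    double[][] products = {{0.2, 0.1}, {1.0, 1.0}, {0.5, 3.0}, {-1.0, 0.0}};
    // prints product 2 (score 2.0), then product 1 (score 1.5)
    for (Scored s : recommend(user, products, 2)) {
      System.out.println("product " + s.id + " score " + s.score);
    }
  }
}
~~~

Keeping a heap of size N means the selection costs roughly O(P log N) for P products, instead of sorting all P scores.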

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/srowen/spark SPARK-2768

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1687.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1687
    
----
commit 7bc35f9ca6926e968ea9e497f54806eaef4116b8
Author: Sean Owen <sr...@gmail.com>
Date:   2014-07-31T10:31:23Z

    Add recommend methods to MatrixFactorizationModel

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1687#issuecomment-50877033
  
    QA results for PR 1687:
    - This patch PASSES unit tests.
    - This patch merges cleanly
    - This patch adds no public classes
    
    For more information, see the test output:
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17672/consoleFull


[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/1687#issuecomment-50891037
  
    LGTM. Merged into master. Thanks!!


[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1687#discussion_r15645545
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala ---
    @@ -66,6 +66,42 @@ class MatrixFactorizationModel private[mllib] (
       }
     
       /**
    +   * Recommends products to users.
    +   *
    +   * @param user the user to recommend products to
    +   * @param howMany how many products to return. The number returned may be less than this.
    +   * @return product ID and score tuples, sorted descending by score. The first product returned
    +   *  is the one predicted to be most strongly recommended to the user. The score is an opaque
    +   *  value that indicates how strongly recommended the product is.
    +   */
    +  def recommendProducts(user: Int, howMany: Int = 10): Array[(Int,Double)] =
    +    recommend(userFeatures.lookup(user).head, productFeatures, howMany)
    +
    +  /**
    +   * Recommends users to products. That is, this returns users who are most likely to be
    +   * interested in a product.
    +   *
    +   * @param product the product to recommend users to
    +   * @param howMany how many users to return. The number returned may be less than this.
    +   * @return user ID and score tuples, sorted descending by score. The first user returned
    +   *  is the one predicted to be most strongly interested in the product. The score is an opaque
    +   *  value that indicates how strongly interested the user is.
    +   */
    +  def recommendUsers(product: Int, howMany: Int = 10): Array[(Int,Double)] =
    --- End diff --
    
    space after `Int,`


[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1687#discussion_r15645502
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala ---
    @@ -66,6 +66,42 @@ class MatrixFactorizationModel private[mllib] (
       }
     
       /**
    +   * Recommends products to users.
    +   *
    +   * @param user the user to recommend products to
    +   * @param howMany how many products to return. The number returned may be less than this.
    +   * @return product ID and score tuples, sorted descending by score. The first product returned
    +   *  is the one predicted to be most strongly recommended to the user. The score is an opaque
    +   *  value that indicates how strongly recommended the product is.
    +   */
    +  def recommendProducts(user: Int, howMany: Int = 10): Array[(Int,Double)] =
    +    recommend(userFeatures.lookup(user).head, productFeatures, howMany)
    +
    +  /**
    +   * Recommends users to products. That is, this returns users who are most likely to be
    --- End diff --
    
    ditto: `Recommends users to a product.`


[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1687#discussion_r15645443
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala ---
    @@ -66,6 +66,42 @@ class MatrixFactorizationModel private[mllib] (
       }
     
       /**
    +   * Recommends products to users.
    +   *
    +   * @param user the user to recommend products to
    +   * @param howMany how many products to return. The number returned may be less than this.
    +   * @return product ID and score tuples, sorted descending by score. The first product returned
    +   *  is the one predicted to be most strongly recommended to the user. The score is an opaque
    +   *  value that indicates how strongly recommended the product is.
    +   */
    +  def recommendProducts(user: Int, howMany: Int = 10): Array[(Int,Double)] =
    --- End diff --
    
    The default value is not Java-friendly. It should be okay if we don't set a default here.
    
    `howMany` -> `num`? This is similar to `RDD.top`, where we used `num` as the argument name.
    
    Add a space after `Int,`


[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1687#discussion_r15645648
  
    --- Diff: mllib/src/test/java/org/apache/spark/mllib/recommendation/JavaALSSuite.java ---
    @@ -29,6 +29,8 @@
     import org.apache.spark.api.java.JavaSparkContext;
     
     import org.jblas.DoubleMatrix;
    +import scala.Tuple2;
    --- End diff --
    
    Scala imports should come before third-party imports.


[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1687#discussion_r15657246
  
    --- Diff: mllib/src/test/java/org/apache/spark/mllib/recommendation/JavaALSSuite.java ---
    @@ -28,6 +28,8 @@
     import org.apache.spark.api.java.JavaRDD;
     import org.apache.spark.api.java.JavaSparkContext;
     
    +import scala.Tuple2;
    +import scala.Tuple3;
    --- End diff --
    
    Sorry if it wasn't right before.


[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1687#discussion_r15645652
  
    --- Diff: mllib/src/test/java/org/apache/spark/mllib/recommendation/JavaALSSuite.java ---
    @@ -44,21 +46,27 @@ public void tearDown() {
         sc = null;
       }
     
    -  static void validatePrediction(MatrixFactorizationModel model, int users, int products, int features,
    -      DoubleMatrix trueRatings, double matchThreshold, boolean implicitPrefs, DoubleMatrix truePrefs) {
    +  static void validatePrediction(MatrixFactorizationModel model,
    +                                 int users,
    --- End diff --
    
    4-space indentation


[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1687#discussion_r15657224
  
    --- Diff: mllib/src/test/java/org/apache/spark/mllib/recommendation/JavaALSSuite.java ---
    @@ -28,6 +28,8 @@
     import org.apache.spark.api.java.JavaRDD;
     import org.apache.spark.api.java.JavaSparkContext;
     
    +import scala.Tuple2;
    +import scala.Tuple3;
    --- End diff --
    
    The imports should be
    
    1. java imports
    2. scala imports
    3. third-party imports
    4. spark imports
    
    So it should be
    
    ~~~
    import scala.Tuple2;
    import scala.Tuple3;
    
    import org.jblas.DoubleMatrix;
    
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    ~~~


[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1687#discussion_r15645856
  
    --- Diff: mllib/src/test/java/org/apache/spark/mllib/recommendation/JavaALSSuite.java ---
    @@ -171,4 +180,29 @@ public void runImplicitALSWithNegativeWeight() {
         validatePrediction(model, users, products, features, testData._2(), 0.4, true, testData._3());
       }
     
    +  @Test
    +  public void runRecommend() {
    +    int features = 5;
    +    int iterations = 10;
    +    int users = 200;
    +    int products = 50;
    +    Tuple3<List<Rating>, DoubleMatrix, DoubleMatrix> testData = ALSSuite.generateRatingsAsJavaList(
    +        users, products, features, 0.7, true, false);
    +    JavaRDD<Rating> data = sc.parallelize(testData._1());
    +    MatrixFactorizationModel model = ALS.trainImplicit(data.rdd(), features, iterations);
    +    validateRecommendations(model.recommendProducts(1, 10), 10);
    +    validateRecommendations(model.recommendUsers(1, 20), 20);
    +  }
    +
    +  private static void validateRecommendations(Tuple2<Object,Object>[] recommendations, int howMany) {
    +    @SuppressWarnings("unchecked")
    +    Tuple2<Integer,Double>[] javaRecs = (Tuple2<Integer,Double>[]) (Object[]) recommendations;
    +    Assert.assertEquals(howMany, javaRecs.length);
    +    for (int i = 1; i < javaRecs.length; i++) {
    +      Assert.assertTrue(javaRecs[i-1]._2() > javaRecs[i]._2());
    +    }
    +    // Pretty safe bet!
    +    Assert.assertTrue(javaRecs[0]._2() > 0.7);
    --- End diff --
    
    We didn't fix the seed. It may be safer to set the seed so that everything is deterministic.
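The reason a fixed seed makes the test deterministic can be shown locally: a `java.util.Random` built from the same seed always yields the same sequence, so randomly initialized factors come out identical on every run. This is a minimal sketch with a hypothetical `initFactors` helper. It does not show how MLlib's ALS uses its seed internally; it only illustrates the principle behind the `setSeed(8675309L)` call used in the revised test.

~~~java
import java.util.Arrays;
import java.util.Random;

// Sketch: seeded random initialization is reproducible.
// initFactors is a hypothetical helper, not an MLlib method.
class SeededInitSketch {

  // Fills a rows x rank matrix with Gaussian values drawn from a seeded generator.
  static double[][] initFactors(long seed, int rows, int rank) {
    Random rnd = new Random(seed);
    double[][] m = new double[rows][rank];
    for (int i = 0; i < rows; i++) {
      for (int j = 0; j < rank; j++) {
        m[i][j] = rnd.nextGaussian();
      }
    }
    return m;
  }

  public static void main(String[] args) {
    double[][] a = initFactors(8675309L, 3, 5);
    double[][] b = initFactors(8675309L, 3, 5);
    System.out.println(Arrays.deepEquals(a, b)); // prints true
  }
}
~~~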


[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1687#discussion_r15657282
  
    --- Diff: mllib/src/test/java/org/apache/spark/mllib/recommendation/JavaALSSuite.java ---
    @@ -163,12 +173,42 @@ public void runImplicitALSWithNegativeWeight() {
         int iterations = 15;
         int users = 80;
         int products = 160;
    -    scala.Tuple3<List<Rating>, DoubleMatrix, DoubleMatrix> testData = ALSSuite.generateRatingsAsJavaList(
    +    Tuple3<List<Rating>, DoubleMatrix, DoubleMatrix> testData = ALSSuite.generateRatingsAsJavaList(
             users, products, features, 0.7, true, true);
     
         JavaRDD<Rating> data = sc.parallelize(testData._1());
    -    MatrixFactorizationModel model = ALS.trainImplicit(data.rdd(), features, iterations);
    +    MatrixFactorizationModel model = new ALS().setRank(features)
    +        .setIterations(iterations)
    --- End diff --
    
    Two-space indentation?


[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/1687#issuecomment-50789070
  
    Cool, I committed some updates to address these points. Do you want to make the persistence and partition changes separately? Yes, I agree these should be cached in memory if possible.


[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1687#issuecomment-50789198
  
    QA tests have started for PR 1687. This patch merges cleanly.
    View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17588/consoleFull


[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1687#discussion_r15657110
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala ---
    @@ -66,6 +66,44 @@ class MatrixFactorizationModel private[mllib] (
       }
     
       /**
    +   * Recommends products to a user.
    +   *
    +   * @param user the user to recommend products to
    +   * @param num how many products to return. The number returned may be less than this.
    +   * @return product ID and score tuples, sorted descending by score. The first product returned
    --- End diff --
    
    The doc is no longer accurate because we switched to `Array[Rating]`.


[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1687#discussion_r15645878
  
    --- Diff: mllib/src/test/java/org/apache/spark/mllib/recommendation/JavaALSSuite.java ---
    @@ -171,4 +180,29 @@ public void runImplicitALSWithNegativeWeight() {
         validatePrediction(model, users, products, features, testData._2(), 0.4, true, testData._3());
       }
     
    +  @Test
    +  public void runRecommend() {
    +    int features = 5;
    +    int iterations = 10;
    +    int users = 200;
    +    int products = 50;
    +    Tuple3<List<Rating>, DoubleMatrix, DoubleMatrix> testData = ALSSuite.generateRatingsAsJavaList(
    +        users, products, features, 0.7, true, false);
    +    JavaRDD<Rating> data = sc.parallelize(testData._1());
    +    MatrixFactorizationModel model = ALS.trainImplicit(data.rdd(), features, iterations);
    +    validateRecommendations(model.recommendProducts(1, 10), 10);
    +    validateRecommendations(model.recommendUsers(1, 20), 20);
    +  }
    +
    +  private static void validateRecommendations(Tuple2<Object,Object>[] recommendations, int howMany) {
    +    @SuppressWarnings("unchecked")
    +    Tuple2<Integer,Double>[] javaRecs = (Tuple2<Integer,Double>[]) (Object[]) recommendations;
    +    Assert.assertEquals(howMany, javaRecs.length);
    +    for (int i = 1; i < javaRecs.length; i++) {
    +      Assert.assertTrue(javaRecs[i-1]._2() > javaRecs[i]._2());
    --- End diff --
    
    nit: `>=`
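With `>`, the loop fails whenever two adjacent recommendations have exactly equal scores, which a correct top-N result is allowed to contain. Here is a standalone sketch of the relaxed check. It uses a hypothetical `isNonIncreasing` helper over plain scores rather than the test's `Tuple2` array.

~~~java
// Relaxed ordering check; isNonIncreasing is a hypothetical helper.
class SortedScoresCheck {

  // True if each score is >= the next one. Using >= rather than > accepts
  // ties; writing the test as !(a >= b) also rejects NaN scores.
  static boolean isNonIncreasing(double[] scores) {
    for (int i = 1; i < scores.length; i++) {
      if (!(scores[i - 1] >= scores[i])) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(isNonIncreasing(new double[] {0.9, 0.7, 0.7, 0.2})); // true: tie accepted
    System.out.println(isNonIncreasing(new double[] {0.9, 0.95}));          // false
  }
}
~~~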


[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1687#issuecomment-50795706
  
    QA results for PR 1687:
    - This patch FAILED unit tests.
    - This patch merges cleanly
    - This patch adds no public classes
    
    For more information, see the test output:
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17588/consoleFull


[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/1687#issuecomment-50766537
  
    @srowen This is great! I was about to add the same function. We only have `RDD.top` in core, but if we had `PairRDDFunctions.topByKey`, we could make recommendations for a small batch of users/products. Another possible optimization is to use more partitions for the final user/product features. Besides some inline comments, could you also set the storage level of the final user/product features to `MEMORY_AND_DISK` in this PR? We saw use cases where the features were evicted from memory by later jobs on the same cluster.
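`PairRDDFunctions.topByKey` is mentioned here as something core does not have. The per-key top-k idea behind it can be sketched locally: keep one bounded min-heap per key, and evict that key's lowest score whenever the heap grows past k. This is an in-memory Java analogue with a hypothetical `topByKey` method and `Candidate` type, not a Spark API. A distributed version would also need to merge partial heaps across partitions.

~~~java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// In-memory analogue of a per-key top-k; hypothetical, not a Spark API.
class TopByKeySketch {

  // One scored candidate, e.g. (user, product, predicted score).
  static final class Candidate {
    final int key;
    final int item;
    final double score;

    Candidate(int key, int item, double score) {
      this.key = key;
      this.item = item;
      this.score = score;
    }
  }

  static final Comparator<Candidate> BY_SCORE = Comparator.comparingDouble((Candidate c) -> c.score);

  // Keeps the k highest-scoring candidates per key; each list is sorted descending.
  static Map<Integer, List<Candidate>> topByKey(List<Candidate> candidates, int k) {
    Map<Integer, PriorityQueue<Candidate>> heaps = new HashMap<>();
    for (Candidate c : candidates) {
      // One bounded min-heap per key; its head is that key's weakest kept candidate.
      PriorityQueue<Candidate> heap = heaps.computeIfAbsent(c.key, unused -> new PriorityQueue<>(BY_SCORE));
      heap.offer(c);
      if (heap.size() > k) {
        heap.poll();
      }
    }
    Map<Integer, List<Candidate>> result = new HashMap<>();
    for (Map.Entry<Integer, PriorityQueue<Candidate>> e : heaps.entrySet()) {
      List<Candidate> top = new ArrayList<>(e.getValue());
      top.sort(BY_SCORE.reversed());
      result.put(e.getKey(), top);
    }
    return result;
  }

  public static void main(String[] args) {
    List<Candidate> scored = Arrays.asList(
        new Candidate(1, 10, 0.9), new Candidate(1, 11, 0.1), new Candidate(1, 12, 0.5),
        new Candidate(2, 10, 0.3), new Candidate(2, 13, 0.8));
    Map<Integer, List<Candidate>> top = topByKey(scored, 2);
    for (Candidate c : top.get(1)) {
      System.out.println("user 1 -> product " + c.item); // 10, then 12
    }
  }
}
~~~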


[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/1687


[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1687#discussion_r15657314
  
    --- Diff: mllib/src/test/java/org/apache/spark/mllib/recommendation/JavaALSSuite.java ---
    @@ -163,12 +173,42 @@ public void runImplicitALSWithNegativeWeight() {
         int iterations = 15;
         int users = 80;
         int products = 160;
    -    scala.Tuple3<List<Rating>, DoubleMatrix, DoubleMatrix> testData = ALSSuite.generateRatingsAsJavaList(
    +    Tuple3<List<Rating>, DoubleMatrix, DoubleMatrix> testData = ALSSuite.generateRatingsAsJavaList(
             users, products, features, 0.7, true, true);
     
         JavaRDD<Rating> data = sc.parallelize(testData._1());
    -    MatrixFactorizationModel model = ALS.trainImplicit(data.rdd(), features, iterations);
    +    MatrixFactorizationModel model = new ALS().setRank(features)
    +        .setIterations(iterations)
    +        .setImplicitPrefs(true)
    +        .setSeed(8675309L)
    +        .run(data.rdd());
         validatePrediction(model, users, products, features, testData._2(), 0.4, true, testData._3());
       }
     
    +  @Test
    +  public void runRecommend() {
    +    int features = 5;
    +    int iterations = 10;
    +    int users = 200;
    +    int products = 50;
    +    Tuple3<List<Rating>, DoubleMatrix, DoubleMatrix> testData = ALSSuite.generateRatingsAsJavaList(
    +        users, products, features, 0.7, true, false);
    +    JavaRDD<Rating> data = sc.parallelize(testData._1());
    +    MatrixFactorizationModel model = new ALS().setRank(features)
    +        .setIterations(iterations)
    --- End diff --
    
    ditto: two-space indentation


[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1687#issuecomment-50746870
  
    QA results for PR 1687:
    - This patch PASSES unit tests.
    - This patch merges cleanly
    - This patch adds no public classes
    
    For more information, see the test output:
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17576/consoleFull


[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/1687#issuecomment-50775834
  
    Thanks @mengxr, I agree with all of that and will update the PR. `Rating` is a good solution; there's a redundant field, but very few objects are returned anyway. Sorry if I'm being dense, but which RDD should be set to `MEMORY_AND_DISK`? The `scored` RDD in my PR? And how would you set the partitions?
    
    Yes, if there were a `topByKey`, it would be natural to expose a small-batch recommend feature here. There are other possible operations, like `mostSimilar`, but we can leave those for another PR after discussing what the metric should be: cosine similarity, or something else?
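The thread leaves the `mostSimilar` metric open and names cosine similarity as one candidate. Assuming that choice, here is a minimal sketch over plain feature vectors. The `cosine` and `mostSimilar` helpers are hypothetical, not an MLlib API.

~~~java
// Cosine similarity between factor vectors; hypothetical helpers, not an MLlib API.
class CosineSimilaritySketch {

  // dot(a, b) / (|a| * |b|); returns 0.0 when either vector is all zeros.
  static double cosine(double[] a, double[] b) {
    double dot = 0.0;
    double sumSqA = 0.0; // squared norm of a
    double sumSqB = 0.0; // squared norm of b
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      sumSqA += a[i] * a[i];
      sumSqB += b[i] * b[i];
    }
    if (sumSqA == 0.0 || sumSqB == 0.0) {
      return 0.0;
    }
    return dot / (Math.sqrt(sumSqA) * Math.sqrt(sumSqB));
  }

  // Index of the other row most similar to row `item`, or -1 if there is none.
  static int mostSimilar(double[][] features, int item) {
    int best = -1;
    double bestSim = Double.NEGATIVE_INFINITY;
    for (int j = 0; j < features.length; j++) {
      if (j == item) {
        continue;
      }
      double sim = cosine(features[item], features[j]);
      if (sim > bestSim) {
        bestSim = sim;
        best = j;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    double[][] productFeatures = {{1.0, 0.0}, {0.0, 1.0}, {2.0, 0.1}};
    System.out.println(mostSimilar(productFeatures, 0)); // prints 2
  }
}
~~~

Cosine ignores vector length, so a product with large factor values is not automatically "similar" to everything. Whether that is the desired behavior is the open metric question the thread defers to a later PR.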
    



[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1687#issuecomment-50742995
  
    QA tests have started for PR 1687. This patch merges cleanly.
    View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17576/consoleFull


[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/1687#issuecomment-50785472
  
    I meant the final `userFeatures` and `productFeatures` stored in the matrix factorization model. If those two RDDs are evicted from memory by later jobs, we have to recompute them from the very beginning. Having more partitions can also speed up lookups. So I'm thinking about changing 
    
    https://github.com/srowen/spark/blob/SPARK-2768/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala#L287
    
    to
    
    ~~~
        usersOut.setName("usersOut").persist(StorageLevel.MEMORY_AND_DISK)
        productsOut.setName("productsOut").persist(StorageLevel.MEMORY_AND_DISK)
    ~~~
    
    and maybe also giving them more partitions in `unblockFactors` for quick lookup.


[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/1687#issuecomment-50793121
  
    @srowen For changing the storage level, I can submit another PR after this gets merged and ping you for review.


[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/1687#issuecomment-50769991
  
    For the API, another option is to return `Array[Rating]` instead of `Array[(Int, Double)]`. This should help Java users, and it is also compatible with batch predictions if we want to add them in the future.
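Compare the unchecked `(Tuple2<Integer,Double>[]) (Object[])` cast in the Java test with reading named fields. The sketch below illustrates the difference for Java callers. `Rating` here is a hypothetical local stand-in that mirrors the shape of MLlib's `Rating(user, product, rating)`, and `toRatings` is a hypothetical helper. Neither is the Spark class or an MLlib method.

~~~java
// Sketch contrasting typed results with generic pairs for Java callers.
// Assumption: Rating is a hypothetical stand-in, not the Spark class.
class RatingResultSketch {

  static final class Rating {
    final int user;
    final int product;
    final double rating;

    Rating(int user, int product, double rating) {
      this.user = user;
      this.product = product;
      this.rating = rating;
    }
  }

  // Packs one user's scored products as Rating objects. The user field is
  // the same in every element: redundant, but only num objects are created.
  static Rating[] toRatings(int user, int[] products, double[] scores) {
    Rating[] out = new Rating[products.length];
    for (int i = 0; i < products.length; i++) {
      out[i] = new Rating(user, products[i], scores[i]);
    }
    return out;
  }

  public static void main(String[] args) {
    Rating[] recs = toRatings(1, new int[] {42, 7}, new double[] {0.93, 0.81});
    for (Rating r : recs) {
      // Typed fields: no casts from Tuple2<Object, Object> are needed.
      System.out.println(r.user + " -> " + r.product + " (" + r.rating + ")");
    }
  }
}
~~~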


[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1687#discussion_r15645259
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala ---
    @@ -66,6 +66,42 @@ class MatrixFactorizationModel private[mllib] (
       }
     
       /**
    +   * Recommends products to users.
    --- End diff --
    
    `Recommends products to a user.`?


[GitHub] spark pull request: SPARK-2768 [MLLIB] Add product, user recommend...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1687#issuecomment-50873725
  
    QA tests have started for PR 1687. This patch merges cleanly.
    View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17672/consoleFull

