Posted to reviews@spark.apache.org by MLnick <gi...@git.apache.org> on 2017/05/09 08:43:23 UTC

[GitHub] spark pull request #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recomme...

GitHub user MLnick opened a pull request:

    https://github.com/apache/spark/pull/17919

    [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all performance PRs

    Small clean ups from #17742 and #17845.
    
    ## How was this patch tested?
    
    Existing unit tests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MLnick/spark SPARK-20677-als-perf-followup

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17919.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17919
    
----
commit 8a6073f976021efe1e5a1d17c3afddf5e98494a1
Author: Nick Pentreath <ni...@za.ibm.com>
Date:   2017-05-09T08:05:03Z

    Use F2jBLAS and clean up code

commit 301e8b89691effc065a128b4eb0569e421810189
Author: Nick Pentreath <ni...@za.ibm.com>
Date:   2017-05-09T08:23:48Z

    mllib version

commit 0b1eaa34c370bfae7d83190a43d84fae1dc69eb8
Author: Nick Pentreath <ni...@za.ibm.com>
Date:   2017-05-09T08:39:25Z

    No need for 'recommendation' private scope

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17919
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76792/
    Test FAILed.




[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...

Posted by auskalia <gi...@git.apache.org>.
Github user auskalia commented on the issue:

    https://github.com/apache/spark/pull/17919
  
    Hi @MLnick, we find that simply repartitioning userFeatures and productFeatures significantly improves the efficiency of ALS recommendForAll().
    
    Here is our procedure:
    1. Train the ALS model
    2. Save the model to HDFS
    3. Submit a new Spark job
    4. Load the model from HDFS
    5. Run recommendForAll()
    
    First, when you submit a Spark job with "spark.default.parallelism=x", the recommendForAll stage is split into x^2 tasks, because userFeatures and productFeatures each have x partitions. This is not reasonable: the stage needs too much network I/O to finish.
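
To make the x^2 figure concrete, here is a small sketch of the arithmetic. It assumes, as the paragraph above describes, that every userFeatures partition is paired with every productFeatures partition; the object and method names are illustrative only, and the partition counts are the ones quoted in this comment.

```scala
object TaskCountSketch {
  // Number of tasks when every user partition is paired with every product partition.
  def pairedTasks(userPartitions: Int, productPartitions: Int): Long =
    userPartitions.toLong * productPartitions.toLong

  def main(args: Array[String]): Unit = {
    // spark.default.parallelism = 100: both feature RDDs have 100 partitions.
    println(pairedTasks(100, 100)) // 10000
    // spark.default.parallelism = 600: both feature RDDs have 600 partitions.
    println(pairedTasks(600, 600)) // 360000
    // After userFeatures.repartition(800) and productFeatures.repartition(100).
    println(pairedTasks(800, 100)) // 80000
  }
}
```

The count alone does not explain every speedup: repartitioning both sides to 100 leaves the product at 10000, so any gain in that setting would have to come from a more even data distribution.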
    
    Second, submitting a Spark job with "spark.dynamicAllocation.enabled=true" may distribute data unevenly across executors. We found that executors that start early may hold n GB of data, while those that start later may hold only m MB. As a result, a few executors run their tasks slowly with heavy GC, or crash with OOM.
    
    We ran some tests that repartition userFeatures and productFeatures. Here are the results.
    
    case 1:
    users: 480 thousand, products: 4 million, rank: 25
    executors: 600, default.parallelism: 100, executor-memory: 20G, executor-cores: 8
    without repartition: recommendForAll took 24 min
    with userFeatures.repartition(100) and productFeatures.repartition(100): recommendForAll took 8 min
    result: 3x faster
    
    case 2:
    users: 12 million, products: 7.2 million, rank: 20
    executors: 800, default.parallelism: 600, executor-memory: 16G, executor-cores: 8
    without repartition: recommendForAll took 16 hours
    with userFeatures.repartition(800) and productFeatures.repartition(100): recommendForAll took 30 min
    result: 32x faster
    
    Note that userFeatures and productFeatures may use different partition counts.
    
    The tests above were run with the fixes from #17742 and #17845 applied.
    
    We strongly suggest providing an interface that lets users repartition the two kinds of features.
    
    Thanks
    
    Here is the patch for mllib, which adds 2 new public functions to MatrixFactorizationModel:
    
    diff --git a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
    index d45866c..d4412f7 100644
    --- a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
    +++ b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
    @@ -56,8 +56,8 @@ import org.apache.spark.util.BoundedPriorityQueue
     @Since("0.8.0")
     class MatrixFactorizationModel @Since("0.8.0") (
         @Since("0.8.0") val rank: Int,
    -    @Since("0.8.0") val userFeatures: RDD[(Int, Array[Double])],
    -    @Since("0.8.0") val productFeatures: RDD[(Int, Array[Double])])
    +    @Since("0.8.0") var userFeatures: RDD[(Int, Array[Double])],
    +    @Since("0.8.0") var productFeatures: RDD[(Int, Array[Double])])
       extends Saveable with Serializable with Logging {
     
       require(rank > 0)
    @@ -154,6 +154,39 @@ class MatrixFactorizationModel @Since("0.8.0") (
         predict(usersProducts.rdd.asInstanceOf[RDD[(Int, Int)]]).toJavaRDD()
       }
     
    +
    +  /**
    +    * Repartition UserFeatures
    +    * @param partitionNum the number of partitions for userFeatures in the model; if not positive, the current partition count is kept
    +    */
    +  @Since("2.2.0")
    +  def repartitionUserFeatures(partitionNum: Int = 0): Unit =
    +  {
    +    if (partitionNum > 0)
    +    {
    +        userFeatures = userFeatures.repartition(partitionNum)
    +    }
    +    else
    +    {
    +        userFeatures = userFeatures.repartition(userFeatures.getNumPartitions)
    +    }
    +  }
    +  /**
    +    * Repartition ProductFeatures
    +    * @param partitionNum the number of partitions for productFeatures in the model; if not positive, the current partition count is kept
    +    */
    +  @Since("2.2.0")
    +  def repartitionProductFeatures(partitionNum: Int = 0): Unit =
    +  {
    +    if (partitionNum > 0)
    +    {
    +      productFeatures = productFeatures.repartition(partitionNum)
    +    }
    +    else
    +    {
    +      productFeatures = productFeatures.repartition(productFeatures.getNumPartitions)
    +    }
    +  }
       /**
        * Recommends products to a user.
        *





[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17919
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76801/
    Test PASSed.




[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17919
  
    **[Test build #76665 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76665/testReport)** for PR 17919 at commit [`0b1eaa3`](https://github.com/apache/spark/commit/0b1eaa34c370bfae7d83190a43d84fae1dc69eb8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17919
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76665/
    Test PASSed.




[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/17919
  
    taking a look




[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17919
  
    **[Test build #76801 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76801/testReport)** for PR 17919 at commit [`9dfad1b`](https://github.com/apache/spark/commit/9dfad1bffe30163eab5a42eeda3ec1ec38783168).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...

Posted by auskalia <gi...@git.apache.org>.
Github user auskalia commented on the issue:

    https://github.com/apache/spark/pull/17919
  
    Hi @mpjlu, you are right. But sometimes we have to split our work across several Spark jobs, especially when resources in the Hadoop cluster are insufficient. Saving a model in one job and reloading it in another is a common practice in engineering applications, so I recommend exposing an interface that lets clients repartition the features.
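
A rough sketch of what such a client-side repartition could look like without changing Spark itself: it builds a new MatrixFactorizationModel through the class's public constructor, passing repartitioned copies of the two feature RDDs. The object name, the method names, `modelPath`, and the partition counts are hypothetical, and this is not the patch proposed earlier in the thread.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel

object RepartitionSketch {
  // Returns a new model whose feature RDDs have the requested partition counts.
  // Uses only the public MatrixFactorizationModel constructor.
  def repartitioned(
      model: MatrixFactorizationModel,
      userPartitions: Int,
      productPartitions: Int): MatrixFactorizationModel = {
    require(userPartitions > 0 && productPartitions > 0, "partition counts must be positive")
    new MatrixFactorizationModel(
      model.rank,
      model.userFeatures.repartition(userPartitions).cache(),
      model.productFeatures.repartition(productPartitions).cache())
  }

  // Hypothetical second job: load a saved model, repartition it, recommend for all users.
  def recommendAll(sc: SparkContext, modelPath: String): Long = {
    val loaded = MatrixFactorizationModel.load(sc, modelPath)
    // Placeholder counts; suitable values depend on the data and the cluster.
    val tuned = repartitioned(loaded, userPartitions = 800, productPartitions = 100)
    // Top 10 products per user, as an RDD of (userId, recommendations).
    tuned.recommendProductsForUsers(10).count()
  }
}
```

Suitable partition counts depend on the data and the cluster, so they are best found by measurement rather than copied from this sketch.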




[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the issue:

    https://github.com/apache/spark/pull/17919
  
    Jenkins retest this please




[GitHub] spark pull request #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recomme...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17919#discussion_r115692011
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala ---
    @@ -451,6 +439,8 @@ class ALSModel private[ml] (
     @Since("1.6.0")
     object ALSModel extends MLReadable[ALSModel] {
     
    +  @transient private[recommendation] val _f2jBLAS = new F2jBLAS
    --- End diff --
    
    No more or less than using `ml.linalg.BLAS` - I did think of that but the `var` needs to be exposed as `private[ml]`. If we're ok with that then it'll be slightly cleaner to use that, yes.




[GitHub] spark pull request #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recomme...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17919#discussion_r115632945
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala ---
    @@ -451,6 +439,8 @@ class ALSModel private[ml] (
     @Since("1.6.0")
     object ALSModel extends MLReadable[ALSModel] {
     
    +  @transient private[recommendation] val _f2jBLAS = new F2jBLAS
    --- End diff --
    
    Does this require significant initialization?  You could use org.apache.spark.ml.linalg.BLAS.f2jBLAS
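
For context, a minimal sketch of the pattern under discussion: one F2jBLAS instance created once and reused for dot products. It assumes netlib-java (com.github.fommil.netlib) is on the classpath; the object name DotSketch is hypothetical, and this is not the code from the pull request.

```scala
import com.github.fommil.netlib.F2jBLAS

object DotSketch {
  // A single shared instance, created when the object is first used.
  private val f2jBLAS = new F2jBLAS

  // Dot product of two equal-length dense vectors via the BLAS level-1 ddot routine.
  def dot(x: Array[Double], y: Array[Double]): Double = {
    require(x.length == y.length, "vectors must have the same length")
    // Arguments: element count, first array and its stride, second array and its stride.
    f2jBLAS.ddot(x.length, x, 1, y, 1)
  }
}
```

For example, `DotSketch.dot(Array(1.0, 2.0, 3.0), Array(4.0, 5.0, 6.0))` evaluates 1*4 + 2*5 + 3*6.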




[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17919
  
    **[Test build #76792 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76792/testReport)** for PR 17919 at commit [`9dfad1b`](https://github.com/apache/spark/commit/9dfad1bffe30163eab5a42eeda3ec1ec38783168).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the issue:

    https://github.com/apache/spark/pull/17919
  
    Just decided to use `ml.BLAS` and expose `f2jBLAS` as `ml / mllib private`. 




[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17919
  
    **[Test build #76801 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76801/testReport)** for PR 17919 at commit [`9dfad1b`](https://github.com/apache/spark/commit/9dfad1bffe30163eab5a42eeda3ec1ec38783168).




[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...

Posted by mpjlu <gi...@git.apache.org>.
Github user mpjlu commented on the issue:

    https://github.com/apache/spark/pull/17919
  
    Thanks, I am OK with this change. 




[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17919
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17919
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17919
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the issue:

    https://github.com/apache/spark/pull/17919
  
    cc @mpjlu @jkbradley 




[GitHub] spark pull request #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recomme...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17919




[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17919
  
    **[Test build #76792 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76792/testReport)** for PR 17919 at commit [`9dfad1b`](https://github.com/apache/spark/commit/9dfad1bffe30163eab5a42eeda3ec1ec38783168).




[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the issue:

    https://github.com/apache/spark/pull/17919
  
    Merged to master/branch-2.2




[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17919
  
    **[Test build #76665 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76665/testReport)** for PR 17919 at commit [`0b1eaa3`](https://github.com/apache/spark/commit/0b1eaa34c370bfae7d83190a43d84fae1dc69eb8).




[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...

Posted by mpjlu <gi...@git.apache.org>.
Github user mpjlu commented on the issue:

    https://github.com/apache/spark/pull/17919
  
    Hi @auskalia, you are right: repartitioning can improve the performance of recommendForAll. 
    In my experiment for PR 17742, I had 120 cores and used 20 partitions each for userFeatures and itemFeatures. 
    I also considered providing an interface that lets users repartition. 
    Since you can set the number of partitions when training the model, I did not do that. 
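
As a sketch of setting the partition numbers at training time, the mllib ALS builder exposes setUserBlocks and setProductBlocks. The object name and the hyperparameter values below are placeholders, not recommendations, and whether the block counts carry over to a model that is saved and reloaded in a later job should be checked rather than assumed.

```scala
import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}
import org.apache.spark.rdd.RDD

object TrainWithBlocksSketch {
  // Trains an explicit-feedback ALS model with caller-chosen block counts
  // for the user side and the product side.
  def train(
      ratings: RDD[Rating],
      userBlocks: Int,
      productBlocks: Int): MatrixFactorizationModel =
    new ALS()
      .setRank(10)        // placeholder rank
      .setIterations(5)   // placeholder iteration count
      .setLambda(0.01)    // placeholder regularization
      .setUserBlocks(userBlocks)
      .setProductBlocks(productBlocks)
      .run(ratings)
}
```

The two block counts are independent, so the user side and the product side can be sized differently.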

