Posted to reviews@spark.apache.org by MLnick <gi...@git.apache.org> on 2017/05/09 08:43:23 UTC
[GitHub] spark pull request #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recomme...
GitHub user MLnick opened a pull request:
https://github.com/apache/spark/pull/17919
[SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all performance PRs
Small clean ups from #17742 and #17845.
## How was this patch tested?
Existing unit tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MLnick/spark SPARK-20677-als-perf-followup
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/17919.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #17919
----
commit 8a6073f976021efe1e5a1d17c3afddf5e98494a1
Author: Nick Pentreath <ni...@za.ibm.com>
Date: 2017-05-09T08:05:03Z
Use F2jBLAS and clean up code
commit 301e8b89691effc065a128b4eb0569e421810189
Author: Nick Pentreath <ni...@za.ibm.com>
Date: 2017-05-09T08:23:48Z
mllib version
commit 0b1eaa34c370bfae7d83190a43d84fae1dc69eb8
Author: Nick Pentreath <ni...@za.ibm.com>
Date: 2017-05-09T08:39:25Z
No need for 'recommendation' private scope
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17919
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76792/
Test FAILed.
[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...
Posted by auskalia <gi...@git.apache.org>.
Github user auskalia commented on the issue:
https://github.com/apache/spark/pull/17919
Hi @MLnick, we found that simply repartitioning userFeatures and productFeatures can significantly improve the efficiency of ALS recommendForAll().
Here is our procedure:
1. Train the ALS model
2. Save the model to HDFS
3. Submit a new Spark job
4. Load the model from HDFS
5. Run recommendForAll()
First, when you submit a Spark job with "spark.default.parallelism=x", the recommendForAll stage is split into x^2 tasks, because userFeatures and productFeatures each have x partitions. This is not reasonable: the stage needs too much network I/O to finish.
Second, submitting a Spark job with "spark.dynamicAllocation.enabled=true" may distribute data unevenly across executors. We found that executors that start early may hold n GB of data, while those that start later may hold only m MB. As a result, a few executors may run their tasks slowly with high GC overhead, or crash with OOM.
We ran some tests with userFeatures and productFeatures repartitioned. Here are the results.
case 1:
users: 480 thousand, products: 4 million, rank: 25
executors: 600, default.parallelism: 100, executor-memory: 20G, executor-cores: 8
without repartition: recommendForAll took 24 min
after userFeatures.repartition(100) and productFeatures.repartition(100): recommendForAll took 8 min
result: 3x faster
case 2:
users: 12 million, products: 7.2 million, rank: 20
executors: 800, default.parallelism: 600, executor-memory: 16G, executor-cores: 8
without repartition: recommendForAll took 16 hours
after userFeatures.repartition(800) and productFeatures.repartition(100): recommendForAll took 30 min
result: 32x faster
Note that the partition counts of userFeatures and productFeatures may differ.
The tests above were run with the fixes from #17742 and #17845 applied.
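To make the task arithmetic in this comment concrete, here is a minimal sketch. It assumes, as the comment describes, that the recommendForAll stage runs one task per (user partition, product partition) pair, and that before repartitioning both feature RDDs have `default.parallelism` partitions. The object and method names are hypothetical and not part of Spark.

```scala
// Counts recommendForAll tasks under the assumption stated above:
// one task per (user partition, product partition) pair.
object RecommendAllTaskCount {
  def tasks(userPartitions: Int, productPartitions: Int): Long =
    userPartitions.toLong * productPartitions.toLong

  def main(args: Array[String]): Unit = {
    // Case 2 before: both feature RDDs follow default.parallelism = 600.
    println(tasks(600, 600)) // prints 360000
    // Case 2 after: userFeatures.repartition(800), productFeatures.repartition(100).
    println(tasks(800, 100)) // prints 80000
    // Case 1: 100 x 100 partitions both before and after repartitioning.
    println(tasks(100, 100)) // prints 10000
  }
}
```

Under these assumptions the task count in case 1 is the same before and after repartitioning, so the task count alone cannot account for the speedup reported there.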
We strongly suggest providing an interface that lets users repartition the two kinds of features.
Thanks
Here is the patch for mllib, which adds 2 new public functions to MatrixFactorizationModel:
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
index d45866c..d4412f7 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
@@ -56,8 +56,8 @@ import org.apache.spark.util.BoundedPriorityQueue
@Since("0.8.0")
class MatrixFactorizationModel @Since("0.8.0") (
@Since("0.8.0") val rank: Int,
- @Since("0.8.0") val userFeatures: RDD[(Int, Array[Double])],
- @Since("0.8.0") val productFeatures: RDD[(Int, Array[Double])])
+ @Since("0.8.0") var userFeatures: RDD[(Int, Array[Double])],
+ @Since("0.8.0") var productFeatures: RDD[(Int, Array[Double])])
extends Saveable with Serializable with Logging {
require(rank > 0)
@@ -154,6 +154,39 @@ class MatrixFactorizationModel @Since("0.8.0") (
predict(usersProducts.rdd.asInstanceOf[RDD[(Int, Int)]]).toJavaRDD()
}
+
+ /**
+ * Repartition UserFeatures
+ * @param partitionNum number of partitions to repartition userFeatures into; if not positive, the current partition count is kept
+ */
+ @Since("2.2.0")
+ def repartitionUserFeatures(partitionNum: Int = 0): Unit =
+ {
+ if (partitionNum > 0)
+ {
+ userFeatures = userFeatures.repartition(partitionNum)
+ }
+ else
+ {
+ userFeatures = userFeatures.repartition(userFeatures.getNumPartitions)
+ }
+ }
+ /**
+ * Repartition ProductFeatures
+ * @param partitionNum number of partitions to repartition productFeatures into; if not positive, the current partition count is kept
+ */
+ @Since("2.2.0")
+ def repartitionProductFeatures(partitionNum: Int = 0): Unit =
+ {
+ if (partitionNum > 0)
+ {
+ productFeatures = productFeatures.repartition(partitionNum)
+ }
+ else
+ {
+ productFeatures = productFeatures.repartition(productFeatures.getNumPartitions)
+ }
+ }
/**
* Recommends products to a user.
*
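Independently of the proposed patch, a similar effect may be achievable from user code, because the `MatrixFactorizationModel` constructor shown in the diff is public. The sketch below is an illustration under that assumption, not part of the PR: `RepartitionedModel` is a hypothetical helper name, and the partition counts are placeholders that would need tuning per cluster.

```scala
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel

// Hypothetical helper (not part of Spark or of the patch above): builds a new
// model whose factor RDDs are repartitioned, leaving the original model untouched.
object RepartitionedModel {
  def apply(
      model: MatrixFactorizationModel,
      userPartitions: Int,
      productPartitions: Int): MatrixFactorizationModel = {
    // repartition() shuffles the factors; cache() is here so that the shuffle
    // is not repeated if the new model is used more than once.
    val users = model.userFeatures.repartition(userPartitions).cache()
    val products = model.productFeatures.repartition(productPartitions).cache()
    new MatrixFactorizationModel(model.rank, users, products)
  }
}
```

With a model loaded through `MatrixFactorizationModel.load(sc, path)`, a call such as `RepartitionedModel(model, 800, 100).recommendProductsForUsers(10)` would then run the recommend-all step on the repartitioned factors. This avoids turning the `val` fields into `var`s, at the cost of creating a second model object.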
[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17919
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76801/
Test PASSed.
[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17919
**[Test build #76665 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76665/testReport)** for PR 17919 at commit [`0b1eaa3`](https://github.com/apache/spark/commit/0b1eaa34c370bfae7d83190a43d84fae1dc69eb8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17919
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76665/
Test PASSed.
[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...
Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:
https://github.com/apache/spark/pull/17919
taking a look
[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17919
**[Test build #76801 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76801/testReport)** for PR 17919 at commit [`9dfad1b`](https://github.com/apache/spark/commit/9dfad1bffe30163eab5a42eeda3ec1ec38783168).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...
Posted by auskalia <gi...@git.apache.org>.
Github user auskalia commented on the issue:
https://github.com/apache/spark/pull/17919
Hi @mpjlu, you are right. But sometimes we have to split the work across several Spark jobs, especially when resources in the Hadoop cluster are insufficient. Saving the model in one job and reloading it in another is a common practice in engineering applications, so I recommend exposing an interface that lets clients repartition the features.
[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...
Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/17919
Jenkins retest this please
[GitHub] spark pull request #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recomme...
Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/17919#discussion_r115692011
--- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala ---
@@ -451,6 +439,8 @@ class ALSModel private[ml] (
@Since("1.6.0")
object ALSModel extends MLReadable[ALSModel] {
+ @transient private[recommendation] val _f2jBLAS = new F2jBLAS
--- End diff --
No more or less than using `ml.linalg.BLAS` - I did think of that but the `var` needs to be exposed as `private[ml]`. If we're ok with that then it'll be slightly cleaner to use that, yes.
[GitHub] spark pull request #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recomme...
Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:
https://github.com/apache/spark/pull/17919#discussion_r115632945
--- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala ---
@@ -451,6 +439,8 @@ class ALSModel private[ml] (
@Since("1.6.0")
object ALSModel extends MLReadable[ALSModel] {
+ @transient private[recommendation] val _f2jBLAS = new F2jBLAS
--- End diff --
Does this require significant initialization? You could use org.apache.spark.ml.linalg.BLAS.f2jBLAS
[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17919
**[Test build #76792 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76792/testReport)** for PR 17919 at commit [`9dfad1b`](https://github.com/apache/spark/commit/9dfad1bffe30163eab5a42eeda3ec1ec38783168).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...
Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/17919
Just decided to use `ml.BLAS` and expose `f2jBLAS` as `ml / mllib private`.
[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17919
**[Test build #76801 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76801/testReport)** for PR 17919 at commit [`9dfad1b`](https://github.com/apache/spark/commit/9dfad1bffe30163eab5a42eeda3ec1ec38783168).
[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...
Posted by mpjlu <gi...@git.apache.org>.
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/17919
Thanks, I am OK with this change.
[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17919
Merged build finished. Test FAILed.
[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17919
Merged build finished. Test PASSed.
[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/17919
Merged build finished. Test PASSed.
[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...
Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/17919
cc @mpjlu @jkbradley
[GitHub] spark pull request #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recomme...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/17919
[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17919
**[Test build #76792 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76792/testReport)** for PR 17919 at commit [`9dfad1b`](https://github.com/apache/spark/commit/9dfad1bffe30163eab5a42eeda3ec1ec38783168).
[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...
Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the issue:
https://github.com/apache/spark/pull/17919
Merged to master/branch-2.2
[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/17919
**[Test build #76665 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76665/testReport)** for PR 17919 at commit [`0b1eaa3`](https://github.com/apache/spark/commit/0b1eaa34c370bfae7d83190a43d84fae1dc69eb8).
[GitHub] spark issue #17919: [SPARK-20677][MLLIB][ML] Follow-up to ALS recommend-all ...
Posted by mpjlu <gi...@git.apache.org>.
Github user mpjlu commented on the issue:
https://github.com/apache/spark/pull/17919
Hi @auskalia, you are right. Repartitioning can improve the performance of recommendForAll.
In my experiments for PR 17742, I had 120 cores and used 20 partitions each for userFeatures and itemFeatures.
I also considered providing an interface that lets users repartition.
Since you can set the number of partitions when training the model, I did not do that.
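As a rough illustration of setting the partition number at training time, the sketch below uses the `setUserBlocks` and `setProductBlocks` setters of the RDD-based `mllib` ALS. The helper name `TrainWithBlocks` and the iteration count are assumptions made for the example, and the resulting factor RDDs should end up with the requested block counts only when training and recommendation run in the same job.

```scala
import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}
import org.apache.spark.rdd.RDD

// Hypothetical helper: trains an mllib ALS model with explicit block counts,
// which determine how the user and product factors are partitioned.
object TrainWithBlocks {
  def train(
      ratings: RDD[Rating],
      rank: Int,
      userBlocks: Int,
      productBlocks: Int): MatrixFactorizationModel =
    new ALS()
      .setRank(rank)
      .setIterations(10) // placeholder value, not taken from the thread
      .setUserBlocks(userBlocks)
      .setProductBlocks(productBlocks)
      .run(ratings)
}
```

This covers the train-then-recommend flow within one job. It does not address the save-and-reload flow raised earlier in the thread, where the partitioning after loading follows the load-time settings instead.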