Posted to issues@spark.apache.org by "zhengbing li (JIRA)" <ji...@apache.org> on 2014/06/24 14:08:24 UTC
[jira] [Updated] (SPARK-2257) The algorithm of ALS in mlib lacks a parameter
[ https://issues.apache.org/jira/browse/SPARK-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zhengbing li updated SPARK-2257:
--------------------------------
Description:
When I test the ALS algorithm on the Netflix data, I cannot reproduce the accurate results reported in the paper. The best MSE I get is 0.9066300038109709 (RMSE 0.952), which is worse than the paper's result. If I increase the number of features or the number of iterations, the result gets worse. After studying the paper and the source code, I found a bug in the updateBlock function of ALS.
The original code is:
while (i < rank) {
  // ---
  fullXtX.data(i * rank + i) += lambda
  i += 1
}
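For reference, here is a sketch of what this loop corresponds to, assuming (as the name suggests) that fullXtX holds Y_u^T Y_u for the current user. The symbols are introduced here for illustration only: Ω is the set of observed (user, product) pairs, x_u and y_i are the user and product factor vectors, Y_u stacks the factors of the products user u rated, r_u holds those ratings, and n_u and n_i count the ratings of user u and product i. Setting the gradient of each objective with respect to x_u to zero gives the per-user system on the right. The loop above matches the first line (diagonal += λ); the modification proposed in this issue matches the second (diagonal += λ·n_u), which is the weighted-λ regularization used by ALS-WR, assuming that is the paper referred to.

```latex
% Original loop: diagonal += \lambda
\min_{X,Y} \sum_{(u,i)\in\Omega} \left(r_{ui} - x_u^\top y_i\right)^2
  + \lambda\Big(\sum_u \|x_u\|^2 + \sum_i \|y_i\|^2\Big)
\;\Rightarrow\;
\left(Y_u^\top Y_u + \lambda I\right) x_u = Y_u^\top r_u

% Proposed change: diagonal += \lambda n_u
\min_{X,Y} \sum_{(u,i)\in\Omega} \left(r_{ui} - x_u^\top y_i\right)^2
  + \lambda\Big(\sum_u n_u \|x_u\|^2 + \sum_i n_i \|y_i\|^2\Big)
\;\Rightarrow\;
\left(Y_u^\top Y_u + \lambda\, n_u I\right) x_u = Y_u^\top r_u
```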
The code doesn't take into account the number of products each user has rated, so it should be modified as follows:
while (i < rank) {
  // ratingsNum(index) is the number of products rated by this user
  fullXtX.data(i * rank + i) += lambda * ratingsNum(index)
  i += 1
}
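The effect of the two regularization choices can be tried outside Spark with a minimal sketch in plain Scala (no Spark or jblas dependency). The names WeightedLambdaSketch, solveSpd and solveUser are hypothetical and are not MLlib APIs: solveUser builds Y_u^T Y_u and Y_u^T r_u for one user from the factors of the products that user rated, adds either λ or λ·n_u to the diagonal, and solves the symmetric positive-definite system with a hand-written Cholesky factorization.

```scala
object WeightedLambdaSketch {
  /** Solves A x = b for a symmetric positive-definite n x n matrix A
    * (row-major array) using a Cholesky factorization A = L L^T. */
  def solveSpd(a: Array[Double], b: Array[Double], n: Int): Array[Double] = {
    val l = new Array[Double](n * n)
    for (i <- 0 until n; j <- 0 to i) {
      var sum = a(i * n + j)
      var k = 0
      while (k < j) { sum -= l(i * n + k) * l(j * n + k); k += 1 }
      if (i == j) {
        require(sum > 0.0, "matrix is not positive definite")
        l(i * n + i) = math.sqrt(sum)
      } else {
        l(i * n + j) = sum / l(j * n + j)
      }
    }
    // Forward substitution: L z = b
    val z = new Array[Double](n)
    for (i <- 0 until n) {
      var sum = b(i)
      for (k <- 0 until i) sum -= l(i * n + k) * z(k)
      z(i) = sum / l(i * n + i)
    }
    // Back substitution: L^T x = z
    val x = new Array[Double](n)
    for (i <- (n - 1) to 0 by -1) {
      var sum = z(i)
      for (k <- i + 1 until n) sum -= l(k * n + i) * x(k)
      x(i) = sum / l(i * n + i)
    }
    x
  }

  /** Solves for one user's factor vector from the factors of the products
    * that user rated (hypothetical helper, not part of MLlib).
    *   weighted = false: diagonal += lambda        (as in the Spark 1.0 loop)
    *   weighted = true:  diagonal += lambda * n_u  (the proposed change)
    */
  def solveUser(
      itemFactors: Array[Array[Double]],
      ratings: Array[Double],
      lambda: Double,
      weighted: Boolean): Array[Double] = {
    require(itemFactors.nonEmpty && itemFactors.length == ratings.length)
    val rank = itemFactors.head.length
    val xtx = new Array[Double](rank * rank) // Y_u^T Y_u, row-major
    val xty = new Array[Double](rank)        // Y_u^T r_u
    for ((y, r) <- itemFactors.zip(ratings); i <- 0 until rank) {
      xty(i) += y(i) * r
      for (j <- 0 until rank) xtx(i * rank + j) += y(i) * y(j)
    }
    // n_u is the number of ratings this user made
    val reg = if (weighted) lambda * ratings.length else lambda
    var i = 0
    while (i < rank) {
      xtx(i * rank + i) += reg
      i += 1
    }
    solveSpd(xtx, xty, rank)
  }
}
```

For a user with two ratings (4.0 and 2.0) on one-dimensional product factors equal to 1.0 and λ = 0.5, the unweighted solve gives 6 / (2 + 0.5) = 2.4, while the weighted one gives 6 / (2 + 0.5 × 2) = 2.0. Because both the squared-error term and the penalty now grow with n_u, λ keeps roughly the same relative strength for users with many ratings and users with few.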
After modifying the code, the MSE decreased. Here is one test result.
Test conditions:
val numIterations = 20
val features = 30
val model = ALS.train(trainRatings, features, numIterations, 0.06)
Results of the modified version:
MSE: Double = 0.8472313396478773
RMSE: 0.92045
Results of version 1.0:
MSE: Double = 1.2680743123043832
RMSE: 1.1261
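The reported RMSE values are the square roots of the MSE values (√0.8472 ≈ 0.9205, √1.2681 ≈ 1.1261). The issue does not show how the MSE was computed or on which split, so the following is only a minimal sketch of the two metrics over hypothetical (actual, predicted) rating pairs; the ErrorMetrics object is illustrative and not an MLlib API.

```scala
object ErrorMetrics {
  /** Mean squared error over (actual, predicted) rating pairs. */
  def mse(pairs: Seq[(Double, Double)]): Double = {
    require(pairs.nonEmpty, "need at least one pair")
    pairs.map { case (actual, predicted) =>
      val err = actual - predicted
      err * err
    }.sum / pairs.size
  }

  /** Root mean squared error: the square root of the MSE. */
  def rmse(pairs: Seq[(Double, Double)]): Double = math.sqrt(mse(pairs))
}
```

For example, the pairs (3.0, 2.0) and (5.0, 5.0) give an MSE of 0.5 and an RMSE of about 0.707.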
In order to add the vector ratingsNum, I want to change the InLinkBlock structure as follows:
private[recommendation] case class InLinkBlock(elementIds: Array[Int], ratingsNum: Array[Int], ratingsForBlock: Array[Array[(Array[Int], Array[Double])]])
I could then calculate the vector ratingsNum in the makeInLinkBlock function. This is the code I added to makeInLinkBlock:
...........
//added
val ratingsNum = new Array[Int](numUsers)
ratings.foreach(r => ratingsNum(userIdToPos(r.user)) += 1) // foreach: only the counting side effect is needed
//end of added
InLinkBlock(userIds, ratingsNum, ratingsForBlock)
........
Is this solution reasonable?
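The counting step can be checked in isolation with a minimal sketch in plain Scala. It reuses the names from the snippet above (userIds, userIdToPos, ratingsNum), but how they are defined here is an assumption: userIdToPos maps each user id to its position in userIds, and the local Rating case class is a hypothetical stand-in with the same fields as MLlib's Rating(user, product, rating).

```scala
object RatingCountSketch {
  // Hypothetical stand-in for org.apache.spark.mllib.recommendation.Rating
  final case class Rating(user: Int, product: Int, rating: Double)

  /** For each user position in userIds, counts the ratings that user made. */
  def countRatingsPerUser(userIds: Array[Int], ratings: Array[Rating]): Array[Int] = {
    // Assumed mapping from user id to its index in userIds
    val userIdToPos = userIds.zipWithIndex.toMap
    val ratingsNum = new Array[Int](userIds.length)
    // foreach rather than map: only the side effect of incrementing is wanted
    ratings.foreach(r => ratingsNum(userIdToPos(r.user)) += 1)
    ratingsNum
  }
}
```

Using foreach matters beyond style: on a local Array, map would still run eagerly but would also build a throwaway Array[Unit], and if the same expression were ever applied to an RDD, map would be lazy and the increments would run on copies of ratingsNum inside executor closures rather than on the driver's array.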
> The algorithm of ALS in mlib lacks a parameter
> -----------------------------------------------
>
> Key: SPARK-2257
> URL: https://issues.apache.org/jira/browse/SPARK-2257
> Project: Spark
> Issue Type: Bug
> Components: MLlib
> Affects Versions: 1.0.0
> Environment: spark 1.0
> Reporter: zhengbing li
> Labels: patch
> Fix For: 1.1.0
>
> Original Estimate: 336h
> Remaining Estimate: 336h
>
--
This message was sent by Atlassian JIRA
(v6.2#6252)