You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Nan Zhu <zh...@gmail.com> on 2014/01/08 16:38:15 UTC
confusion on RDD usage in MatrixFactorizationModel (master
branch)
Hi, all
I’m reading the source code of master branch
there is a new predict() function in MatrixFactorizationModel
/**
* Predict the rating of many users for many products.
* The output RDD has an element per each element in the input RDD (including all duplicates)
* unless a user or product is missing in the training set.
*
* @param usersProducts RDD of (user, product) pairs.
* @return RDD of Ratings.
*/
def predict(usersProducts: RDD[(Int, Int)]): RDD[Rating] = {
val users = userFeatures.join(usersProducts).map{
case (user, (uFeatures, product)) => (product, (user, uFeatures))
}
users.join(productFeatures).map {
case (product, ((user, uFeatures), pFeatures)) =>
val userVector = new DoubleMatrix(uFeatures)
val productVector = new DoubleMatrix(pFeatures)
Rating(user, product, userVector.dot(productVector))
}
}
it seems that the author can directly call join with a RDD object?
It’s a new feature in next version? I’m used to creating a PairRDDFunctions with the current RDD and then calls join, etc.
Did I misunderstand something?
Best,
--
Nan Zhu
Re: confusion on RDD usage in MatrixFactorizationModel (master
branch)
Posted by Nan Zhu <zh...@gmail.com>.
ignore that
These operations are
* automatically available on any RDD of the right type (e.g. RDD[(Int, Int)] through implicit
* conversions when you `import org.apache.spark.SparkContext._`.
--
Nan Zhu
On Wednesday, January 8, 2014 at 10:38 AM, Nan Zhu wrote:
> Hi, all
>
> I’m reading the source code of master branch
>
> there is a new predict() function in MatrixFactorizationModel
>
> /**
> * Predict the rating of many users for many products.
> * The output RDD has an element per each element in the input RDD (including all duplicates)
> * unless a user or product is missing in the training set.
> *
> * @param usersProducts RDD of (user, product) pairs.
> * @return RDD of Ratings.
> */
> def predict(usersProducts: RDD[(Int, Int)]): RDD[Rating] = {
> val users = userFeatures.join(usersProducts).map{
> case (user, (uFeatures, product)) => (product, (user, uFeatures))
> }
> users.join(productFeatures).map {
> case (product, ((user, uFeatures), pFeatures)) =>
> val userVector = new DoubleMatrix(uFeatures)
> val productVector = new DoubleMatrix(pFeatures)
> Rating(user, product, userVector.dot(productVector))
> }
> }
>
>
> it seems that the author can directly call join with a RDD object?
>
> It’s a new feature in next version? I’m used to creating a PairRDDFunctions with the current RDD and then calls join, etc.
>
> Did I misunderstand something?
>
> Best,
>
> --
> Nan Zhu
>