Posted to dev@spark.apache.org by Zak H <za...@gmail.com> on 2016/11/01 17:00:25 UTC

Question about using collaborative filtering in MLlib

Hi,

I'm using the DataFrame-based API for Spark MLlib from Java. Should I be
using the RDD API instead? I'm not sure whether this functionality has been
ported over to DataFrames; correct me if I'm wrong.

My goal is to evaluate Spark's recommendation capabilities. I'm looking at
this example:

http://spark.apache.org/docs/latest/ml-collaborative-filtering.html

Looking at the java docs I can see there is a method:
http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.html

"public RDD<scala.Tuple2<Object,Rating[]>> recommendUsersForProducts(int num)"


For some reason, the recommendProductsForUsers method isn't available in the
Java API:
model.recommendProductsForUsers

Is there something I'm missing here?

I've posted my code in the gist below. I am using the DataFrame API for
MLlib. I know there may be work to port functionality over from RDDs.

https://gist.github.com/zmhassan/6ccdda8b4ad86f9b1924477c65ed5d45

Thanks,
Zak

Re: Question about using collaborative filtering in MLlib

Posted by Nick Pentreath <ni...@gmail.com>.
I have a PR for it - https://github.com/apache/spark/pull/12574

Sadly I've been tied up and haven't had a chance to work further on it.

The main outstanding issues are deciding on the transform semantics and
performance testing.

Any comments or feedback are welcome, especially on the transform semantics.

N

Re: Question about using collaborative filtering in MLlib

Posted by Yuhao Yang <hh...@gmail.com>.
Hi Zak,

Indeed, the function is missing in the DataFrame-based API. I can probably
provide a quick prototype to see if we can merge the function into the
next release. I will post an update here; feel free to give it a try.

Regards,
Yuhao
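
For anyone evaluating this before the DataFrame-based method exists, here is a
minimal sketch of what a recommendProductsForUsers-style call computes: for
each user, score every product by the dot product of the two latent factor
vectors and keep the top num. This is plain Java, not Spark code. The class
TopNRecommender and its method signatures are hypothetical and are not part of
any Spark API. It assumes the user and product factors have already been
collected into in-memory maps, which is only realistic for small models.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a Spark class): brute-force top-N recommendation
// from latent factors held in memory.
final class TopNRecommender {

    // Predicted score for one (user, product) pair: dot product of the factors.
    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("factor vectors differ in rank");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // For every user id, returns up to `num` product ids, best score first.
    static Map<Integer, List<Integer>> recommendProductsForUsers(
            Map<Integer, double[]> userFactors,
            Map<Integer, double[]> productFactors,
            int num) {
        if (num < 0) {
            throw new IllegalArgumentException("num must be non-negative");
        }
        Map<Integer, List<Integer>> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, double[]> user : userFactors.entrySet()) {
            final double[] u = user.getValue();
            List<Integer> products = new ArrayList<>(productFactors.keySet());
            // Sort product ids by predicted score, highest first.
            products.sort(Comparator.comparingDouble(
                    (Integer p) -> dot(u, productFactors.get(p))).reversed());
            int keep = Math.min(num, products.size());
            result.put(user.getKey(), new ArrayList<>(products.subList(0, keep)));
        }
        return result;
    }
}
```

This scores every user against every product, so the cost grows with
users x products x rank. That is fine for checking results on a small data
set, but it is not a substitute for a distributed implementation.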
