Posted to issues@spark.apache.org by "Jerry Lam (JIRA)" <ji...@apache.org> on 2018/06/25 20:45:00 UTC

[jira] [Created] (SPARK-24652) Strange ALS Implementation for Implicit Feedback

Jerry Lam created SPARK-24652:
---------------------------------

             Summary: Strange ALS Implementation for Implicit Feedback
                 Key: SPARK-24652
                 URL: https://issues.apache.org/jira/browse/SPARK-24652
             Project: Spark
          Issue Type: Bug
          Components: ML
    Affects Versions: 2.3.1
            Reporter: Jerry Lam


Hi there,

I'm evaluating the ALS implementation in Spark ML. Does Spark implement the algorithm described in "Collaborative Filtering for Implicit Feedback Datasets"? If it does, I think the implementation returns incorrect results.

Here is the example:
{code:java}
from pyspark.ml.recommendation import ALS

als = ALS(
    maxIter=100,
    regParam=0.0,
    alpha=1.0,
    nonnegative=False,
    implicitPrefs=True,
    rank=1)


ratings = spark.createDataFrame([(0, 0, 1), (1, 1, 1)]).toDF('user', 'item', 'rating')
als_model = als.fit(ratings)
reco = als_model.recommendForAllUsers(10)
reco.show(truncate=False)
{code}
The result is:
{code:java}
+----+---------------------------------+
|user|recommendations                  |
+----+---------------------------------+
|0   |[[0, 0.6666667], [1, -0.6666667]]|
|1   |[[1, 0.6666667], [0, -0.6666667]]|
+----+---------------------------------+
{code}
I expect the results for the above to be:
{code:java}
+----+---------------------+
|user|recommendations      |
+----+---------------------+
|0   |[[0, 1.0], [1, -1.0]]|
|1   |[[1, 1.0], [0, -1.0]]|
+----+---------------------+
{code}
The reason I believe the score should be 1.0 for (user=0, item=0) and 1.0 for (user=1, item=1) is that, according to the paper, these two cases should yield 1.0 given that lambda is 0.0 (no regularization).

 

Can someone describe which implicit-feedback implementation Spark is using? If it implements the same paper, why are the results so different? Thank you.
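For reference, here is a small NumPy sketch (my own code, not Spark's) of the paper's rank-1 alternating closed-form updates on this 2x2 example, using the confidence c_ui = 1 + alpha * r_ui and binary preference p_ui from the paper, with no regularization. Note that the paper's objective sums over all (user, item) pairs, so the two unobserved cells enter with p = 0 and confidence 1:

```python
import numpy as np

# Rank-1 ALS on the full implicit-feedback objective from the paper:
#   sum over ALL (u, i) pairs of c_ui * (p_ui - x_u * y_i)^2
# with c_ui = 1 + alpha * r_ui, p_ui = 1 if r_ui > 0 else 0, lambda = 0.
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])      # raw counts r_ui for the 2 users x 2 items
alpha = 1.0
C = 1.0 + alpha * R             # confidence: 2 on observed cells, 1 elsewhere
P = (R > 0).astype(float)       # binary preference

rng = np.random.default_rng(0)
x = rng.normal(size=2)          # user factors (rank 1)
y = rng.normal(size=2)          # item factors (rank 1)

for _ in range(100):
    # closed-form alternating least-squares updates, no regularization:
    # x_u = (sum_i c_ui p_ui y_i) / (sum_i c_ui y_i^2), and symmetrically for y
    x = (C * P) @ y / (C @ (y * y))
    y = (C * P).T @ x / (C.T @ (x * x))

pred = np.outer(x, y)
# the two observed cells converge to ~0.6666667, matching Spark's output
print(np.round(pred, 7))
```

Under this objective the unobserved cells pull the observed predictions down from 1.0 to 2/3, which may be where the difference comes from.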



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org