Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 05:35:37 UTC

[jira] [Updated] (SPARK-3735) Sending the factor directly or AtA based on the cost in ALS

     [ https://issues.apache.org/jira/browse/SPARK-3735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-3735:
--------------------------------
    Labels: bulk-closed  (was: )

> Sending the factor directly or AtA based on the cost in ALS
> -----------------------------------------------------------
>
>                 Key: SPARK-3735
>                 URL: https://issues.apache.org/jira/browse/SPARK-3735
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>            Priority: Major
>              Labels: bulk-closed
>
> It is common to have some super popular products in the dataset. In that case, sending many user factors to the target product block can be more expensive than sending the normal equation `\sum_i u_i u_i^T` and `\sum_i r_{ij} u_i` to the product block. Sending a single factor costs `k` values, while sending a normal equation costs `k * (k + 3) / 2` values: the `k * (k + 1) / 2` unique entries of the symmetric matrix plus the `k` entries of the right-hand side. However, if we use the normal equation for all products associated with a user, we don't need to send that user's factor at all.
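> For example, with rank `k = 10`, one factor costs 10 values while one normal equation costs `10 * 13 / 2 = 65`. As a rough comparison that ignores factor sharing across products, pre-aggregation wins for a product with `n` users once `n * k > k * (k + 3) / 2`, i.e. once 7 or more of its users' factors would otherwise be sent for it alone.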
> Determining the optimal assignment is hard, but we can use a simple heuristic. Inside any rating block:
> 1) order the product ids by the number of user ids associated with them, in descending order;
> 2) starting from the most popular product, mark products as "use normal equation" one at a time and recompute the total cost after each flip.
> Remember the assignment with the lowest cost and use it for the computation (see the sketch below).
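>
> Below is a minimal sketch of this greedy heuristic in Scala. The object and method names, the per-block input shape (`usersByProduct`), and the cost helpers are illustrative assumptions for this sketch, not Spark's actual ALS internals:
>
>   object NormalEqAssignment {
>
>     // Values shipped for one raw factor vs. one normal equation (AtA plus Atb).
>     def factorCost(k: Int): Long = k.toLong
>     def normalEqCost(k: Int): Long = k.toLong * (k + 3) / 2
>
>     // Inside one rating block, usersByProduct maps each product id to the set
>     // of user ids that rate it. Returns the product ids that should use the
>     // pre-aggregated normal equation.
>     def chooseNormalEqProducts(usersByProduct: Map[Int, Set[Int]], k: Int): Set[Int] = {
>       // Step 1: order products by popularity, most popular first.
>       val ordered = usersByProduct.toSeq.sortBy { case (_, users) => -users.size }
>
>       // For each user, count how many products still expect its raw factor.
>       val remaining = scala.collection.mutable.Map.empty[Int, Int]
>       for ((_, users) <- ordered; u <- users)
>         remaining(u) = remaining.getOrElse(u, 0) + 1
>
>       // Baseline: no normal equations, every distinct user factor is sent once.
>       var cost = remaining.size * factorCost(k)
>       var bestCost = cost
>       var bestCut = 0
>
>       // Step 2: greedily flip products to "use normal equation", tracking the
>       // cheapest prefix of the popularity ordering seen so far.
>       ordered.zipWithIndex.foreach { case ((_, users), i) =>
>         cost += normalEqCost(k)
>         for (u <- users) {
>           remaining(u) -= 1
>           if (remaining(u) == 0) cost -= factorCost(k) // factor no longer needed
>         }
>         if (cost < bestCost) { bestCost = cost; bestCut = i + 1 }
>       }
>       ordered.take(bestCut).map(_._1).toSet
>     }
>   }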



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org