Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/09/23 11:12:20 UTC

[jira] [Commented] (FLINK-4613) Extend ALS to handle implicit feedback datasets

    [ https://issues.apache.org/jira/browse/FLINK-4613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15516142#comment-15516142 ] 

ASF GitHub Bot commented on FLINK-4613:
---------------------------------------

GitHub user gaborhermann opened a pull request:

    https://github.com/apache/flink/pull/2542

    [FLINK-4613] Extend ALS to handle implicit feedback datasets

    This extension of the ALS algorithm changes some parts of the code if the `implicitPrefs` flag is set to true. Mainly the local parts are changed: the `Xt * X` computation takes the confidence into consideration, thus computing `Xt * (C - I) * X` instead (see the paper by Hu et al. for details). The `Xt * X` matrix is precomputed and broadcast, and that is the only thing that affects distributed execution.
    
    Note that we use a temporary directory in the test, because otherwise there would not be enough memory segments to perform a hash join for prediction. I assume that memory segments are not freed up after the training if no temporary directory is set, but I did not investigate the issue, as using a temporary directory is a simple workaround.
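
The split between the shared `Xt * X` matrix and the per-user `Xt * (C - I) * X` correction can be illustrated with a small sketch. This is plain Python, not the Scala code of the pull request; all function and parameter names (`gram`, `update_user`, `alpha`, `lam`) are hypothetical, and the confidence `1 + alpha * r` follows the usual formulation in Hu et al. rather than anything stated in this description.

```python
# Sketch of one implicit-ALS user update; names are illustrative assumptions,
# not identifiers from the Flink code base.

def gram(factors):
    """Return Xt * X for a list of factor rows (the part that is computed once)."""
    k = len(factors[0])
    return [[sum(row[a] * row[b] for row in factors) for b in range(k)]
            for a in range(k)]


def solve(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def update_user(xtx, factors, observed, alpha, lam):
    """Use the precomputed Xt * X and touch only the items the user interacted with.

    observed maps item index -> positive implicit count r for this user.
    """
    k = len(xtx)
    a = [row[:] for row in xtx]          # start from the shared Xt * X
    b = [0.0] * k
    for i, r in observed.items():
        conf = 1.0 + alpha * r           # assumed confidence weighting
        y = factors[i]
        for p in range(k):
            b[p] += conf * y[p]          # preference is 1 for observed items
            for q in range(k):
                a[p][q] += (conf - 1.0) * y[p] * y[q]   # Xt * (C - I) * X term
    for p in range(k):
        a[p][p] += lam                   # regularization
    return solve(a, b)


def update_user_naive(factors, observed, alpha, lam):
    """Reference version: build Xt * C * X from every item, observed or not."""
    k = len(factors[0])
    a = [[lam if p == q else 0.0 for q in range(k)] for p in range(k)]
    b = [0.0] * k
    for i, y in enumerate(factors):
        r = observed.get(i, 0.0)
        conf = 1.0 + alpha * r
        pref = 1.0 if r > 0 else 0.0
        for p in range(k):
            b[p] += conf * pref * y[p]
            for q in range(k):
                a[p][q] += conf * y[p] * y[q]
    return solve(a, b)


if __name__ == "__main__":
    item_factors = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]]
    xtx = gram(item_factors)             # computed once, shared by all users
    print(update_user(xtx, item_factors, {0: 3.0, 2: 1.0}, alpha=2.0, lam=0.1))
```

Both update functions should agree up to rounding; the first one only loops over the observed items per user, which is why sharing `Xt * X` pays off.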

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gaborhermann/flink ials

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2542.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2542
    
----
commit 84d338b11f77b20fa1825029f8ca847a40eb4673
Author: Gábor Hermann <co...@gaborhermann.com>
Date:   2016-09-12T09:47:40Z

    [FLINK-4613] Compute XtX for IALS & test, docs

commit 8e7c0d67a6f0390f03765fcdc9e03f3c391807cd
Author: jfeher <fe...@gmail.com>
Date:   2016-09-12T09:57:44Z

    [FLINK-4613] Extend ALS for implicit case
    
    XtX matrix precomputation is not yet done.

----


> Extend ALS to handle implicit feedback datasets
> -----------------------------------------------
>
>                 Key: FLINK-4613
>                 URL: https://issues.apache.org/jira/browse/FLINK-4613
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Gábor Hermann
>            Assignee: Gábor Hermann
>
> The Alternating Least Squares implementation should be extended to handle _implicit feedback_ datasets. These datasets do not contain explicit ratings by users; rather, they are built by collecting user behavior (e.g. user listened to artist X for Y minutes), and they require a slightly different optimization objective. See [Hu et al|http://dx.doi.org/10.1109/ICDM.2008.22] for details.
> We do not need to modify much in the original ALS algorithm. See the [Spark ALS implementation|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala], which could be a basis for this extension. Only the factor update step is modified, and most of the changes are in the local parts of the algorithm (i.e. UDFs). In fact, the only modification that is not local is precomputing the matrix product Y^T * Y and broadcasting it to all the nodes, which we can do with broadcast DataSets.
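
For reference, the modified objective and the resulting user-factor update from Hu et al. can be written roughly as follows. The symbols (`r_{ui}` for the raw observation, `p_{ui}` for the binary preference, `c_{ui}` for the confidence, `\alpha` and `\lambda` as tuning parameters) follow that paper's notation as best recalled and are not defined in this issue, so treat them as assumptions.

```latex
% Binary preference and confidence derived from the raw observation r_{ui}
p_{ui} = \begin{cases} 1 & r_{ui} > 0 \\ 0 & r_{ui} = 0 \end{cases},
\qquad
c_{ui} = 1 + \alpha \, r_{ui}

% Confidence-weighted objective
\min_{x_*, y_*} \sum_{u,i} c_{ui} \left( p_{ui} - x_u^T y_i \right)^2
  + \lambda \left( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \right)

% Update for one user u, with C^u = \mathrm{diag}(c_{u1}, \dots, c_{un})
x_u = \left( Y^T Y + Y^T (C^u - I) Y + \lambda I \right)^{-1} Y^T C^u \, p(u)
```

The term `Y^T Y` does not depend on the user, which is why it can be computed once and broadcast, while `Y^T (C^u - I) Y` only involves the items that user `u` actually interacted with. The item update is symmetric, with the roles of the two factor matrices swapped.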



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)