
Posted to issues@spark.apache.org by "zhengruifeng (JIRA)" <ji...@apache.org> on 2015/04/21 10:31:58 UTC

[jira] [Commented] (SPARK-7008) An implementation of Factorization Machine (LibFM)

    [ https://issues.apache.org/jira/browse/SPARK-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504596#comment-14504596 ] 

zhengruifeng commented on SPARK-7008:
-------------------------------------

I had not considered the size of the model, because the problems I usually encounter have fewer than 10 million dimensions. For higher-dimensional problems, I think feature hashing may help limit the number of features (not sure).
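To make the feature hashing idea concrete, here is a minimal standalone sketch in plain Python. It is not the code attached to this issue and not an MLlib API: the names `hash_features` and `fm_num_params`, the example feature names, and the use of CRC32 as the hash are all assumptions made for illustration. It relies on the fact that a second-order FM with n features and rank k has 1 + n + n*k parameters, so capping n at a fixed number of buckets caps the model size.

```python
import zlib


def hash_features(features, num_buckets):
    """Map (name, value) pairs into a sparse vector with num_buckets slots.

    Features whose names collide in the same bucket have their values summed.
    """
    hashed = {}
    for name, value in features:
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        index = zlib.crc32(name.encode("utf-8")) % num_buckets
        hashed[index] = hashed.get(index, 0.0) + value
    return hashed


def fm_num_params(num_features, rank):
    """Bias (1) + linear weights (n) + factor matrix (n * rank)."""
    return 1 + num_features + num_features * rank


# Hypothetical one-hot features for a single (user, item) example.
example = [("user=42", 1.0), ("item=1337", 1.0), ("genre=comedy", 1.0)]
vec = hash_features(example, num_buckets=2 ** 20)

# Model size after hashing depends only on the bucket count and the rank.
params = fm_num_params(2 ** 20, rank=8)
```

The trade-off is that collisions merge unrelated features, so the bucket count has to be chosen against the accuracy loss that can be tolerated.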
LibFM implements four training algorithms: SGD, AdaptiveSGD, ALS and MCMC. So far I have only implemented SGD for regression, and I am going to implement SGD for binary classification next.
In my opinion, SGD is sensitive to the learning rate: large values cause divergence, while small values make training slow.
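The sensitivity described above can be seen on a toy problem. The sketch below is an illustration only: it uses plain (non-stochastic) gradient descent on f(x) = x^2 rather than an FM, and the function name and the three step sizes are assumptions chosen to make the effect visible. Each update multiplies x by (1 - 2 * step_size), so the iterate blows up when that factor exceeds 1 in magnitude and barely moves when the step size is tiny.

```python
def gradient_descent(step_size, num_iterations, x0=1.0):
    """Minimise f(x) = x^2 with a constant step size; return the final x."""
    x = x0
    for _ in range(num_iterations):
        gradient = 2.0 * x  # derivative of x^2
        x -= step_size * gradient
    return x


# Same number of iterations, three different constant step sizes.
too_large = gradient_descent(step_size=1.1, num_iterations=50)    # |x| grows
too_small = gradient_descent(step_size=0.001, num_iterations=50)  # x barely moves
moderate = gradient_descent(step_size=0.25, num_iterations=50)    # x shrinks fast
```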
When coding, I followed LibFM closely. There are only two differences: LibFM uses strict SGD, while I use the mini-batch SGD provided by MLlib; and LibFM uses a constant learning rate, while I make it decrease with the square root of the iteration counter. So I think its convergence may be similar to that of LibFM's SGD.
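The two differences can be sketched in a few lines of plain Python. This is a hedged illustration, not the Scala code attached to this issue and not MLlib's optimizer: the names `fm_predict`, `decayed_step_size`, `train_fm` and `mse` are hypothetical, features are dense lists for brevity, and the schedule step_size / sqrt(t) is one reading of "decrease with the square root of the iteration counter". The loss is assumed to be the squared error used for regression.

```python
import math
import random


def fm_predict(x, w0, w, v):
    """Second-order FM: w0 + sum_i w_i*x_i + sum_{i<j} <v_i, v_j>*x_i*x_j."""
    n, rank = len(x), len(v[0])
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(rank):
        s = sum(v[i][f] * x[i] for i in range(n))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)  # O(n * rank) form of the pairwise sum
    return w0 + linear + pairwise


def decayed_step_size(step_size, iteration):
    """Step size for a 1-based iteration counter: step_size / sqrt(iteration)."""
    return step_size / math.sqrt(iteration)


def train_fm(data, n, rank, step_size, num_iterations, batch_size, seed=0):
    """Mini-batch SGD on the squared loss 0.5 * (prediction - y)^2."""
    rng = random.Random(seed)
    w0, w = 0.0, [0.0] * n
    v = [[rng.gauss(0.0, 0.01) for _ in range(rank)] for _ in range(n)]
    for t in range(1, num_iterations + 1):
        batch = rng.sample(data, batch_size)  # mini-batch, not a single example
        g0, gw = 0.0, [0.0] * n
        gv = [[0.0] * rank for _ in range(n)]
        for x, y in batch:
            err = fm_predict(x, w0, w, v) - y
            g0 += err
            for i in range(n):
                gw[i] += err * x[i]
            for f in range(rank):
                s = sum(v[i][f] * x[i] for i in range(n))
                for i in range(n):
                    gv[i][f] += err * x[i] * (s - v[i][f] * x[i])
        # Average the batch gradient, then apply the decayed step size.
        scale = decayed_step_size(step_size, t) / batch_size
        w0 -= scale * g0
        for i in range(n):
            w[i] -= scale * gw[i]
            for f in range(rank):
                v[i][f] -= scale * gv[i][f]
    return w0, w, v


def mse(data, w0, w, v):
    """Mean squared error of the model over (x, y) pairs."""
    return sum((fm_predict(x, w0, w, v) - y) ** 2 for x, y in data) / len(data)
```

Setting `batch_size=1` and replacing `decayed_step_size` with a constant would give the strict, constant-rate variant for comparison.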
I'm testing the library, and the results will be posted in several days.
Thanks.

> An implementation of Factorization Machine (LibFM)
> --------------------------------------------------
>
>                 Key: SPARK-7008
>                 URL: https://issues.apache.org/jira/browse/SPARK-7008
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>    Affects Versions: 1.3.0, 1.3.1, 1.3.2
>            Reporter: zhengruifeng
>              Labels: features, patch
>         Attachments: FM_convergence_rate.xlsx, QQ20150421-1.png, QQ20150421-2.png
>
>
> An implementation of Factorization Machines based on Scala and Spark MLlib.
> A Factorization Machine is a machine learning model for multi-linear regression, and is widely used for recommendation.
> Factorization Machines have performed well in recommendation competitions in recent years.
> Ref:
> http://libfm.org/
> http://doi.acm.org/10.1145/2168752.2168771
> http://www.inf.uni-konstanz.de/~rendle/pdf/Rendle2010FM.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
