Posted to issues@spark.apache.org by "Joseph K. Bradley (JIRA)" <ji...@apache.org> on 2016/11/28 19:37:58 UTC

[jira] [Comment Edited] (SPARK-18599) Add the Spectral LDA algorithm

    [ https://issues.apache.org/jira/browse/SPARK-18599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15702885#comment-15702885 ] 

Joseph K. Bradley edited comment on SPARK-18599 at 11/28/16 7:36 PM:
---------------------------------------------------------------------

It would be great to test this as a Spark Package first; that will let us collect feedback from users to get a better idea of whether it should be put in MLlib itself.  Feel free to link the package from this JIRA, and to use this JIRA for users to post results.

(Also, please let committers set the "Target Version" and "Shepherd" fields.)

Thanks!


was (Author: josephkb):
It would be great to test this as a Spark Package first; that will let us collect feedback from users to get a better idea of whether it should be put in MLlib itself.  Feel free to link the package from this JIRA, and to use this JIRA for users to post results.

(Also, please let committers set the "Target Version" field.)

Thanks!

> Add the Spectral LDA algorithm
> ------------------------------
>
>                 Key: SPARK-18599
>                 URL: https://issues.apache.org/jira/browse/SPARK-18599
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: Jencir Lee
>              Labels: lda
>
> The Spectral LDA algorithm transforms the LDA problem into an orthogonal tensor decomposition problem. [Anandkumar 2012] establishes theoretical guarantees for the convergence of orthogonal tensor decomposition.
> The algorithm first builds the 2nd- and 3rd-order moments from the empirical word counts, orthogonalizes them, and finally performs the tensor decomposition on the orthogonalized empirical moments. The whole procedure is purely linear algebra and can leverage machine-native BLAS/LAPACK libraries (Spark needs to be compiled with the `-Pnetlib-lgpl` option).
> It achieves log-perplexity competitive with Online Variational Inference in the shortest time. It also has clean memory usage -- as of v2.0.0 we have experienced crashes due to memory problems with the built-in Gibbs Sampler and Online Variational Inference, but never with the Spectral LDA algorithm. The algorithm is also linearly scalable.
> The original repo is at https://github.com/FurongHuang/SpectralLDA-TensorSpark. We refactored it to follow the Spark coding style and interfaces when porting it over for the PR. We wrote a report describing the algorithm in detail and listing test results at https://www.overleaf.com/read/wscdvwrjmtmw. It is going to enter our official repo soon.
> REFERENCES
> Anandkumar, Anima, et al., Tensor decompositions for learning latent variable models, 2012, https://arxiv.org/abs/1210.7559.
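As a rough illustration of the orthogonal tensor decomposition step the description refers to (this sketch is not taken from the PR or its repo; the toy tensor, function names, and deflation scheme below are illustrative assumptions), the core of the approach is the tensor power method from [Anandkumar 2012]: repeatedly contract a symmetric 3rd-order tensor against the current estimate, normalize, and deflate to recover each component in turn. A minimal NumPy sketch:

```python
import numpy as np

def tensor_power_iteration(T, n_iters=100, seed=0):
    """Recover one eigenpair of a symmetric 3rd-order tensor T
    via the tensor power method (illustrative, not the PR's code)."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(T.shape[0])
    theta /= np.linalg.norm(theta)
    for _ in range(n_iters):
        # T(I, theta, theta): contract the last two modes with theta
        theta = np.einsum('ijk,j,k->i', T, theta, theta)
        theta /= np.linalg.norm(theta)
    # Eigenvalue estimate: T(theta, theta, theta)
    lam = np.einsum('ijk,i,j,k->', T, theta, theta, theta)
    return lam, theta

# Toy orthogonally decomposable tensor: T = 3 * e1^(x)3 + 2 * e2^(x)3
e1, e2 = np.eye(3)[0], np.eye(3)[1]
T = (3 * np.einsum('i,j,k->ijk', e1, e1, e1)
     + 2 * np.einsum('i,j,k->ijk', e2, e2, e2))

lam1, v1 = tensor_power_iteration(T)
# Deflate the recovered component, then extract the next one
T2 = T - lam1 * np.einsum('i,j,k->ijk', v1, v1, v1)
lam2, v2 = tensor_power_iteration(T2)
```

In the actual algorithm the tensor would be the whitened 3rd-order moment of the word counts rather than a toy construction, and the recovered eigenpairs map back to topic-word distributions after un-whitening.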



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
