Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:16:51 UTC

[jira] [Resolved] (SPARK-18599) Add the Spectral LDA algorithm

     [ https://issues.apache.org/jira/browse/SPARK-18599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-18599.
----------------------------------
    Resolution: Incomplete

> Add the Spectral LDA algorithm
> ------------------------------
>
>                 Key: SPARK-18599
>                 URL: https://issues.apache.org/jira/browse/SPARK-18599
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: Jencir Lee
>            Priority: Major
>              Labels: bulk-closed, lda
>
> The Spectral LDA algorithm transforms the LDA problem into an orthogonal tensor decomposition problem. [[Anandkumar 2012]] establishes a theoretical guarantee for the convergence of orthogonal tensor decomposition.
> The algorithm first builds 2nd-order and 3rd-order moments from the empirical word counts, orthogonalizes them, and finally performs the tensor decomposition on the empirical data moments. The whole procedure relies purely on linear algebra and can leverage machine-native BLAS/LAPACK libraries (Spark needs to be compiled with the `-Pnetlib-lgpl` option).
> It achieves competitive log-perplexity vs. Online Variational Inference in the shortest time. It also has clean memory usage: as of v2.0.0 we have experienced crashes due to memory problems with the built-in Gibbs Sampler or Online Variational Inference, but never with the Spectral LDA algorithm. The algorithm scales linearly.
> The original repo is at https://github.com/FurongHuang/SpectralLDA-TensorSpark. We refactored the code to follow the Spark coding style and interfaces when porting it over for the PR. We wrote a report that describes the algorithm in detail and lists test results at https://www.overleaf.com/read/wscdvwrjmtmw. The report is going to enter our official repo soon.
> REFERENCES
> Anandkumar, Anima, et al., Tensor decompositions for learning latent variable models, 2012, https://arxiv.org/abs/1210.7559.
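
The final step named in the description, orthogonal tensor decomposition, can be illustrated with a minimal sketch of the tensor power method with deflation, the kind of procedure analyzed in [Anandkumar 2012]. This is not the code from the PR or the original repo. It assumes the 3rd-order moment has already been orthogonalized, so the input is a small symmetric tensor T = sum_i w_i * v_i (x) v_i (x) v_i with orthonormal v_i and positive w_i. All function names and parameters (build_tensor, apply_tensor, power_method, n_restarts, n_iters) are hypothetical and chosen only for this illustration.

```python
import math
import random


def build_tensor(weights, vectors):
    """Return T = sum_i w_i * v_i (x) v_i (x) v_i as nested lists."""
    k = len(vectors[0])
    T = [[[0.0] * k for _ in range(k)] for _ in range(k)]
    for w, v in zip(weights, vectors):
        for a in range(k):
            for b in range(k):
                for c in range(k):
                    T[a][b][c] += w * v[a] * v[b] * v[c]
    return T


def apply_tensor(T, u):
    """Contract T with u along its last two modes, i.e. T(I, u, u)."""
    k = len(u)
    return [
        sum(T[a][b][c] * u[b] * u[c] for b in range(k) for c in range(k))
        for a in range(k)
    ]


def normalize(u):
    """Scale u to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in u))
    return [x / norm for x in u]


def power_method(T, n_components, n_restarts=10, n_iters=50, seed=0):
    """Recover (weight, vector) pairs by power iteration plus deflation.

    Assumes T is symmetric and orthogonally decomposable with positive
    weights (an assumption of this sketch, not a general-purpose solver).
    """
    rng = random.Random(seed)
    k = len(T)
    T = [[row[:] for row in mat] for mat in T]  # work on a copy
    pairs = []
    for _ in range(n_components):
        best_val, best_vec = -math.inf, None
        for _ in range(n_restarts):
            # Random unit start, then repeat u <- T(I, u, u) / ||T(I, u, u)||.
            u = normalize([rng.gauss(0.0, 1.0) for _ in range(k)])
            for _ in range(n_iters):
                u = normalize(apply_tensor(T, u))
            # The value T(u, u, u) estimates the weight of the component.
            val = sum(x * y for x, y in zip(u, apply_tensor(T, u)))
            if val > best_val:
                best_val, best_vec = val, u
        pairs.append((best_val, best_vec))
        # Deflate: subtract the recovered rank-one term from T.
        for a in range(k):
            for b in range(k):
                for c in range(k):
                    T[a][b][c] -= best_val * best_vec[a] * best_vec[b] * best_vec[c]
    return pairs


if __name__ == "__main__":
    s = 1.0 / math.sqrt(2.0)
    demo_vectors = [[s, s, 0.0], [s, -s, 0.0], [0.0, 0.0, 1.0]]
    demo_tensor = build_tensor([3.0, 2.0, 1.0], demo_vectors)
    for weight, vector in power_method(demo_tensor, 3):
        print(weight, vector)
```

In the LDA setting the recovered vectors would still have to be mapped back from the orthogonalized space to topic-word distributions; that step, and the construction of the moments from word counts, are left out of this sketch.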



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
