You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@spark.apache.org by jk...@apache.org on 2015/05/13 00:12:34 UTC

spark git commit: [SPARK-7496] [MLLIB] Update Programming guide with Online LDA

Repository: spark
Updated Branches:
  refs/heads/master 1422e79e5 -> 1d703660d


[SPARK-7496] [MLLIB] Update Programming guide with Online LDA

jira: https://issues.apache.org/jira/browse/SPARK-7496

Update LDA subsection of clustering section of MLlib programming guide to include OnlineLDA.

Author: Yuhao Yang <hh...@gmail.com>

Closes #6046 from hhbyyh/ldaDocument and squashes the following commits:

4b6fbfa [Yuhao Yang] add online paper and some comparison
fd4c983 [Yuhao Yang] update lda document for optimizers


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1d703660
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1d703660
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1d703660

Branch: refs/heads/master
Commit: 1d703660d4d14caea697affdf31170aea44c8903
Parents: 1422e79
Author: Yuhao Yang <hh...@gmail.com>
Authored: Tue May 12 15:12:29 2015 -0700
Committer: Joseph K. Bradley <jo...@databricks.com>
Committed: Tue May 12 15:12:29 2015 -0700

----------------------------------------------------------------------
 docs/mllib-clustering.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/1d703660/docs/mllib-clustering.md
----------------------------------------------------------------------
diff --git a/docs/mllib-clustering.md b/docs/mllib-clustering.md
index f5aa15b..f41ca70 100644
--- a/docs/mllib-clustering.md
+++ b/docs/mllib-clustering.md
@@ -377,11 +377,11 @@ LDA can be thought of as a clustering algorithm as follows:
  on a statistical model of how text documents are generated.
 
 LDA takes in a collection of documents as vectors of word counts.
-It learns clustering using [expectation-maximization](http://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm)
-on the likelihood function. After fitting on the documents, LDA provides:
+It supports different inference algorithms via `setOptimizer` function. EMLDAOptimizer learns clustering using [expectation-maximization](http://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm)
+on the likelihood function and yields comprehensive results, while OnlineLDAOptimizer uses iterative mini-batch sampling for [online variational inference](https://www.cs.princeton.edu/~blei/papers/HoffmanBleiBach2010b.pdf) and is generally memory friendly. After fitting on the documents, LDA provides:
 
 * Topics: Inferred topics, each of which is a probability distribution over terms (words).
-* Topic distributions for documents: For each document in the training set, LDA gives a probability distribution over topics.
+* Topic distributions for documents: For each document in the training set, LDA gives a probability distribution over topics. (EM only)
 
 LDA takes the following parameters:
 


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org