Posted to issues@spark.apache.org by "Xiangrui Meng (JIRA)" <ji...@apache.org> on 2014/08/27 19:37:58 UTC

[jira] [Resolved] (SPARK-953) Latent Dirichlet Association (LDA model)

     [ https://issues.apache.org/jira/browse/SPARK-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng resolved SPARK-953.
---------------------------------

    Resolution: Duplicate

> Latent Dirichlet Association (LDA model)
> ----------------------------------------
>
>                 Key: SPARK-953
>                 URL: https://issues.apache.org/jira/browse/SPARK-953
>             Project: Spark
>          Issue Type: Story
>          Components: Examples
>    Affects Versions: 0.7.3
>            Reporter: caizhua
>            Priority: Critical
>
> This code is for learning the LDA model. However, with an input of 2.5 M documents per machine and a dictionary of 10,000 words, running on EC2 m2.4xlarge instances with 68 GB of memory per machine, it is very slow. The five iterations took 8145, 24725, 51688, 58674, and 56850 seconds respectively. Shuffling in particular is quite slow. LDA.tbl is the simulated data set for the program, and it is quite fast.


