Posted to issues@spark.apache.org by "Andrew Ash (JIRA)" <ji...@apache.org> on 2014/11/14 11:05:33 UTC

[jira] [Commented] (SPARK-957) The problem that repeated computation among iterations

    [ https://issues.apache.org/jira/browse/SPARK-957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212084#comment-14212084 ] 

Andrew Ash commented on SPARK-957:
----------------------------------

Hi [~caizhua], are you still having issues with your implementation of the LDA algorithm?  We try to keep tickets open only when there is remaining work to be done.

You can also reference SPARK-1405 for the LDA implementation being worked on for future inclusion into MLlib.

> The problem that repeated computation among iterations
> ------------------------------------------------------
>
>                 Key: SPARK-957
>                 URL: https://issues.apache.org/jira/browse/SPARK-957
>             Project: Spark
>          Issue Type: Bug
>          Components: Examples
>    Affects Versions: 0.7.3
>            Reporter: caizhua
>
> For the LDA model, making each document a single RDD record is quite slow, so we tried making the RDD a set of blocks, where each block holds a subset of the documents. However, when we run the program, we find that much of the computation is repeated across iterations. Basically, when we come to the ith iteration, all the jobs that ran in iterations 0 through (i-1) are repeated. Likewise, the jobs in the ith iteration are repeated in the (i+1)th iteration. In total, if you have m iterations, the jobs in the ith iteration are repeated in each of the remaining iterations.
> However, the result is still correct. :)
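
The ticket does not state a root cause, so the following is only a sketch of one plausible explanation: each iteration's dataset is built lazily on top of the previous one and is never persisted, so every action replays the whole lineage. The LazyDataset class and run_iterations function below are hypothetical names made up for this illustration; they are a plain-Python toy model, not the Spark API and not the reporter's program. In Spark itself, the corresponding mechanism would be RDD.cache() or RDD.persist().

```python
class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not Spark)."""

    def __init__(self, compute, parent=None):
        self._compute = compute    # function: parent's value -> this value
        self._parent = parent      # upstream dataset in the lineage, or None
        self._use_cache = False
        self._cached = None

    def transform(self, fn):
        # Record a new lineage step; nothing is computed yet.
        return LazyDataset(fn, parent=self)

    def cache(self):
        # Ask for the value to be kept after its first evaluation.
        self._use_cache = True
        return self

    def collect(self):
        # An "action": walk the lineage and compute the value.
        if self._use_cache and self._cached is not None:
            return self._cached
        upstream = self._parent.collect() if self._parent is not None else None
        result = self._compute(upstream)
        if self._use_cache:
            self._cached = result
        return result


def run_iterations(num_iterations, use_cache):
    """Return how many times each iteration's step was executed."""
    calls = [0] * num_iterations
    current = LazyDataset(lambda _: [1, 2, 3])   # made-up input data

    for i in range(num_iterations):
        def step(data, i=i):
            calls[i] += 1                        # count executions of step i
            return [x + 1 for x in data]

        current = current.transform(step)
        if use_cache:
            current.cache()
        current.collect()                        # one action per iteration
    return calls


print(run_iterations(3, use_cache=False))  # [3, 2, 1]
print(run_iterations(3, use_cache=True))   # [1, 1, 1]
```

Without caching, the step from iteration 0 runs again in every later iteration, which matches the pattern described in the report, and the final result is still the same either way. With caching, each step runs once.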



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
