Posted to issues@spark.apache.org by "Mathieu D (JIRA)" <ji...@apache.org> on 2017/03/28 20:27:41 UTC
[jira] [Comment Edited] (SPARK-20082) Incremental update of LDA
model, by adding initialModel as start point
[ https://issues.apache.org/jira/browse/SPARK-20082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15945892#comment-15945892 ]
Mathieu D edited comment on SPARK-20082 at 3/28/17 8:27 PM:
------------------------------------------------------------
[~yuhaoyan] would you mind having a look at this PR? Right now, I added an initialModel, supported only by the Online optimizer.
Regarding the EM optimizer, I could add new doc vertices and new doc->term edges to the existing graph, but it is unclear to me how the new doc vertices should be weighted when added. Right now, for a new model, doc and term vertices are weighted randomly, with the same total weight on docs and terms. If I add new docs to an existing graph, how should the weights on the doc side be initialized?
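
To make the question concrete, here is a minimal sketch of the kind of initialization described above, written in plain Python rather than Spark/GraphX. The function name and data layout are hypothetical. It assumes that each doc->term edge draws a random topic vector, scales it by the edge's token count, and adds it to both endpoints; that is one way to end up with the same total weight on the doc side and the term side, and it is not claimed to be the exact Spark code.

```python
import random
from collections import defaultdict


def init_vertex_weights(edges, k, seed=0):
    """Randomly initialize per-topic weights for doc and term vertices.

    edges: iterable of (doc_id, term_id, token_count) tuples (hypothetical layout).
    k: number of topics.
    Returns two dicts mapping vertex id -> list of k topic weights.
    """
    rng = random.Random(seed)
    doc_weights = defaultdict(lambda: [0.0] * k)
    term_weights = defaultdict(lambda: [0.0] * k)
    for doc_id, term_id, count in edges:
        # Random topic proportions for this edge, normalized to sum to 1.
        raw = [rng.random() for _ in range(k)]
        total = sum(raw)
        # Scale by the token count so the edge contributes exactly `count`.
        share = [count * r / total for r in raw]
        # The same contribution goes to both endpoints of the edge.
        for t in range(k):
            doc_weights[doc_id][t] += share[t]
            term_weights[term_id][t] += share[t]
    return dict(doc_weights), dict(term_weights)


edges = [("d0", "spark", 2), ("d0", "lda", 1), ("d1", "lda", 3)]
docs, terms = init_vertex_weights(edges, k=2)
doc_total = sum(sum(v) for v in docs.values())
term_total = sum(sum(v) for v in terms.values())
print(doc_total, term_total)
```

Under this assumption both totals equal the total token count. The open point is that, once the graph has been trained, the term vertices no longer hold such random values, so reusing the same scheme for newly added docs would mix random doc weights with learned term weights.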
was (Author: mathieude):
[~yuhaoyan] would you mind having a look at this PR? Right now, I added an initialModel only for the Online optimizer.
Regarding the EM optimizer, I could add new doc vertices and new doc->term edges to the existing graph, but it is unclear to me how the new doc vertices should be weighted when added. Right now, for a new model, doc and term vertices are weighted randomly, with the same total weight on docs and terms. If I add new docs to an existing graph, how should the weights on the doc side be initialized?
> Incremental update of LDA model, by adding initialModel as start point
> ----------------------------------------------------------------------
>
> Key: SPARK-20082
> URL: https://issues.apache.org/jira/browse/SPARK-20082
> Project: Spark
> Issue Type: New Feature
> Components: ML
> Affects Versions: 2.1.0
> Reporter: Mathieu D
>
> Some MLlib models support an initialModel to start from, which is then updated incrementally with new data.
> From what I understand of OnlineLDAOptimizer, it is possible to incrementally update an existing model with batches of new documents.
> I suggest adding an initialModel as a starting point for LDA.
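
As a rough illustration of why the online optimizer lends itself to this, the sketch below shows the blending step of online variational Bayes for LDA in plain Python. The function names are hypothetical, and the per-batch estimate `lam_hat` is assumed to be given (the E-step that produces it is omitted). The default `tau0` and `kappa` values mirror the documented defaults of OnlineLDAOptimizer. Starting from an initialModel would then amount to seeding `lam` with the existing model's topic matrix instead of random values.

```python
def learning_rate(iteration, tau0=1024.0, kappa=0.51):
    """Step size rho_t = (tau0 + t) ** (-kappa); it decays as iterations grow."""
    return (tau0 + iteration) ** (-kappa)


def blend_topics(lam, lam_hat, iteration, tau0=1024.0, kappa=0.51):
    """Blend the current topic matrix with a mini-batch estimate.

    lam, lam_hat: lists of rows (topics x vocabulary), same shape.
    Returns (1 - rho) * lam + rho * lam_hat, element by element.
    """
    rho = learning_rate(iteration, tau0, kappa)
    return [
        [(1.0 - rho) * old + rho * new for old, new in zip(row_old, row_new)]
        for row_old, row_new in zip(lam, lam_hat)
    ]


# Topic matrix taken from a previously trained model (made-up numbers).
initial_lam = [[5.0, 1.0, 1.0], [1.0, 1.0, 5.0]]
# Estimate computed from a batch of new documents (made-up numbers).
batch_lam_hat = [[1.0, 4.0, 1.0], [1.0, 1.0, 4.0]]

updated = blend_topics(initial_lam, batch_lam_hat, iteration=1)
print(updated)
```

With a large `tau0`, the first batches move the existing topics only slightly, which is the behavior one would want when refining a trained model rather than replacing it.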