Posted to user@spark.apache.org by Joseph Bradley <jo...@databricks.com> on 2016/01/27 00:44:33 UTC

Re: Spark LDA model reuse with new set of data

Hi,

This is more a question for the user list, not the dev list, so I'll CC
user.

If you're using mllib.clustering.LDAModel (RDD API), then can you make sure
you're using a LocalLDAModel (or convert to it from DistributedLDAModel)?
You can then call topicDistributions() on the new data.
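
As a rough sketch of the RDD-API route described above (not a definitive
implementation): the object name, method names, and the savedAsDistributed
flag are hypothetical, and the new documents are assumed to already be
term-count vectors built with the same vocabulary as the training data.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.clustering.{DistributedLDAModel, LocalLDAModel}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// Hypothetical helper object; only load, toLocal and topicDistributions
// are Spark calls, everything else is illustrative naming.
object ScoreNewDocsRdd {

  // Load the saved model and make sure we end up with a LocalLDAModel.
  // Assumption: the caller knows which kind was saved. The EM optimizer
  // produces a DistributedLDAModel; the online optimizer produces a
  // LocalLDAModel.
  def loadLocal(sc: SparkContext, path: String, savedAsDistributed: Boolean): LocalLDAModel =
    if (savedAsDistributed) DistributedLDAModel.load(sc, path).toLocal
    else LocalLDAModel.load(sc, path)

  // newDocs: (document id, term-count vector) pairs, indexed by the same
  // vocabulary that was used for training.
  // Result: (document id, topic-mixture vector) pairs.
  def score(model: LocalLDAModel, newDocs: RDD[(Long, Vector)]): RDD[(Long, Vector)] =
    model.topicDistributions(newDocs)
}
```

Typical use would be something like
ScoreNewDocsRdd.score(ScoreNewDocsRdd.loadLocal(sc, path, true), newDocs),
where sc, path and newDocs come from your own application.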

If you're using ml.clustering.LDAModel (DataFrame API), then you can call
transform() on new data.
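
A similarly minimal sketch for the DataFrame API, assuming you already hold
a fitted ml.clustering.LDAModel (how you obtain it, e.g. whether your Spark
version can load a saved one, is outside this sketch). The object and method
names are hypothetical; the column names in the comments are the defaults as
far as I know and may differ if you changed them.

```scala
import org.apache.spark.ml.clustering.LDAModel
import org.apache.spark.sql.DataFrame

// Hypothetical helper object; transform() is the only Spark call here.
object ScoreNewDocsDf {

  // newDocs must contain the model's features column ("features" unless
  // it was changed), holding term-count vectors built with the same
  // vocabulary as the training data.
  // The result is newDocs plus a topic-distribution column
  // ("topicDistribution" unless it was changed).
  def score(model: LDAModel, newDocs: DataFrame): DataFrame =
    model.transform(newDocs)
}
```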

Does that work?

Joseph

On Tue, Jan 19, 2016 at 6:21 AM, doruchiulan <do...@gmail.com> wrote:

> Hi,
>
> Just so you know, I am new to Spark, and also very new to ML (this is my
> first contact with ML).
>
> Ok, I am trying to write a DSL where you can run some commands.
>
> I wrote a command that trains a Spark LDA model. It produces the topics I
> want, and I saved the model using the save method provided by LDAModel.
>
> Now I want to load this LDAModel and use it to predict on a new set of
> data.
> I call the load method, obtain the LDAModel instance but here I am stuck.
>
> Isn't this possible? Have I misunderstood LDA, and a trained LDA model
> cannot be reused to analyse new data?
>
> If it is possible, can you point me to some documentation, or give me a
> hint on how I should do that?
>
> Thx
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-LDA-model-reuse-with-new-set-of-data-tp16047.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
>
>

Re: Spark LDA model reuse with new set of data

Posted by doruchiulan <do...@gmail.com>.
Yes, I noticed the same thing myself last night. Thanks.



--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Re-Spark-LDA-model-reuse-with-new-set-of-data-tp16099p16103.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
