
Posted to user@mahout.apache.org by vineeth <vi...@gmail.com> on 2012/09/22 19:00:56 UTC

running lda on test dataset

Hello,

I am searching for how to run Mahout LDA on a test data set to detect its
topics. Is there a way to test the trained LDA model, or should we write
our own program based on the word-topic probabilities that LDA spits
out after running on the test data?

Thanks
Vineeth

Re: running lda on test dataset

Posted by Jake Mannix <ja...@gmail.com>.

On Sat, Sep 22, 2012 at 12:49 PM, chyi-kwei yau <ch...@gmail.com> wrote:

> Hi,
> You should be able to run inference on a test data set and use the
> perplexity of the test set to measure the performance of your model.
>
> See the LDA paper for details:
> http://www.cs.princeton.edu/~blei/papers/BleiNgJordan2003.pdf

The current LDA implementation in Mahout has a command-line option:

  --test_set_percentage

to hold out some of your training data as a "test set", which is used to
measure held-out perplexity during training. The command-line option:

  --iteration_block_size

sets how often held-out perplexity is computed during training (so if you
set this to 10, held-out perplexity is only computed every 10 iterations
over the input data).
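A minimal Python sketch of what these two options control, under stated assumptions: it is not Mahout's code, the function names are hypothetical, the hold-out amount is treated as a fraction between 0 and 1, and perplexity is assumed to be evaluated whenever the iteration number is a multiple of the block size.

```python
import random

def split_held_out(doc_ids, test_fraction, seed=0):
    """Randomly assign each document to the training or held-out test set.

    test_fraction is the expected share of documents held out (e.g. 0.1).
    Hypothetical helper, not part of Mahout.
    """
    rng = random.Random(seed)
    train, test = [], []
    for doc_id in doc_ids:
        if rng.random() < test_fraction:
            test.append(doc_id)
        else:
            train.append(doc_id)
    return train, test

def perplexity_iterations(num_iterations, block_size):
    """Iterations (1-based) after which held-out perplexity would be computed."""
    return [i for i in range(1, num_iterations + 1) if i % block_size == 0]

train, test = split_held_out(range(1000), test_fraction=0.1)
print(len(train), len(test))
print(perplexity_iterations(35, 10))  # [10, 20, 30]
```

A larger block size means fewer perplexity evaluations, so less overhead, at the cost of a coarser view of convergence.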

The perplexity is logged to the console during training, and is also
persisted in sequence files parallel with the model files (in a directory
like $OUTPUT_DIR/perplexity-$ITERATION_NUMBER or something like that).

So this will tell you how well converged you are, and how likely your test
data would be to have been generated by your model, if that is a test
you'd find useful.
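As a rough illustration of that measurement, the Python sketch below computes perplexity over held-out documents from a trained word-topic matrix `phi` and per-document topic mixtures `theta`. The function name and input layout are hypothetical, and it plugs in point estimates of `theta` instead of integrating over them, so it approximates rather than reproduces the exact held-out likelihood.

```python
import math

def held_out_perplexity(docs, theta, phi):
    """Approximate perplexity of held-out docs under an LDA-style model.

    docs:  list of {word_id: count} dicts, one bag of words per test document
    theta: theta[d][k] = p(topic k | document d), a point estimate per doc
    phi:   phi[k][w]   = p(word w | topic k), the trained word-topic matrix
    """
    total_log_lik = 0.0
    total_words = 0
    for d, doc in enumerate(docs):
        for w, count in doc.items():
            # p(w | d) = sum over topics k of p(k | d) * p(w | k)
            p_w = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            total_log_lik += count * math.log(p_w)
            total_words += count
    # exp of the negative average log-likelihood per word
    return math.exp(-total_log_lik / total_words)

uniform_phi = [[0.25] * 4, [0.25] * 4]  # two topics, each uniform over 4 words
print(held_out_perplexity([{0: 2, 3: 1}], [[0.5, 0.5]], uniform_phi))
```

With a model that spreads probability uniformly over 4 words, the result is about 4.0, matching the reading of perplexity as the effective number of equally likely word choices; lower values mean the model finds the test data more likely.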

-- 

  -jake

Re: running lda on test dataset

Posted by chyi-kwei yau <ch...@gmail.com>.

Hi,
You should be able to run inference on a test data set and use the
perplexity of the test set to measure the performance of your model.

See the LDA paper for details:
http://www.cs.princeton.edu/~blei/papers/BleiNgJordan2003.pdf
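For reference, the held-out perplexity defined in that paper, for a test set of M documents where document d has N_d words w_d, is shown below. Lower is better: it decreases monotonically as the model assigns higher likelihood to the test documents.

```latex
\mathrm{perplexity}(D_{\text{test}})
  = \exp\left\{ -\frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d} \right\}
```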

Best,
Chyi-Kwei

On Sat, Sep 22, 2012 at 2:51 PM, Jake Mannix <ja...@gmail.com> wrote:
> What would you want a test to tell you?  LDA is unsupervised, so it'll give
> you the word-topic probabilities, and for each test document (or training
> document) you can get the document-topic probabilities as well.  Then...
> what would you like to know at that point?
>
> On Sat, Sep 22, 2012 at 10:00 AM, vineeth <vi...@gmail.com> wrote:
>
>> Hello,
>>
>> I am searching for how to run Mahout LDA on a test data set to detect its
>> topics. Is there a way to test the trained LDA model, or should we write
>> our own program based on the word-topic probabilities that LDA spits
>> out after running on the test data?
>>
>> Thanks
>> Vineeth
>>
>
>
>
> --
>
>   -jake

Re: running lda on test dataset

Posted by Jake Mannix <ja...@gmail.com>.

What would you want a test to tell you?  LDA is unsupervised, so it'll give
you the word-topic probabilities, and for each test document (or training
document) you can get the document-topic probabilities as well.  Then...
what would you like to know at that point?
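Mahout has its own inference procedure for new documents; the Python sketch below swaps in a simpler technique to show the idea: a few EM iterations that fit one document's topic mixture while the trained word-topic probabilities `phi` stay fixed (maximum likelihood, with no Dirichlet prior). The function name and data layout are hypothetical.

```python
def infer_doc_topics(doc, phi, iterations=50):
    """Estimate p(topic | doc) for one unseen document with phi held fixed.

    doc: {word_id: count}; phi[k][w] = p(word w | topic k).
    Plain EM fold-in (maximum likelihood, no prior); not Mahout's algorithm.
    """
    num_topics = len(phi)
    theta = [1.0 / num_topics] * num_topics  # start from a uniform mixture
    for _ in range(iterations):
        expected = [0.0] * num_topics
        for w, count in doc.items():
            # E-step: share of word w attributed to each topic
            weights = [theta[k] * phi[k][w] for k in range(num_topics)]
            norm = sum(weights)
            if norm == 0.0:
                continue  # word impossible under every topic; ignore it
            for k in range(num_topics):
                expected[k] += count * weights[k] / norm
        assigned = sum(expected)
        if assigned == 0.0:
            break  # no usable words; keep the current estimate
        # M-step: new mixture = normalized expected topic counts
        theta = [e / assigned for e in expected]
    return theta

# Two hypothetical topics: topic 0 uses words 0-1, topic 1 uses words 2-3.
phi = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
print(infer_doc_topics({0: 3, 1: 1}, phi))  # [1.0, 0.0]
print(infer_doc_topics({0: 2, 2: 2}, phi))  # [0.5, 0.5]
```

A full LDA treatment would put a Dirichlet prior on the mixture; this unsmoothed version can drive topic weights to exactly zero, as in the first example.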

-- 

  -jake