Posted to user@spark.apache.org by Stephen Boesch <ja...@gmail.com> on 2017/12/07 08:15:59 UTC

Re: LDA and evaluating topic number

I have been testing on the 20 Newsgroups dataset, which the Spark docs
themselves reference. I can confirm that perplexity increases and
likelihood decreases as the number of topics increases, and I am
similarly confused by these results.
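
For reference, here is a minimal sketch of the kind of sweep being
discussed, not a definitive recipe. The Spark API docs describe
logLikelihood() as a lower bound on the log likelihood of the entire
corpus and logPerplexity() as an upper bound on perplexity (lower is
better). The sketch assumes that logPerplexity() is the negated
log-likelihood bound divided by the corpus token count, which is how the
MLlib implementation appears to compute it. The helper names
(log_perplexity_from_bound, pick_k, sweep_topic_counts), the "features"
column name, and the train/test split are illustrative assumptions, not
part of Spark.

```python
from typing import Dict, Iterable


def log_perplexity_from_bound(log_likelihood: float, token_count: float) -> float:
    """Per-token log perplexity implied by a corpus-level log-likelihood bound.

    Assumption: mirrors -logLikelihood / tokenCount, so lower is better.
    """
    if token_count <= 0:
        raise ValueError("token_count must be positive")
    return -log_likelihood / token_count


def pick_k(log_perplexity_by_k: Dict[int, float]) -> int:
    """Return the topic count with the lowest held-out log perplexity."""
    if not log_perplexity_by_k:
        raise ValueError("no scores to compare")
    return min(log_perplexity_by_k, key=log_perplexity_by_k.get)


def sweep_topic_counts(train, test, ks: Iterable[int],
                       max_iter: int = 50, seed: int = 42) -> Dict[int, float]:
    """Fit one LDA model per k on `train` and score each on held-out `test`.

    Assumption: `train` and `test` are Spark DataFrames with a "features"
    column of term-count vectors (e.g. from CountVectorizer).
    """
    # Imported lazily so the helpers above work without a Spark install.
    from pyspark.ml.clustering import LDA

    scores: Dict[int, float] = {}
    for k in ks:
        lda = LDA(k=k, maxIter=max_iter, seed=seed, featuresCol="features")
        model = lda.fit(train)
        # Score on data the model has not seen; lower is better.
        scores[k] = model.logPerplexity(test)
    return scores
```

With a Spark session and two DataFrames in hand, usage would look like
scores = sweep_topic_counts(train_df, test_df, [5, 10, 20, 40]) followed
by pick_k(scores). Scoring on a held-out split rather than the training
data is a design choice here, and both quantities are variational
bounds, so the curve over k may not behave like an exact likelihood.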

2017-09-28 10:50 GMT-07:00 Cody Buntain <cb...@cs.umd.edu>:

> Hi, all!
>
> Is there an example somewhere of using LDA’s logPerplexity()/logLikelihood()
> functions to evaluate topic counts? The existing MLlib LDA examples show
> calling them, but I can’t find any documentation about how to interpret the
> outputs. The graphs of log perplexity and log likelihood aren’t consistent
> with what I expected (perplexity increases and likelihood decreases as the
> number of topics increases, which seems odd to me).
>
> An example of what I’m doing is here:
> http://www.cs.umd.edu/~cbuntain/FindTopicK-pyspark-regex.html
>
> Thanks very much in advance! If I can figure this out, I can post example
> code online, so others can see how this process is done.
>
> -Best regards,
> Cody
> _________________
> Cody Buntain, PhD
> Postdoc, @UMD_CS
> Intelligence Community Postdoctoral Fellow
> cbuntain@cs.umd.edu
> www.cs.umd.edu/~cbuntain
>
>