Posted to user@mahout.apache.org by "Quiroz Hernandez, Andres" <An...@xerox.com> on 2010/12/06 17:02:23 UTC

Probability from log likelihood in LDA output

Hello,

As I understand it, the output for LDA is a log likelihood value for
each word/topic pair, which is a function of log(P(w|t)). Is it possible
to invert that function to obtain P(w|t)? I have a feeling it is not,
since it looks like the final value is obtained as a sum of log
probabilities, but I just wanted to check, since an output as a
probability is more readable than the likelihood value given.

Thanks,

Andres

Re: Probability from log likelihood in LDA output

Posted by David Hall <dl...@cs.berkeley.edu>.

Hi,

The scores aren't (log) normalized until they're loaded in the map
phase. Take a look at LDAState. The array

private final double[] logTotals; // log \sum p(w|t) for topic=1..nTopics

in LDAState holds the normalization constants.  The method
logProbWordGivenTopic is intended for access...  LDADriver#createState
is a roundabout way of creating an LDAState.
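
A minimal sketch of that normalization, for illustration only. This is not Mahout's code: the class name, method names, and input values below are made up, and it assumes the stored scores for one topic are unnormalized log values. The idea is that the per-topic constant is log \sum_w exp(score(w,t)), and p(w|t) = exp(score(w,t) - constant).

```java
public final class TopicWordNormalizer {

  private TopicWordNormalizer() {}

  /** Numerically stable log(sum_i exp(x_i)); subtracts the max before exponentiating. */
  public static double logSumExp(double[] logValues) {
    double max = Double.NEGATIVE_INFINITY;
    for (double v : logValues) {
      max = Math.max(max, v);
    }
    if (max == Double.NEGATIVE_INFINITY) {
      return max; // empty input, or every entry is log(0)
    }
    double sum = 0.0;
    for (double v : logValues) {
      sum += Math.exp(v - max);
    }
    return max + Math.log(sum);
  }

  /**
   * Turns the unnormalized log scores of ONE topic into probabilities p(w|t).
   * Assumes at least one score is finite; otherwise the result is NaN.
   */
  public static double[] toProbabilities(double[] logScores) {
    double logTotal = logSumExp(logScores); // plays the role of one logTotals entry
    double[] probs = new double[logScores.length];
    for (int w = 0; w < logScores.length; w++) {
      probs[w] = Math.exp(logScores[w] - logTotal);
    }
    return probs;
  }

  public static void main(String[] args) {
    // Made-up scores for four words in a single topic. Note exp(1.2) > 1,
    // which is why a plain exp() of the raw score is not a probability.
    double[] logScores = {1.2, -0.3, 0.5, -2.0};
    double[] probs = toProbabilities(logScores);
    double total = 0.0;
    for (int w = 0; w < probs.length; w++) {
      System.out.println("p(w" + w + "|t) = " + probs[w]);
      total += probs[w];
    }
    System.out.println("sum = " + total);
  }
}
```

When reading real LDA output, the scores would have to be grouped by topic first, since the constant is different for each topic.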

-- David

On Mon, Dec 6, 2010 at 12:06 PM, Quiroz Hernandez, Andres
<An...@xerox.com> wrote:
> Thanks for your quick reply, Ted. It looks like either the probabilities are not normalized or the function being used is not a simple sum of log probabilities, because exp does not always return a value between 0 and 1. I will take a look at the code to see if I can find exactly how the value is calculated (but if anyone knows the function used, and if I can directly invert it to find P(w|t) please let me know).
>
> Thanks again,
>
> Andres
>
> -----Original Message-----
> From: Ted Dunning [mailto:ted.dunning@gmail.com]
> Sent: Monday, December 06, 2010 11:57 AM
> To: user@mahout.apache.org
> Subject: Re: Probability from log likelihood in LDA output
>
> Yes.  It should be possible to use exp to get the actual probability.  The
> fact that it is a sum of log probabilities just means that the probability
> is a product of probabilities.
>
> It is possible that the probabilities are not normalized, but that would be
> a bit surprising for this kind of algorithm.

RE: Probability from log likelihood in LDA output

Posted by "Quiroz Hernandez, Andres" <An...@xerox.com>.

Thanks for your quick reply, Ted. It looks like either the probabilities are not normalized or the function being used is not a simple sum of log probabilities, because exp does not always return a value between 0 and 1. I will take a look at the code to see if I can find exactly how the value is calculated (but if anyone knows the function used, and if I can directly invert it to find P(w|t) please let me know).

Thanks again,

Andres

-----Original Message-----
From: Ted Dunning [mailto:ted.dunning@gmail.com] 
Sent: Monday, December 06, 2010 11:57 AM
To: user@mahout.apache.org
Subject: Re: Probability from log likelihood in LDA output

Yes.  It should be possible to use exp to get the actual probability.  The
fact that it is a sum of log probabilities just means that the probability
is a product of probabilities.

It is possible that the probabilities are not normalized, but that would be
a bit surprising for this kind of algorithm.

Re: Probability from log likelihood in LDA output

Posted by Ted Dunning <te...@gmail.com>.

Yes.  It should be possible to use exp to get the actual probability.  The
fact that it is a sum of log probabilities just means that the probability
is a product of probabilities.

It is possible that the probabilities are not normalized, but that would be
a bit surprising for this kind of algorithm.
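
To make the identity concrete: exp(log p1 + ... + log pn) = p1 * ... * pn. The sketch below uses made-up probabilities and hypothetical class and method names, and only checks the arithmetic; it says nothing about how the LDA output is actually computed. The result lands in [0, 1] only when every factor is a proper (normalized) probability.

```java
public final class LogProductDemo {

  private LogProductDemo() {}

  /** Multiplies the probabilities by summing their logs and exponentiating once. */
  public static double productViaLogs(double[] probs) {
    double logSum = 0.0;
    for (double p : probs) {
      logSum += Math.log(p);
    }
    return Math.exp(logSum);
  }

  /** Multiplies the probabilities directly, for comparison. */
  public static double directProduct(double[] probs) {
    double product = 1.0;
    for (double p : probs) {
      product *= p;
    }
    return product;
  }

  public static void main(String[] args) {
    double[] probs = {0.5, 0.2, 0.1}; // made-up values, each in (0, 1]
    System.out.println("via logs: " + productViaLogs(probs));
    System.out.println("direct:   " + directProduct(probs));
  }
}
```

The two results agree up to floating-point rounding, so taking exp of a sum of log probabilities does recover a probability, provided the summed terms were normalized to begin with.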
