You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mahout.apache.org by Shashikant Kore <sh...@gmail.com> on 2010/04/29 13:44:47 UTC

Negative LLR Score

Root LLR calculation has a minor bug. When LLR score is negative,
square root is undefined. You can see the result for the following to
be NaN.

org.apache.mahout.math.stats.LogLikelihood.rootLogLikelihoodRatio(6,
7567, 1924, 2426487)

A minor fix would be to return zero if LLR is less than zero as follows.

public static double rootLogLikelihoodRatio(int k11, int k12, int k21,
int k22) {
 	double llr = logLikelihoodRatio(k11, k12, k21, k22);
        if (llr < 0)
             return 0;
 	return Math.signum(((double) k11 / (k11+k12)) - ((double) k21 /
(k21+k22))) * Math.sqrt(llr);
}

--shashi

Re: Negative LLR Score

Posted by Sean Owen <sr...@gmail.com>.
I could sure be wrong about this (or perhaps out of date). It makes
sense in theory. But I can't find it in the JLS and in the bytecode I
still see it calling Math.log(), calling StrictMath.log(), FWIW. I
would actually believe a JIT would do something with this. But I still
find myself always programming defensively so to speak.

On Thu, Apr 29, 2010 at 11:37 PM, Ted Dunning <te...@gmail.com> wrote:
>> javac definitely isn't allowed to do that kind of transformation -- it
>> actually can't do much of anything.
>>
>
> Javac and the JIT both know that log is a pure function.  They also
> definitely do constant sub-expression elimination and strength reduction.
>  Besides since logSum is a local variable there can be no side-effects.
>

Re: Negative LLR Score

Posted by Ted Dunning <te...@gmail.com>.
On Thu, Apr 29, 2010 at 3:29 PM, Sean Owen <sr...@gmail.com> wrote:

> You mean "sum * Math.log(sum)"?
>

Of course.  Sorry.


> javac definitely isn't allowed to do that kind of transformation -- it
> actually can't do much of anything.
>

Javac and the JIT both know that log is a pure function.  They also
definitely do constant sub-expression elimination and strength reduction.
 Besides since logSum is a local variable there can be no side-effects.

Re: Negative LLR Score

Posted by Sean Owen <sr...@gmail.com>.
You mean "sum * Math.log(sum)"?
That's nice, I'll go with that.

javac definitely isn't allowed to do that kind of transformation -- it
actually can't do much of anything.
ProGuard might -- it's actually a dynamite byte code optimizer and
I've been itching to get it re-integrated into the build for stuff
like this. The JIT probably wouldn't do this either as I doubt it can
prove no side effects due to the native method call.

So yeah it's still worthwhile to be prudent when programming for speed
in Java as not much is going to get done later for you.


On Thu, Apr 29, 2010 at 6:45 PM, Ted Dunning <te...@gmail.com> wrote:
> I think a cap is a good thing if the error is relatively small ( < 1e-4 or
> so).
>
> Betraying my age, I usually rewrite this as this:
>
>   for (int element : elements) {
>     if (element > 0) {
>       result += element * (Math.log(element));
>     }
>   }
>   result -= elements.size() * Math.log(sum)
>
> But presumably the compiler will notice this change and lift it out of the
> inner loop

Re: Negative LLR Score

Posted by Ted Dunning <te...@gmail.com>.
I think a cap is a good thing if the error is relatively small ( < 1e-4 or
so).

Betraying my age, I usually rewrite this as this:

   for (int element : elements) {
     if (element > 0) {
       result += element * (Math.log(element));
     }
   }
   result -= elements.size() * Math.log(sum)

But presumably the compiler will notice this change and lift it out of the
inner loop

On Thu, Apr 29, 2010 at 9:33 AM, Sean Owen <sr...@gmail.com> wrote:

>    double logSum = Math.log(sum);
>    for (int element : elements) {
>      if (element > 0) {
>        result += element * (Math.log(element) - logSum);
>      }
>    }
>

Re: Negative LLR Score

Posted by Sean Owen <sr...@gmail.com>.
FWIW I had rewritten the entropy() loop to be:


    for (int element : elements) {
      if (element > 0) {
        result += element * (Math.log(element / sum));
      }
    }

and then further to

    double logSum = Math.log(sum);
    for (int element : elements) {
      if (element > 0) {
        result += element * (Math.log(element) - logSum);
      }
    }

and I come up with at least a small positive value:
7.465509716331835E-5

Since it is not negative, somehow it strikes me as a change that makes
the result right-er (though mathmetically they ought to be the same).
It's a small optimization.

Shall I commit something like that, but also cap the LLR at 0 anyhow?
that fixes the original issue for sure.



On Thu, Apr 29, 2010 at 5:28 PM, Sean Owen <sr...@gmail.com> wrote:
> Ah yeah that's it.
>
> So... is the better change to cap the result of logLikelihoodRatio() at 0.0?
>
> On Thu, Apr 29, 2010 at 5:11 PM, Ted Dunning <te...@gmail.com> wrote:
>> I suspect round-off error.  In R I get this for the raw LLR:
>>
>>> llr(matrix(c(6,7567, 1924, 2426487), nrow=2))
>> [1] 3.380607e-11
>>
>> A slightly different implementation might well have gotten a small negative
>> number here.
>

Re: Negative LLR Score

Posted by Sean Owen <sr...@gmail.com>.
Ah yeah that's it.

So... is the better change to cap the result of logLikelihoodRatio() at 0.0?

On Thu, Apr 29, 2010 at 5:11 PM, Ted Dunning <te...@gmail.com> wrote:
> I suspect round-off error.  In R I get this for the raw LLR:
>
>> llr(matrix(c(6,7567, 1924, 2426487), nrow=2))
> [1] 3.380607e-11
>
> A slightly different implementation might well have gotten a small negative
> number here.

Re: Negative LLR Score

Posted by Ted Dunning <te...@gmail.com>.
I suspect round-off error.  In R I get this for the raw LLR:

> llr(matrix(c(6,7567, 1924, 2426487), nrow=2))
[1] 3.380607e-11

A slightly different implementation might well have gotten a small negative
number here.

On Thu, Apr 29, 2010 at 8:56 AM, Sean Owen <sr...@gmail.com> wrote:

> What about Shashikant's example? Unless my brain's not in gear, that
> seems like a legit example, but does indeed product a negative LLR.
>

Re: Negative LLR Score

Posted by Sean Owen <sr...@gmail.com>.
What about Shashikant's example? Unless my brain's not in gear, that
seems like a legit example, but does indeed product a negative LLR.

Re: Negative LLR Score

Posted by Ted Dunning <te...@gmail.com>.
Correct.

Mathematically speaking, if you have all positive counts then you can't get
a negative score except possibly by round-off error.

On Thu, Apr 29, 2010 at 6:10 AM, Drew Farris <dr...@gmail.com> wrote:

> I was under the (perhaps incorrect) impression that negative LLR indicates
> some sort of problem with the input.
>
> On Thu, Apr 29, 2010 at 7:44 AM, Shashikant Kore <shashikant@gmail.com
> >wrote:
>
> > Root LLR calculation has a minor bug. When LLR score is negative,
> > square root is undefined. You can see the result for the following to
> > be NaN.
> >
> > org.apache.mahout.math.stats.LogLikelihood.rootLogLikelihoodRatio(6,
> > 7567, 1924, 2426487)
> >
> > A minor fix would be to return zero if LLR is less than zero as follows.
> >
> > public static double rootLogLikelihoodRatio(int k11, int k12, int k21,
> > int k22) {
> >        double llr = logLikelihoodRatio(k11, k12, k21, k22);
> >        if (llr < 0)
> >             return 0;
> >        return Math.signum(((double) k11 / (k11+k12)) - ((double) k21 /
> > (k21+k22))) * Math.sqrt(llr);
> > }
> >
> > --shashi
> >
>

Re: Negative LLR Score

Posted by Drew Farris <dr...@gmail.com>.
I was under the (perhaps incorrect) impression that negative LLR indicates
some sort of problem with the input.

On Thu, Apr 29, 2010 at 7:44 AM, Shashikant Kore <sh...@gmail.com>wrote:

> Root LLR calculation has a minor bug. When LLR score is negative,
> square root is undefined. You can see the result for the following to
> be NaN.
>
> org.apache.mahout.math.stats.LogLikelihood.rootLogLikelihoodRatio(6,
> 7567, 1924, 2426487)
>
> A minor fix would be to return zero if LLR is less than zero as follows.
>
> public static double rootLogLikelihoodRatio(int k11, int k12, int k21,
> int k22) {
>        double llr = logLikelihoodRatio(k11, k12, k21, k22);
>        if (llr < 0)
>             return 0;
>        return Math.signum(((double) k11 / (k11+k12)) - ((double) k21 /
> (k21+k22))) * Math.sqrt(llr);
> }
>
> --shashi
>

Re: Negative LLR Score

Posted by Sean Owen <sr...@gmail.com>.
(I can easily make the fix and add a test, but is the right thing to
return 0, or instead proceed in the method with the value -sqrt(-llr)
when llr is negative?)

On Thu, Apr 29, 2010 at 12:44 PM, Shashikant Kore <sh...@gmail.com> wrote:
> Root LLR calculation has a minor bug. When LLR score is negative,
> square root is undefined. You can see the result for the following to
> be NaN.
>
> org.apache.mahout.math.stats.LogLikelihood.rootLogLikelihoodRatio(6,
> 7567, 1924, 2426487)
>
> A minor fix would be to return zero if LLR is less than zero as follows.
>
> public static double rootLogLikelihoodRatio(int k11, int k12, int k21,
> int k22) {
>        double llr = logLikelihoodRatio(k11, k12, k21, k22);
>        if (llr < 0)
>             return 0;
>        return Math.signum(((double) k11 / (k11+k12)) - ((double) k21 /
> (k21+k22))) * Math.sqrt(llr);
> }
>
> --shashi
>