Posted to user@mahout.apache.org by Ted Dunning <te...@gmail.com> on 2014/05/01 00:02:34 UTC

Re: Understanding LogLikelihood Similarity

The contingency table is constructed by looking at how many users have
expressed preference or interest in two items.  If the items are A and B,
the pertinent counts are

k11 - the number of users who interacted with both A and B
k12 - the number of users who interacted with A but not B
k21 - the number of users who interacted with B but not A
k22 - the number of users who interacted with neither A nor B.

These four counts are the entries of the contingency table, and they are all
that is needed to compute the LLR value.

See http://tdunning.blogspot.de/2008/03/surprise-and-coincidence.html for a
detailed description.
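
The construction above can be sketched in a few lines of Python. This is a
minimal illustration, not Mahout's implementation: it uses the standard G^2
form of the log-likelihood ratio for a 2x2 table, and the function names
`contingency_counts` and `llr`, as well as the set-based inputs, are
assumptions made for the example.

```python
import math


def contingency_counts(users_a, users_b, all_users):
    """Count users in each cell of the 2x2 table for items A and B.

    Assumes users_a and users_b are subsets of all_users.
    """
    a, b, u = set(users_a), set(users_b), set(all_users)
    k11 = len(a & b)      # interacted with both A and B
    k12 = len(a - b)      # interacted with A but not B
    k21 = len(b - a)      # interacted with B but not A
    k22 = len(u - a - b)  # interacted with neither A nor B
    return k11, k12, k21, k22


def llr(k11, k12, k21, k22):
    """G^2 log-likelihood ratio statistic for a 2x2 contingency table."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)  # A, not A
    cols = (k11 + k21, k12 + k22)  # B, not B
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    total = 0.0
    for k, i, j in cells:
        if k > 0:  # an empty cell contributes nothing to the sum
            # observed count times log(observed / expected under independence)
            total += k * math.log(k * n / (rows[i] * cols[j]))
    return 2.0 * total
```

A table whose cells match what independence predicts, such as
llr(10, 20, 30, 60), scores 0, while a table with strong co-occurrence, such as
llr(10, 0, 0, 10), scores 40 * ln(2), roughly 27.7.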




On Wed, Apr 30, 2014 at 11:31 PM, Mario Levitin <ma...@gmail.com> wrote:

> Hi Ted,
> I have read the paper. I understand the "Likelihood Ratio for Binomial
> Distributions" part.
> However, I cannot make a connection between this part and the contingency
> table.
>
> In order to calculate Likelihood Ratio for two Binomial Distributions you
> need the values: p, p1, p2, k1, k2, n1, n2.
> But the information contained in the contingency table is different from
> these values. So, again, I do not understand how the information contained
> in the contingency table is linked with Likelihood Ratio for Binomial
> Distributions.
>
> In order to find the similarity between two users I tend to think of the
> boolean preferences of user1 as a sample from a binomial distribution and
> the boolean preferences of user2 as another sample from a binomial
> distribution. Then use the LLR to assess how likely these distributions are
> the same. But I don't think this is correct since this calculation does not
> use the contingency table.
>
> I hope my question is clear.
> Thanks.
>
>
>
> On Mon, Apr 28, 2014 at 2:41 AM, Ted Dunning <te...@gmail.com> wrote:
>
> > Excellent.  Look forward to hearing your reactions.
> >
> > On Mon, Apr 28, 2014 at 1:14 AM, Mario Levitin <mariolevitin@gmail.com> wrote:
> >
> > > Not yet, but I will.
> > >
> > > >
> > > > Have you read my original paper on the topic of LLR?  It explains the
> > > > connection with chi^2 measures of association.
> > >
> >
>

Re: Understanding LogLikelihood Similarity

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Sorry, I clicked on the wrong thread. Please disregard.


On Wed, Apr 30, 2014 at 4:24 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

> sure. I assume this should include statements that something crushes
> something without providing a link to a published analysis of what it is
> something that crushes something another and due to what something.

Re: Understanding LogLikelihood Similarity

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
sure. I assume this should include statements that something crushes
something without providing a link to a published analysis of what it is
something that crushes something another and due to what something.



Re: Understanding LogLikelihood Similarity

Posted by Mario Levitin <ma...@gmail.com>.
After some thought I now understand this better, but I think it will take
some more effort to fully digest it.
Thanks for your answers Ted.




Re: Understanding LogLikelihood Similarity

Posted by Ted Dunning <te...@gmail.com>.
OK.

Whether a user has interacted with A is a sample from a binomial
distribution with an unknown parameter p_A.  Likewise with B and p_B.  The
two binomial distributions may or may not be independent.

The LLR measures the degree of evidence against independence.
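
One way to make the link to the two-binomial formulation concrete is to read
the table row by row. This is a sketch under stated assumptions, not something
spelled out in the thread: the users who interacted with A form one binomial
sample (n1 = k11 + k12 trials, k1 = k11 of them also chose B), and the users
who did not form the other (n2 = k21 + k22, k2 = k21). Under independence both
samples share one rate p; otherwise they have separate rates p1 and p2. The
function names below are hypothetical.

```python
import math


def binomial_log_likelihood(p, k, n):
    """Log-likelihood of k successes in n trials at rate p.

    The binomial coefficient is omitted because it cancels in the ratio.
    """
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1.0 - p)
    return ll


def llr_from_binomials(k11, k12, k21, k22):
    """LLR for 'A and B are independent', computed from two binomial samples."""
    k1, n1 = k11, k11 + k12  # users with A: how many also have B
    k2, n2 = k21, k21 + k22  # users without A: how many have B
    p1 = k1 / n1 if n1 else 0.0  # maximum-likelihood rate of B given A
    p2 = k2 / n2 if n2 else 0.0  # maximum-likelihood rate of B given not A
    total = n1 + n2
    p = (k1 + k2) / total if total else 0.0  # shared rate under independence
    separate = (binomial_log_likelihood(p1, k1, n1)
                + binomial_log_likelihood(p2, k2, n2))
    shared = (binomial_log_likelihood(p, k1, n1)
              + binomial_log_likelihood(p, k2, n2))
    return 2.0 * (separate - shared)
```

So the quantities p, p1, p2, k1, k2, n1 and n2 are all derived from the four
cells, and the result is the same number as the G^2 statistic computed
directly from the table. For example, llr_from_binomials(10, 0, 0, 10) gives
40 * ln(2), roughly 27.7, and llr_from_binomials(10, 20, 30, 60) gives 0.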





Re: Understanding LogLikelihood Similarity

Posted by Mario Levitin <ma...@gmail.com>.
Ted, I understand how the contingency table is constructed, and how to
compute the LLR value. What I cannot understand is how to link this with
binomial distributions.

