Posted to user@mahout.apache.org by jamborta <ja...@gmail.com> on 2009/11/26 16:14:52 UTC

Mahout/Taste covariance between two items

hi guys,
just wondering if you have a method implemented which would calculate the
covariance between two items, and the variance of an item. I looked at
itemSimilarities but that one does something different.

thanks
Tama 
-- 
View this message in context: http://old.nabble.com/Mahout-Taste-covariance-between-two-items-tp26530825p26530825.html
Sent from the Mahout User List mailing list archive at Nabble.com.

Re: Mahout/Taste covariance between two items

Posted by jamborta <ja...@gmail.com>.

I really just want to get the sample covariance, which is:

sum((X_i - meanX) * (Y_i - meanY)) / (N - 1)

this is just

 pearson_x,y * sdX * sdY

so I think sumXY / (N - 1), with sumXY taken on the centered data, should be
the right one.
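
As a rough check of the two expressions above, here is a minimal plain-Java sketch, assuming both items are given as equal-length arrays of ratings from the same users. The class and method names (CovarianceSketch, sampleCovariance, and so on) are made up for illustration and are not part of Mahout. It computes the sample covariance directly and again as pearson * sdX * sdY, where sdX and sdY are sample standard deviations (N - 1 in the denominator).

```java
// Illustrative sketch only: CovarianceSketch is a hypothetical helper,
// not a Mahout class.
class CovarianceSketch {

    static double mean(double[] v) {
        double sum = 0.0;
        for (double d : v) {
            sum += d;
        }
        return sum / v.length;
    }

    // Sample covariance: sum((x_i - meanX) * (y_i - meanY)) / (N - 1)
    static double sampleCovariance(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two equal-length arrays with N >= 2");
        }
        double meanX = mean(x);
        double meanY = mean(y);
        double sumXY = 0.0;
        for (int i = 0; i < x.length; i++) {
            sumXY += (x[i] - meanX) * (y[i] - meanY);
        }
        return sumXY / (x.length - 1);
    }

    // Sample standard deviation: square root of the covariance of v with itself.
    static double sampleStdDev(double[] v) {
        return Math.sqrt(sampleCovariance(v, v));
    }

    // Pearson correlation on centered data: sumXY / sqrt(sumX2 * sumY2).
    // Returns NaN if either input has zero variance.
    static double pearson(double[] x, double[] y) {
        double meanX = mean(x);
        double meanY = mean(y);
        double sumXY = 0.0;
        double sumX2 = 0.0;
        double sumY2 = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sumXY += dx * dy;
            sumX2 += dx * dx;
            sumY2 += dy * dy;
        }
        return sumXY / Math.sqrt(sumX2 * sumY2);
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5};
        double[] y = {2, 4, 5, 4, 5};
        double direct = sampleCovariance(x, y);
        double viaPearson = pearson(x, y) * sampleStdDev(x) * sampleStdDev(y);
        System.out.println("direct covariance:     " + direct);
        System.out.println("pearson * sdX * sdY:   " + viaPearson);
    }
}
```

The two values agree up to floating-point rounding only when the correlation and both standard deviations are computed over the same set of paired ratings.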



Re: Mahout/Taste covariance between two items

Posted by Sean Owen <sr...@gmail.com>.

I'm not so familiar with this formula but you seem to be missing
something in the denominator... it's got to normalize somehow. I think
I said divide by standard deviation but that's not quite it. What you
are really summing are the products of z-scores.  See
http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient

But I think you should just use the formulation given in the code?
It should give the same result. At least I hope these aren't different
definitions of Pearson!
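
For reference, the "products of z-scores" remark can be written out as the standard sample identity below. The notation (\bar{x}, s_x, r_{xy}) is chosen here for illustration and does not come from the Mahout code; s_x and s_y are sample standard deviations with N - 1 in the denominator.

```latex
r_{xy} = \frac{1}{N-1} \sum_{i=1}^{N}
         \frac{x_i - \bar{x}}{s_x} \cdot \frac{y_i - \bar{y}}{s_y}
\qquad\Longrightarrow\qquad
\operatorname{cov}(x, y)
  = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})
  = r_{xy} \, s_x \, s_y
```

Read this way, sumXY / (N - 1) on centered data is the sample covariance, and dividing it further by s_x * s_y gives the Pearson correlation.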

On Fri, Nov 27, 2009 at 10:20 AM, jamborta <ja...@gmail.com> wrote:
>
> thank you. much clearer now.
>
> so for my purpose this will do:
>
> sumXY / (N - 1)
>
> given that the data is 'centered'?

Re: Mahout/Taste covariance between two items

Posted by jamborta <ja...@gmail.com>.

thank you. much clearer now.

so for my purpose this will do:

sumXY / (N - 1)

given that the data is 'centered'?


Re: Mahout/Taste covariance between two items

Posted by Sean Owen <sr...@gmail.com>.

On Fri, Nov 27, 2009 at 1:41 AM, jamborta <ja...@gmail.com> wrote:
>
> hi. I tried to figure out how you calculate pearson correlation, but it looks
> like you use this formula:
>
> sumXY / sqrt(sumX2 * sumY2)

Yes that's right -- this is what Pearson reduces to when the means of X
and Y are 0. And they are here -- the implementation 'centers' the
data.

> where sumXY = sumXY - meanY * sumX;
> sumX2 = sumX2 - meanX * sumX;
> sumY2 = sumY2 - meanY * sumY;

You see the lines commented out there? Those are the full forms of the
expressions, which may make more sense. This is centering the data,
making the mean 0.

This is a simplification based on the observation that, for example,
sumX * meanY = sumY * meanX = n * meanY * meanX.

>
> I don't really understand how you got these equations. could you explain it
> to me? I thought pearson correlation would be like this
>
> E[(x_i - meanX)(y_i - meanY)] / (sdX * sdY)

That's right, that's the expression for a population correlation, but
we can really only compute a sample Pearson correlation coefficient,
yes:

> for my project I would need to get sample correlation coefficient which
> would be something like this:
>
> sum((x_i - meanX)(y_i - meanY)) / (N - 1)

Yeah that's fine too, this is another way of expressing the formula,
though you're missing the two standard deviations in the denominator.
It'll be clearer if I note that the means of X and Y are 0.
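
To make the centering step concrete: expanding sum((x_i - meanX)(y_i - meanY)) gives sumXY - meanY * sumX - meanX * sumY + n * meanX * meanY, and because meanX * sumY = n * meanX * meanY the last two terms cancel, leaving sumXY - meanY * sumX. The plain-Java sketch below checks this numerically. It is not the Mahout source; CenteringSketch and its method names are hypothetical, and it assumes both items are given as equal-length arrays of ratings from the same users.

```java
// Illustrative sketch only: CenteringSketch is a hypothetical helper,
// not a Mahout class.
class CenteringSketch {

    // Returns {sumXY, sumX2, sumY2} for the centered data, computed from raw
    // sums using the simplified expressions quoted in the thread.
    static double[] centeredSums(double[] x, double[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("need two equal-length, non-empty arrays");
        }
        int n = x.length;
        double sumX = 0.0, sumY = 0.0, sumXY = 0.0, sumX2 = 0.0, sumY2 = 0.0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
            sumX2 += x[i] * x[i];
            sumY2 += y[i] * y[i];
        }
        double meanX = sumX / n;
        double meanY = sumY / n;
        return new double[] {
            sumXY - meanY * sumX,  // = sum((x_i - meanX) * (y_i - meanY))
            sumX2 - meanX * sumX,  // = sum((x_i - meanX)^2)
            sumY2 - meanY * sumY   // = sum((y_i - meanY)^2)
        };
    }

    // Direct form, for comparison: subtract the means first, then sum.
    static double centeredSumXYDirect(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i] / n;
            meanY += y[i] / n;
        }
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += (x[i] - meanX) * (y[i] - meanY);
        }
        return sum;
    }

    // Pearson on centered sums; NaN if either input has zero variance.
    static double pearson(double[] x, double[] y) {
        double[] c = centeredSums(x, y);
        return c[0] / Math.sqrt(c[1] * c[2]);
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5};
        double[] y = {2, 4, 5, 4, 5};
        System.out.println("simplified centered sumXY: " + centeredSums(x, y)[0]);
        System.out.println("direct centered sumXY:     " + centeredSumXYDirect(x, y));
        System.out.println("pearson:                   " + pearson(x, y));
    }
}
```

The raw-sum form needs only one pass over the data, but it subtracts two potentially large numbers, so it can lose precision when the ratings have a large mean relative to their spread.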

Re: Mahout/Taste covariance between two items

Posted by jamborta <ja...@gmail.com>.

hi. I tried to figure out how you calculate pearson correlation, but it looks
like you use this formula:

sumXY / sqrt(sumX2 * sumY2)

where sumXY = sumXY - meanY * sumX;
sumX2 = sumX2 - meanX * sumX;
sumY2 = sumY2 - meanY * sumY;

I don't really understand how you got these equations. could you explain it
to me? I thought pearson correlation would be like this

E[(x_i - meanX)(y_i - meanY)] / (sdX * sdY)

for my project I would need to get sample correlation coefficient which
would be something like this:

sum((x_i - meanX)(y_i - meanY)) / (N - 1)

thanks a lot. 



Re: Mahout/Taste covariance between two items

Posted by jamborta <ja...@gmail.com>.

great. thanks a lot.



Re: Mahout/Taste covariance between two items

Posted by Sean Owen <sr...@gmail.com>.

Yes. Look at PearsonCorrelationSimilarity. It implements
ItemSimilarity so it can compute a Pearson correlation between ratings
for two items. Pearson is the covariance divided by the product of the
standard deviations. So, just multiply the similarity value you get by
the standard deviations of the items' preference values.

The variance of each item's preference values is simply the square of
the standard deviation, if that's what you mean.

You can use RunningAverageAndStdDev to help compute standard deviation
if you like.

On Thu, Nov 26, 2009 at 3:14 PM, jamborta <ja...@gmail.com> wrote:
>
> hi guys,
> just wondering if you have a method implemented which would calculate the
> covariance between two items, and the variance of an item. I looked at
> itemSimilarities but that one does something different.
>
> thanks
> Tama