Posted to user@mahout.apache.org by Amit Nithian <an...@gmail.com> on 2013/11/26 20:51:20 UTC

Question about Pearson Correlation in non-Taste mode

Hi all,

Apologies if this is a repeat question, as I just joined the list, but I
have a question about the way that metrics like Cosine and Pearson are
calculated in Hadoop "mode" (i.e. non-Taste).

As far as I understand, the vectors used for computing pairwise item
similarity in Taste are based on the co-rated items; however, in the Hadoop
implementation, I don't see this done.

The implementation of the distributed item-item similarity comes from this
paper http://ssc.io/wp-content/uploads/2012/06/rec11-schelter.pdf. I didn't
see anything in the paper about filtering out elements that are not
co-rated from the vectors, and this can make a difference, especially when
you normalize the ratings by dividing by the average item rating. In some
cases, the number of users to divide by can be fewer depending on the
sparseness of the vector.
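To make the concern concrete, here is a small sketch (my own toy example with made-up ratings, not Mahout code) of how the mean of an item vector shifts depending on whether you restrict it to co-rated users:

```python
# Toy example (not Mahout code): ratings by three users for items X and Y.
# None means the user never rated the item.
x = [5, None, 4]
y = [4, 5, 2]

# Mean of Y over ALL of its observed ratings (what a per-vector
# preprocessing step sees):
rated_y = [r for r in y if r is not None]
mean_y_full = sum(rated_y) / len(rated_y)  # 11/3

# Mean of Y over only the users who rated BOTH items (what a sequential,
# pairwise implementation can use):
corated = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
mean_y_corated = sum(b for _, b in corated) / len(corated)  # (4 + 2) / 2

print(mean_y_full, mean_y_corated)
```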

Any clarity on this would be helpful.

Thanks!
Amit

RE: Question about Pearson Correlation in non-Taste mode

Posted by Jason Xin <Ja...@sas.com>.
Thanks, Ted 

Jason Xin

-----Original Message-----
From: Ted Dunning [mailto:ted.dunning@gmail.com] 
Sent: Friday, December 06, 2013 9:25 PM
To: user@mahout.apache.org
Subject: Re: Question about Pearson Correlation in non-Taste mode

The second link was an article I wrote that led eventually to the dissertation (third link).




On Fri, Dec 6, 2013 at 5:15 PM, Jason Xin <Ja...@sas.com> wrote:

> Ted,
>
> Is "Accurate Methods for the Statistics of Surprise and Coincidence"
> (the second PDF you attached) your doctoral dissertation, or do you have
> another one you can forward to me? Thanks.
>
> Jason Xin
>
> -----Original Message-----
> From: Ted Dunning [mailto:ted.dunning@gmail.com]
> Sent: Friday, December 06, 2013 7:56 PM
> To: user@mahout.apache.org
> Subject: Re: Question about Pearson Correlation in non-Taste mode
>
> See
>
> http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html
> http://acl.ldc.upenn.edu/J/J93/J93-1003.pdf
> http://arxiv.org/abs/1207.1847
>
>
>
>
>
> On Fri, Dec 6, 2013 at 1:09 PM, Amit Nithian <an...@gmail.com> wrote:
>
> > Hey Sebastian,
> >
> > Thanks again for the explanation. So now you have me intrigued about 
> > something else. Why is it that the loglikelihood ratio test is a better 
> > measure for essentially implicit ratings? Are there 
> > resources/research papers you can point me to explaining this?
> >
> > Take care
> > Amit
> >
> >
> > On Sun, Dec 1, 2013 at 9:25 AM, Sebastian Schelter
> > <ss...@googlemail.com>wrote:
> >
> > > Hi Amit,
> > >
> > > No need to apologize for picking on me, I'm happy about anyone 
> > > digging into the paper :)
> > >
> > > The reason I implemented Pearson in this (flawed) way has to do 
> > > with the way the parallel algorithm works:
> > >
> > > It never compares two item vectors in memory; instead it 
> > > preprocesses the vectors and computes sparse dot products in 
> > > parallel. The centering which is usually done for Pearson 
> > > correlation is dependent on which pair of vectors you're currently 
> > > looking at (and doesn't fit the parallel algorithm). We had an 
> > > earlier implementation that didn't have this flaw, but was way 
> > > slower
> than the current one.
> > >
> > > Rating prediction on explicit feedback data (like ratings), for which 
> > > Pearson correlation is mostly used in CF, is a rather academic 
> > > topic and in science there are nearly no datasets that really 
> > > require you to go to Hadoop.
> > >
> > > On the other hand item prediction on implicit feedback data (like
> > > clicks) is the common scenario in the majority of industry 
> > > use cases, but here count-based similarity measures like the 
> > > loglikelihood ratio test give much better results. The current 
> > > implementation of Mahout's distributed item-based recommender is 
> > > clearly designed and tuned for the latter use case.
> > >
> > > I hope that answers your question.
> > >
> > > --sebastian
> > >
> > > On 01.12.2013 18:10, Amit Nithian wrote:
> > > > Thanks guys! So the real question is not so much what's the 
> > > > average of
> > > the
> > > > vector with the missing rating (although yes that was a 
> > > > question) but what's the average of the vector with all the 
> > > > ratings specified but the second rating that is not shared with 
> > > > the first
> user:
> > > > [5 - 4] vs [4 5 2].
> > > >
> > > > If we agree that the first is 4.5 then is the second one 11/3 or 
> > > > 3 ((4+2)/2)? Taste has this as ((4+2)/2) while distributed mode 
> > > > has it as 11/3.
> > > >
> > > > Since Taste (and Lenskit) is sequential, it can (and will only) 
> > > > look at co-occurring ratings whereas the Hadoop implementation 
> > > > doesn't. The
> > paper
> > > > that Sebastian wrote has a pre-processing step where (for 
> > > > Pearson) you subtract each element of an item-rating vector from 
> > > > the average rating which implies that each item-rating vector is 
> > > > treated independently of
> > > each
> > > > other whereas in the sequential/non-distributed mode it's all
> > considered
> > > > together.
> > > >
> > > > My main reason for posting is because the Taste implementation 
> > > > of
> > > item-item
> > > > similarity differs from the distributed implementation. Since I 
> > > > am
> > > totally
> > > > new to this space and these similarities I wanted to understand 
> > > > if
> > there
> > > is
> > > > a reason for this difference and whether or not it matters. 
> > > > Sounds like from the discussion it doesn't matter but 
> > > > understanding why helps me explain this to others.
> > > >
> > > > My guess (and I'm glad Sebastian is on this list so he can help 
> > > > confirm/deny this.. sorry I'm not picking on you just happy to 
> > > > be able
> > to
> > > > talk to you about your good paper) is that considering 
> > > > co-occurring
> > > ratings
> > > > in a distributed implementation would require access to the full 
> > > > matrix which defeats the parallel nature of computing item-item
> similarity?
> > > >
> > > > Thanks again!
> > > > Amit
> > > >
> > > >
> > > > On Sun, Dec 1, 2013 at 2:55 AM, Sean Owen <sr...@gmail.com> wrote:
> > > >
> > > >> It's not an issue of how to be careful with sparsity and 
> > > >> subtracting means, although that's a valuable point in itself.
> > > >> The question is what the mean is supposed to be.
> > > >>
> > > >> You can't think of missing ratings as 0 in general, and the 
> > > >> example here shows why: you're acting as if most movies are 
> > > >> hated. Instead they are excluded from the computation entirely.
> > > >>
> > > >> m_x should be 4.5 in the example here. That's consistent with 
> > > >> literature and the other implementations earlier in this project.
> > > >>
> > > >> I don't know the Hadoop implementation well enough, and wasn't 
> > > >> sure from the comments above, whether it does end up behaving 
> > > >> as if it's "4.5" or "3". If it's not 4.5 I would call that a bug.
> > > >> Items that aren't co-rated can't meaningfully be included in 
> > > >> this
> computation.
> > > >>
> > > >>
> > > >> On Sun, Dec 1, 2013 at 8:29 AM, Ted Dunning 
> > > >> <te...@gmail.com>
> > > wrote:
> > > >>> Good point Amit.
> > > >>>
> > > >>> Not sure how much this matters.  It may be that 
> > > >>> PearsonCorrelationSimilarity is a bad name that should be 
> > > >>> PearsonInspiredCorrelationSimilarity.  My guess is that this
> > > >> implementation
> > > >>> is lifted directly from the very early recommendation 
> > > >>> literature and
> > is
> > > >>> reflective of the way that it was used back then.
> > > >>
> > > >
> > >
> > >
> >
>

Re: Question about Pearson Correlation in non-Taste mode

Posted by Ted Dunning <te...@gmail.com>.
The second link was an article I wrote that led eventually to the
dissertation (third link).




On Fri, Dec 6, 2013 at 5:15 PM, Jason Xin <Ja...@sas.com> wrote:

> Ted,
>
> Is "Accurate Methods for the Statistics of Surprise and Coincidence" (the
> second PDF you attached) your doctoral dissertation, or do you have
> another one you can forward to me? Thanks.
>
> Jason Xin
>
> -----Original Message-----
> From: Ted Dunning [mailto:ted.dunning@gmail.com]
> Sent: Friday, December 06, 2013 7:56 PM
> To: user@mahout.apache.org
> Subject: Re: Question about Pearson Correlation in non-Taste mode
>
> See
>
> http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html
> http://acl.ldc.upenn.edu/J/J93/J93-1003.pdf
> http://arxiv.org/abs/1207.1847
>
>
>
>
>
> On Fri, Dec 6, 2013 at 1:09 PM, Amit Nithian <an...@gmail.com> wrote:
>
> > Hey Sebastian,
> >
> > Thanks again for the explanation. So now you have me intrigued about
> > something else. Why is it that the loglikelihood ratio test is a better
> > measure for essentially implicit ratings? Are there resources/research
> > papers you can point me to explaining this?
> >
> > Take care
> > Amit
> >
> >
> > On Sun, Dec 1, 2013 at 9:25 AM, Sebastian Schelter
> > <ss...@googlemail.com>wrote:
> >
> > > Hi Amit,
> > >
> > > No need to apologize for picking on me, I'm happy about anyone digging
> > > into the paper :)
> > >
> > > The reason I implemented Pearson in this (flawed) way has to do
> > > with the way the parallel algorithm works:
> > >
> > > It never compares two item vectors in memory; instead it
> > > preprocesses the vectors and computes sparse dot products in
> > > parallel. The centering which is usually done for Pearson
> > > correlation is dependent on which pair of vectors you're currently
> > > looking at (and doesn't fit the parallel algorithm). We had an
> > > earlier implementation that didn't have this flaw, but was way slower
> than the current one.
> > >
> > > Rating prediction on explicit feedback data (like ratings), for which
> > > Pearson correlation is mostly used in CF, is a rather academic topic
> > > and in science there are nearly no datasets that really require you
> > > to go to Hadoop.
> > >
> > > On the other hand item prediction on implicit feedback data (like
> > > clicks) is the common scenario in the majority of industry use cases,
> > > but here count-based similarity measures like the loglikelihood
> > > ratio test give much better results. The current implementation of
> > > Mahout's distributed item-based recommender is clearly designed and
> > > tuned for the latter use case.
> > >
> > > I hope that answers your question.
> > >
> > > --sebastian
> > >
> > > On 01.12.2013 18:10, Amit Nithian wrote:
> > > > Thanks guys! So the real question is not so much what's the
> > > > average of
> > > the
> > > > vector with the missing rating (although yes that was a question)
> > > > but what's the average of the vector with all the ratings
> > > > specified but the second rating that is not shared with the first
> user:
> > > > [5 - 4] vs [4 5 2].
> > > >
> > > > If we agree that the first is 4.5 then is the second one 11/3 or 3
> > > > ((4+2)/2)? Taste has this as ((4+2)/2) while distributed mode has
> > > > it as 11/3.
> > > >
> > > > Since Taste (and Lenskit) is sequential, it can (and will only)
> > > > look at co-occurring ratings whereas the Hadoop implementation
> > > > doesn't. The
> > paper
> > > > that Sebastian wrote has a pre-processing step where (for Pearson)
> > > > you subtract each element of an item-rating vector from the
> > > > average rating which implies that each item-rating vector is
> > > > treated independently of
> > > each
> > > > other whereas in the sequential/non-distributed mode it's all
> > considered
> > > > together.
> > > >
> > > > My main reason for posting is because the Taste implementation of
> > > item-item
> > > > similarity differs from the distributed implementation. Since I am
> > > totally
> > > > new to this space and these similarities I wanted to understand if
> > there
> > > is
> > > > a reason for this difference and whether or not it matters. Sounds
> > > > like from the discussion it doesn't matter but understanding why
> > > > helps me explain this to others.
> > > >
> > > > My guess (and I'm glad Sebastian is on this list so he can help
> > > > confirm/deny this.. sorry I'm not picking on you just happy to be
> > > > able
> > to
> > > > talk to you about your good paper) is that considering co-occurring
> > > ratings
> > > > in a distributed implementation would require access to the full
> > > > matrix which defeats the parallel nature of computing item-item
> similarity?
> > > >
> > > > Thanks again!
> > > > Amit
> > > >
> > > >
> > > > On Sun, Dec 1, 2013 at 2:55 AM, Sean Owen <sr...@gmail.com> wrote:
> > > >
> > > >> It's not an issue of how to be careful with sparsity and
> > > >> subtracting means, although that's a valuable point in itself.
> > > >> The question is what the mean is supposed to be.
> > > >>
> > > >> You can't think of missing ratings as 0 in general, and the
> > > >> example here shows why: you're acting as if most movies are
> > > >> hated. Instead they are excluded from the computation entirely.
> > > >>
> > > >> m_x should be 4.5 in the example here. That's consistent with
> > > >> literature and the other implementations earlier in this project.
> > > >>
> > > >> I don't know the Hadoop implementation well enough, and wasn't
> > > >> sure from the comments above, whether it does end up behaving as
> > > >> if it's "4.5" or "3". If it's not 4.5 I would call that a bug.
> > > >> Items that aren't co-rated can't meaningfully be included in this
> computation.
> > > >>
> > > >>
> > > >> On Sun, Dec 1, 2013 at 8:29 AM, Ted Dunning
> > > >> <te...@gmail.com>
> > > wrote:
> > > >>> Good point Amit.
> > > >>>
> > > >>> Not sure how much this matters.  It may be that
> > > >>> PearsonCorrelationSimilarity is a bad name that should be
> > > >>> PearsonInspiredCorrelationSimilarity.  My guess is that this
> > > >> implementation
> > > >>> is lifted directly from the very early recommendation literature
> > > >>> and
> > is
> > > >>> reflective of the way that it was used back then.
> > > >>
> > > >
> > >
> > >
> >
>

RE: Question about Pearson Correlation in non-Taste mode

Posted by Jason Xin <Ja...@sas.com>.
Ted, 

Is "Accurate Methods for the Statistics of Surprise and Coincidence" (the second PDF you attached) your doctoral dissertation, or do you have another one you can forward to me? Thanks.

Jason Xin

-----Original Message-----
From: Ted Dunning [mailto:ted.dunning@gmail.com] 
Sent: Friday, December 06, 2013 7:56 PM
To: user@mahout.apache.org
Subject: Re: Question about Pearson Correlation in non-Taste mode

See

http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html
http://acl.ldc.upenn.edu/J/J93/J93-1003.pdf
http://arxiv.org/abs/1207.1847





On Fri, Dec 6, 2013 at 1:09 PM, Amit Nithian <an...@gmail.com> wrote:

> Hey Sebastian,
>
> Thanks again for the explanation. So now you have me intrigued about 
> something else. Why is it that the loglikelihood ratio test is a better 
> measure for essentially implicit ratings? Are there resources/research 
> papers you can point me to explaining this?
>
> Take care
> Amit
>
>
> On Sun, Dec 1, 2013 at 9:25 AM, Sebastian Schelter
> <ss...@googlemail.com>wrote:
>
> > Hi Amit,
> >
> > No need to apologize for picking on me, I'm happy about anyone digging 
> > into the paper :)
> >
> > The reason I implemented Pearson in this (flawed) way has to do 
> > with the way the parallel algorithm works:
> >
> > It never compares two item vectors in memory; instead it 
> > preprocesses the vectors and computes sparse dot products in 
> > parallel. The centering which is usually done for Pearson 
> > correlation is dependent on which pair of vectors you're currently 
> > looking at (and doesn't fit the parallel algorithm). We had an 
> > earlier implementation that didn't have this flaw, but was way slower than the current one.
> >
> > Rating prediction on explicit feedback data (like ratings), for which 
> > Pearson correlation is mostly used in CF, is a rather academic topic 
> > and in science there are nearly no datasets that really require you 
> > to go to Hadoop.
> >
> > On the other hand item prediction on implicit feedback data (like
> > clicks) is the common scenario in the majority of industry use cases, 
> > but here count-based similarity measures like the loglikelihood 
> > ratio test give much better results. The current implementation of 
> > Mahout's distributed item-based recommender is clearly designed and 
> > tuned for the latter use case.
> >
> > I hope that answers your question.
> >
> > --sebastian
> >
> > On 01.12.2013 18:10, Amit Nithian wrote:
> > > Thanks guys! So the real question is not so much what's the 
> > > average of
> > the
> > > vector with the missing rating (although yes that was a question) 
> > > but what's the average of the vector with all the ratings 
> > > specified but the second rating that is not shared with the first user:
> > > [5 - 4] vs [4 5 2].
> > >
> > > If we agree that the first is 4.5 then is the second one 11/3 or 3 
> > > ((4+2)/2)? Taste has this as ((4+2)/2) while distributed mode has 
> > > it as 11/3.
> > >
> > > Since Taste (and Lenskit) is sequential, it can (and will only) 
> > > look at co-occurring ratings whereas the Hadoop implementation 
> > > doesn't. The
> paper
> > > that Sebastian wrote has a pre-processing step where (for Pearson) 
> > > you subtract each element of an item-rating vector from the 
> > > average rating which implies that each item-rating vector is 
> > > treated independently of
> > each
> > > other whereas in the sequential/non-distributed mode it's all
> considered
> > > together.
> > >
> > > My main reason for posting is because the Taste implementation of
> > item-item
> > > similarity differs from the distributed implementation. Since I am
> > totally
> > > new to this space and these similarities I wanted to understand if
> there
> > is
> > > a reason for this difference and whether or not it matters. Sounds 
> > > like from the discussion it doesn't matter but understanding why 
> > > helps me explain this to others.
> > >
> > > My guess (and I'm glad Sebastian is on this list so he can help 
> > > confirm/deny this.. sorry I'm not picking on you just happy to be 
> > > able
> to
> > > talk to you about your good paper) is that considering co-occurring
> > ratings
> > > in a distributed implementation would require access to the full 
> > > matrix which defeats the parallel nature of computing item-item similarity?
> > >
> > > Thanks again!
> > > Amit
> > >
> > >
> > > On Sun, Dec 1, 2013 at 2:55 AM, Sean Owen <sr...@gmail.com> wrote:
> > >
> > >> It's not an issue of how to be careful with sparsity and 
> > >> subtracting means, although that's a valuable point in itself. 
> > >> The question is what the mean is supposed to be.
> > >>
> > >> You can't think of missing ratings as 0 in general, and the 
> > >> example here shows why: you're acting as if most movies are 
> > >> hated. Instead they are excluded from the computation entirely.
> > >>
> > >> m_x should be 4.5 in the example here. That's consistent with 
> > >> literature and the other implementations earlier in this project.
> > >>
> > >> I don't know the Hadoop implementation well enough, and wasn't 
> > >> sure from the comments above, whether it does end up behaving as 
> > >> if it's "4.5" or "3". If it's not 4.5 I would call that a bug. 
> > >> Items that aren't co-rated can't meaningfully be included in this computation.
> > >>
> > >>
> > >> On Sun, Dec 1, 2013 at 8:29 AM, Ted Dunning 
> > >> <te...@gmail.com>
> > wrote:
> > >>> Good point Amit.
> > >>>
> > >>> Not sure how much this matters.  It may be that 
> > >>> PearsonCorrelationSimilarity is a bad name that should be 
> > >>> PearsonInspiredCorrelationSimilarity.  My guess is that this
> > >> implementation
> > >>> is lifted directly from the very early recommendation literature 
> > >>> and
> is
> > >>> reflective of the way that it was used back then.
> > >>
> > >
> >
> >
>

Re: Question about Pearson Correlation in non-Taste mode

Posted by Ted Dunning <te...@gmail.com>.
See

http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html
http://acl.ldc.upenn.edu/J/J93/J93-1003.pdf
http://arxiv.org/abs/1207.1847
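For a concrete feel of the statistic behind the first two links, here is a minimal sketch of the log-likelihood ratio (G^2) score for a 2x2 co-occurrence table; the counts in the example are made up for illustration:

```python
import math

# G^2 log-likelihood ratio for a 2x2 co-occurrence table:
# k11 = both events together, k12/k21 = one without the other, k22 = neither.
def xlogx(x):
    return x * math.log(x) if x > 0 else 0.0

def entropy(*counts):
    return xlogx(sum(counts)) - sum(xlogx(c) for c in counts)

def llr(k11, k12, k21, k22):
    row_entropy = entropy(k11 + k12, k21 + k22)
    col_entropy = entropy(k11 + k21, k12 + k22)
    mat_entropy = entropy(k11, k12, k21, k22)
    return 2.0 * (row_entropy + col_entropy - mat_entropy)

print(llr(10, 120, 14, 45056))  # large score: co-occurrence is surprising
print(llr(1, 1, 1, 1))          # ~0: counts are perfectly independent
```

The score is large when two items co-occur far more often than independence would predict, which is why it works on implicit (count) data with no ratings at all.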





On Fri, Dec 6, 2013 at 1:09 PM, Amit Nithian <an...@gmail.com> wrote:

> Hey Sebastian,
>
> Thanks again for the explanation. So now you have me intrigued about
> something else. Why is it that the loglikelihood ratio test is a better measure
> for essentially implicit ratings? Are there resources/research papers you
> can point me to explaining this?
>
> Take care
> Amit
>
>
> On Sun, Dec 1, 2013 at 9:25 AM, Sebastian Schelter
> <ss...@googlemail.com>wrote:
>
> > Hi Amit,
> >
> > No need to apologize for picking on me, I'm happy about anyone digging into
> > the paper :)
> >
> > The reason I implemented Pearson in this (flawed) way has to do with
> > the way the parallel algorithm works:
> >
> > It never compares two item vectors in memory; instead it preprocesses
> > the vectors and computes sparse dot products in parallel. The centering
> > which is usually done for Pearson correlation is dependent on which pair
> > of vectors you're currently looking at (and doesn't fit the parallel
> > algorithm). We had an earlier implementation that didn't have this flaw,
> > but was way slower than the current one.
> >
> > Rating prediction on explicit feedback data (like ratings), for which
> > Pearson correlation is mostly used in CF, is a rather academic topic and
> > in science there are nearly no datasets that really require you to go to
> > Hadoop.
> >
> > On the other hand item prediction on implicit feedback data (like
> > clicks) is the common scenario in the majority of industry use cases, but
> > here count-based similarity measures like the loglikelihood ratio test
> > give much better results. The current implementation of Mahout's
> > distributed item-based recommender is clearly designed and tuned for the
> > latter use case.
> >
> > I hope that answers your question.
> >
> > --sebastian
> >
> > On 01.12.2013 18:10, Amit Nithian wrote:
> > > Thanks guys! So the real question is not so much what's the average of
> > the
> > > vector with the missing rating (although yes that was a question) but
> > > what's the average of the vector with all the ratings specified but the
> > > second rating that is not shared with the first user:
> > > [5 - 4] vs [4 5 2].
> > >
> > > If we agree that the first is 4.5 then is the second one 11/3 or 3
> > > ((4+2)/2)? Taste has this as ((4+2)/2) while distributed mode has it as
> > > 11/3.
> > >
> > > Since Taste (and Lenskit) is sequential, it can (and will only) look at
> > > co-occurring ratings whereas the Hadoop implementation doesn't. The
> paper
> > > that Sebastian wrote has a pre-processing step where (for Pearson) you
> > > subtract each element of an item-rating vector from the average rating
> > > which implies that each item-rating vector is treated independently of
> > each
> > > other whereas in the sequential/non-distributed mode it's all
> considered
> > > together.
> > >
> > > My main reason for posting is because the Taste implementation of
> > item-item
> > > similarity differs from the distributed implementation. Since I am
> > totally
> > > new to this space and these similarities I wanted to understand if
> there
> > is
> > > a reason for this difference and whether or not it matters. Sounds like
> > > from the discussion it doesn't matter but understanding why helps me
> > > explain this to others.
> > >
> > > My guess (and I'm glad Sebastian is on this list so he can help
> > > confirm/deny this.. sorry I'm not picking on you just happy to be able
> to
> > > talk to you about your good paper) is that considering co-occurring
> > ratings
> > > in a distributed implementation would require access to the full matrix
> > > which defeats the parallel nature of computing item-item similarity?
> > >
> > > Thanks again!
> > > Amit
> > >
> > >
> > > On Sun, Dec 1, 2013 at 2:55 AM, Sean Owen <sr...@gmail.com> wrote:
> > >
> > >> It's not an issue of how to be careful with sparsity and subtracting
> > >> means, although that's a valuable point in itself. The question is
> > >> what the mean is supposed to be.
> > >>
> > >> You can't think of missing ratings as 0 in general, and the example
> > >> here shows why: you're acting as if most movies are hated. Instead
> > >> they are excluded from the computation entirely.
> > >>
> > >> m_x should be 4.5 in the example here. That's consistent with
> > >> literature and the other implementations earlier in this project.
> > >>
> > >> I don't know the Hadoop implementation well enough, and wasn't sure
> > >> from the comments above, whether it does end up behaving as if it's
> > >> "4.5" or "3". If it's not 4.5 I would call that a bug. Items that
> > >> aren't co-rated can't meaningfully be included in this computation.
> > >>
> > >>
> > >> On Sun, Dec 1, 2013 at 8:29 AM, Ted Dunning <te...@gmail.com>
> > wrote:
> > >>> Good point Amit.
> > >>>
> > >>> Not sure how much this matters.  It may be that
> > >>> PearsonCorrelationSimilarity is a bad name that should be
> > >>> PearsonInspiredCorrelationSimilarity.  My guess is that this
> > >> implementation
> > >>> is lifted directly from the very early recommendation literature and
> is
> > >>> reflective of the way that it was used back then.
> > >>
> > >
> >
> >
>

Re: Question about Pearson Correlation in non-Taste mode

Posted by Amit Nithian <an...@gmail.com>.
Hey Sebastian,

Thanks again for the explanation. So now you have me intrigued about
something else. Why is it that the loglikelihood ratio test is a better measure
for essentially implicit ratings? Are there resources/research papers you
can point me to explaining this?

Take care
Amit


On Sun, Dec 1, 2013 at 9:25 AM, Sebastian Schelter
<ss...@googlemail.com>wrote:

> Hi Amit,
>
> No need to apologize for picking on me, I'm happy about anyone digging into
> the paper :)
>
> The reason I implemented Pearson in this (flawed) way has to do with
> the way the parallel algorithm works:
>
> It never compares two item vectors in memory; instead it preprocesses
> the vectors and computes sparse dot products in parallel. The centering
> which is usually done for Pearson correlation is dependent on which pair
> of vectors you're currently looking at (and doesn't fit the parallel
> algorithm). We had an earlier implementation that didn't have this flaw,
> but was way slower than the current one.
>
> Rating prediction on explicit feedback data (like ratings), for which
> Pearson correlation is mostly used in CF, is a rather academic topic and
> in science there are nearly no datasets that really require you to go to
> Hadoop.
>
> On the other hand item prediction on implicit feedback data (like
> clicks) is the common scenario in the majority of industry use cases, but
> here count-based similarity measures like the loglikelihood ratio test
> give much better results. The current implementation of Mahout's
> distributed item-based recommender is clearly designed and tuned for the
> latter use case.
>
> I hope that answers your question.
>
> --sebastian
>
> On 01.12.2013 18:10, Amit Nithian wrote:
> > Thanks guys! So the real question is not so much what's the average of
> the
> > vector with the missing rating (although yes that was a question) but
> > what's the average of the vector with all the ratings specified but the
> > second rating that is not shared with the first user:
> > [5 - 4] vs [4 5 2].
> >
> > If we agree that the first is 4.5 then is the second one 11/3 or 3
> > ((4+2)/2)? Taste has this as ((4+2)/2) while distributed mode has it as
> > 11/3.
> >
> > Since Taste (and Lenskit) is sequential, it can (and will only) look at
> > co-occurring ratings whereas the Hadoop implementation doesn't. The paper
> > that Sebastian wrote has a pre-processing step where (for Pearson) you
> > subtract each element of an item-rating vector from the average rating
> > which implies that each item-rating vector is treated independently of
> each
> > other whereas in the sequential/non-distributed mode it's all considered
> > together.
> >
> > My main reason for posting is because the Taste implementation of
> item-item
> > similarity differs from the distributed implementation. Since I am
> totally
> > new to this space and these similarities I wanted to understand if there
> is
> > a reason for this difference and whether or not it matters. Sounds like
> > from the discussion it doesn't matter but understanding why helps me
> > explain this to others.
> >
> > My guess (and I'm glad Sebastian is on this list so he can help
> > confirm/deny this.. sorry I'm not picking on you just happy to be able to
> > talk to you about your good paper) is that considering co-occurring
> ratings
> > in a distributed implementation would require access to the full matrix
> > which defeats the parallel nature of computing item-item similarity?
> >
> > Thanks again!
> > Amit
> >
> >
> > On Sun, Dec 1, 2013 at 2:55 AM, Sean Owen <sr...@gmail.com> wrote:
> >
> >> It's not an issue of how to be careful with sparsity and subtracting
> >> means, although that's a valuable point in itself. The question is
> >> what the mean is supposed to be.
> >>
> >> You can't think of missing ratings as 0 in general, and the example
> >> here shows why: you're acting as if most movies are hated. Instead
> >> they are excluded from the computation entirely.
> >>
> >> m_x should be 4.5 in the example here. That's consistent with
> >> literature and the other implementations earlier in this project.
> >>
> >> I don't know the Hadoop implementation well enough, and wasn't sure
> >> from the comments above, whether it does end up behaving as if it's
> >> "4.5" or "3". If it's not 4.5 I would call that a bug. Items that
> >> aren't co-rated can't meaningfully be included in this computation.
> >>
> >>
> >> On Sun, Dec 1, 2013 at 8:29 AM, Ted Dunning <te...@gmail.com>
> wrote:
> >>> Good point Amit.
> >>>
> >>> Not sure how much this matters.  It may be that
> >>> PearsonCorrelationSimilarity is a bad name that should be
> >>> PearsonInspiredCorrelationSimilarity.  My guess is that this
> >> implementation
> >>> is lifted directly from the very early recommendation literature and is
> >>> reflective of the way that it was used back then.
> >>
> >
>
>

Re: Question about Pearson Correlation in non-Taste mode

Posted by Sebastian Schelter <ss...@googlemail.com>.
Hi Amit,

No need to apologize for picking on me, I'm happy about anyone digging into
the paper :)

The reason I implemented Pearson in this (flawed) way has to do with
the way the parallel algorithm works:

It never compares two item vectors in memory; instead it preprocesses
the vectors and computes sparse dot products in parallel. The centering
which is usually done for Pearson correlation is dependent on which pair
of vectors you're currently looking at (and doesn't fit the parallel
algorithm). We had an earlier implementation that didn't have this flaw,
but was way slower than the current one.

Rating prediction on explicit feedback data (like ratings), for which
Pearson correlation is mostly used in CF, is a rather academic topic and
in science there are nearly no datasets that really require you to go to
Hadoop.

On the other hand item prediction on implicit feedback data (like
clicks) is the common scenario in the majority of industry use cases, but
here count-based similarity measures like the loglikelihood ratio test
give much better results. The current implementation of Mahout's
distributed item-based recommender is clearly designed and tuned for the
latter use case.
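As a rough sketch of what I mean (illustrative Python, not the actual Mahout source, with made-up sparse vectors): each item vector is centered by its own mean over its observed ratings up front, after which the similarity reduces to sparse dot products that parallelize easily:

```python
import math

# Illustrative sketch (not the actual Mahout source). Item vectors are
# sparse maps of user id -> rating. Each vector is centered by its OWN
# mean over its observed ratings, independently of any other vector.
def center(vec):
    mean = sum(vec.values()) / len(vec)
    return {user: r - mean for user, r in vec.items()}

def sparse_dot(a, b):
    # Dot product over the users present in both sparse vectors.
    return sum(v * b[u] for u, v in a.items() if u in b)

def pearson_inspired(a, b):
    ca, cb = center(a), center(b)
    norm = math.sqrt(sparse_dot(ca, ca) * sparse_dot(cb, cb))
    return sparse_dot(ca, cb) / norm if norm else 0.0

item_x = {0: 5, 2: 4}         # user 1 never rated item X
item_y = {0: 4, 1: 5, 2: 2}
print(pearson_inspired(item_x, item_y))
```

Note how item_y is centered by its full mean (11/3) rather than by the mean over the users it shares with item_x; that is exactly the deviation from textbook Pearson discussed above.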

I hope that answers your question.

--sebastian

On 01.12.2013 18:10, Amit Nithian wrote:
> Thanks guys! So the real question is not so much what's the average of the
> vector with the missing rating (although yes that was a question) but
> what's the average of the vector with all the ratings specified but the
> second rating that is not shared with the first user:
> [5 - 4] vs [4 5 2].
> 
> If we agree that the first is 4.5 then is the second one 11/3 or 3
> ((4+2)/2)? Taste has this as ((4+2)/2) while distributed mode has it as
> 11/3.
> 
> Since Taste (and Lenskit) is sequential, it can (and will only) look at
> co-occurring ratings whereas the Hadoop implementation doesn't. The paper
> that Sebastian wrote has a pre-processing step where (for Pearson) you
> subtract each element of an item-rating vector from the average rating
> which implies that each item-rating vector is treated independently of each
> other whereas in the sequential/non-distributed mode it's all considered
> together.
> 
> My main reason for posting is because the Taste implementation of item-item
> similarity differs from the distributed implementation. Since I am totally
> new to this space and these similarities I wanted to understand if there is
> a reason for this difference and whether or not it matters. Sounds like
> from the discussion it doesn't matter but understanding why helps me
> explain this to others.
> 
> My guess (and I'm glad Sebastian is on this list so he can help
> confirm/deny this.. sorry I'm not picking on you just happy to be able to
> talk to you about your good paper) is that considering co-occurring ratings
> in a distributed implementation would require access to the full matrix
> which defeats the parallel nature of computing item-item similarity?
> 
> Thanks again!
> Amit
> 
> 
> On Sun, Dec 1, 2013 at 2:55 AM, Sean Owen <sr...@gmail.com> wrote:
> 
>> It's not an issue of how to be careful with sparsity and subtracting
>> means, although that's a valuable point in itself. The question is
>> what the mean is supposed to be.
>>
>> You can't think of missing ratings as 0 in general, and the example
>> here shows why: you're acting as if most movies are hated. Instead
>> they are excluded from the computation entirely.
>>
>> m_x should be 4.5 in the example here. That's consistent with
>> literature and the other implementations earlier in this project.
>>
>> I don't know the Hadoop implementation well enough, and wasn't sure
>> from the comments above, whether it does end up behaving as if it's
>> "4.5" or "3". If it's not 4.5 I would call that a bug. Items that
>> aren't co-rated can't meaningfully be included in this computation.
>>
>>
>> On Sun, Dec 1, 2013 at 8:29 AM, Ted Dunning <te...@gmail.com> wrote:
>>> Good point Amit.
>>>
>>> Not sure how much this matters.  It may be that
>>> PearsonCorrelationSimilarity is a bad name that should be
>>> PearsonInspiredCorrelationSimilarity.  My guess is that this
>> implementation
>>> is lifted directly from the very early recommendation literature and is
>>> reflective of the way that it was used back then.
>>
> 


Re: Question about Pearson Correlation in non-Taste mode

Posted by Amit Nithian <an...@gmail.com>.
Thanks guys! So the real question is not so much what's the average of the
vector with the missing rating (although yes that was a question) but
what's the average of the vector with all the ratings specified but the
second rating that is not shared with the first user:
[5 - 4] vs [4 5 2].

If we agree that the first is 4.5 then is the second one 11/3 or 3
((4+2)/2)? Taste has this as ((4+2)/2) while distributed mode has it as
11/3.

Since Taste (and Lenskit) is sequential, it can (and will only) look at
co-occurring ratings whereas the Hadoop implementation doesn't. The paper
that Sebastian wrote has a pre-processing step where (for Pearson) you
subtract each element of an item-rating vector from the average rating
which implies that each item-rating vector is treated independently of each
other whereas in the sequential/non-distributed mode it's all considered
together.

My main reason for posting is because the Taste implementation of item-item
similarity differs from the distributed implementation. Since I am totally
new to this space and these similarities I wanted to understand if there is
a reason for this difference and whether or not it matters. Sounds like
from the discussion it doesn't matter but understanding why helps me
explain this to others.

My guess (and I'm glad Sebastian is on this list so he can help
confirm/deny this.. sorry I'm not picking on you just happy to be able to
talk to you about your good paper) is that considering co-occurring ratings
in a distributed implementation would require access to the full matrix
which defeats the parallel nature of computing item-item similarity?

Thanks again!
Amit


On Sun, Dec 1, 2013 at 2:55 AM, Sean Owen <sr...@gmail.com> wrote:

> It's not an issue of how to be careful with sparsity and subtracting
> means, although that's a valuable point in itself. The question is
> what the mean is supposed to be.
>
> You can't think of missing ratings as 0 in general, and the example
> here shows why: you're acting as if most movies are hated. Instead
> they are excluded from the computation entirely.
>
> m_x should be 4.5 in the example here. That's consistent with
> literature and the other implementations earlier in this project.
>
> I don't know the Hadoop implementation well enough, and wasn't sure
> from the comments above, whether it does end up behaving as if it's
> "4.5" or "3". If it's not 4.5 I would call that a bug. Items that
> aren't co-rated can't meaningfully be included in this computation.
>
>
> On Sun, Dec 1, 2013 at 8:29 AM, Ted Dunning <te...@gmail.com> wrote:
> > Good point Amit.
> >
> > Not sure how much this matters.  It may be that
> > PearsonCorrelationSimilarity is a bad name that should be
> > PearsonInspiredCorrelationSimilarity.  My guess is that this
> implementation
> > is lifted directly from the very early recommendation literature and is
> > reflective of the way that it was used back then.
>

Re: Question about Pearson Correlation in non-Taste mode

Posted by Sean Owen <sr...@gmail.com>.
It's not an issue of how to be careful with sparsity and subtracting
means, although that's a valuable point in itself. The question is
what the mean is supposed to be.

You can't think of missing ratings as 0 in general, and the example
here shows why: you're acting as if most movies are hated. Instead
they are excluded from the computation entirely.

m_x should be 4.5 in the example here. That's consistent with
literature and the other implementations earlier in this project.

I don't know the Hadoop implementation well enough, and wasn't sure
from the comments above, whether it does end up behaving as if it's
"4.5" or "3". If it's not 4.5 I would call that a bug. Items that
aren't co-rated can't meaningfully be included in this computation.


On Sun, Dec 1, 2013 at 8:29 AM, Ted Dunning <te...@gmail.com> wrote:
> Good point Amit.
>
> Not sure how much this matters.  It may be that
> PearsonCorrelationSimilarity is a bad name that should be
> PearsonInspiredCorrelationSimilarity.  My guess is that this implementation
> is lifted directly from the very early recommendation literature and is
> reflective of the way that it was used back then.

Re: Question about Pearson Correlation in non-Taste mode

Posted by Ted Dunning <te...@gmail.com>.
Good point Amit.

Not sure how much this matters.  It may be that
PearsonCorrelationSimilarity is a bad name that should be
PearsonInspiredCorrelationSimilarity.  My guess is that this implementation
is lifted directly from the very early recommendation literature and is
reflective of the way that it was used back then.

Remember that the context here is prediction of ratings.  If you assume
that you really want correlation and that missing elements are zero, then
this is mathematically wrong.  On the other hand, if you assume missing
elements are equal to the mean (whatever it is), then this definition is
correct.

In any case, I don't think that PearsonCorrelationSimilarity should be
"fixed" at this point.  First of all, a substantial change here is somewhat
risky since there may be people who depend on current behavior.  Second, I
think that this is almost never a particularly good recommendation
algorithm so even if the proposed change is a small improvement, it will
have negligible positive effect on the universe of production recommenders.

Remember that this function is not a stats routine.  It is an embodiment of
recommendation practice.  Were it the former, I would strongly recommend we
fix it.
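
Ted's "missing elements equal the mean" reading can be checked numerically: if you impute each vector's unobserved entries with the mean of its observed entries, the imputed vector keeps the same mean, so those entries center to exactly zero and drop out of the dot product. Dense Pearson on the imputed vectors then matches what the independently-centered sparse computation produces. A sketch, in illustrative Python with invented names (not Mahout code):

```python
import math

def pearson_dense(x, y):
    """Plain Pearson correlation on two fully specified vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x)) * \
          math.sqrt(sum((b - my) ** 2 for b in y))
    return num / den

def impute_with_mean(v):
    """Replace each missing entry with the mean of the observed entries.
    The result has the same mean, so imputed entries center to zero."""
    obs = [a for a in v if a is not None]
    m = sum(obs) / len(obs)
    return [m if a is None else a for a in v]

x = [5, None, 4]   # [5 - 4] from the example below
y = [4, 5, 2]
r = pearson_dense(impute_with_mean(x), impute_with_mean(y))
print(r)
```

Here `impute_with_mean(x)` is [5, 4.5, 4], whose mean is still 4.5, so the imputed middle entry contributes nothing to the numerator; the result agrees with centering each vector independently over its observed entries.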






On Sat, Nov 30, 2013 at 10:18 AM, Amit Nithian <an...@gmail.com> wrote:

> Hi Ted,
>
> Thanks that is what I would have thought too but I don't think that the
> Pearson Similarity (in Hadoop mode) does this:
>
> in
>
> org.apache.mahout.math.hadoop.similarity.cooccurrence.measures.PearsonCorrelationSimilarity
> around line 31
>
> double average = vector.norm(1) / vector.getNumNonZeroElements();
> Which looks like it's taking the sum and dividing by the number of defined
> elements. Which would make my [5 - 4] average be 4.5.
>
> Thanks again
> Amit
>
> On Fri, Nov 29, 2013 at 10:34 PM, Ted Dunning <te...@gmail.com>
> wrote:
>
> > On Fri, Nov 29, 2013 at 10:16 PM, Amit Nithian <an...@gmail.com>
> wrote:
> >
> > > Hi Ted,
> > >
> > > Thanks for your response. I thought that the mean of a sparse vector is
> > > simply the mean of the "defined" elements? Why would the vectors become
> > > dense unless you're meaning that all the undefined elements (0?) now
> will
> > > be (0-m_x)?
> > >
> >
> > Yes.  Just so.  All those zero elements become non-zero and the vector is
> > thus dense.
> >
> >
> > >
> > > Looking at the following example:
> > > X = [5 - 4] and Y= [4 5 2].
> > >
> > > is m_x 4.5 or 3?
> >
> >
> > 3.
> >
> > This is because the elements of X are really 5, 0, and 4.  The zero is
> just
> > not stored, but it still is the value of that element.
> >
> >
> > > Is m_y 11/3 or (6/2) because we ignore the "5" since its
> > > counterpart in X is undefined?
> > >
> >
> > 11/3
> >
>

Re: Question about Pearson Correlation in non-Taste mode

Posted by Amit Nithian <an...@gmail.com>.
Hi Ted,

Thanks that is what I would have thought too but I don't think that the
Pearson Similarity (in Hadoop mode) does this:

in
org.apache.mahout.math.hadoop.similarity.cooccurrence.measures.PearsonCorrelationSimilarity
around line 31

double average = vector.norm(1) / vector.getNumNonZeroElements();
Which looks like it's taking the sum and dividing by the number of defined
elements. Which would make my [5 - 4] average be 4.5.

Thanks again
Amit
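
For reference, that Java line can be mirrored in a few lines of illustrative Python (not the Mahout source; the 0 stands in for the unstored rating): `norm(1)` of a vector of non-negative ratings is just the sum of its entries, so the average is taken over stored elements only.

```python
ratings = [5, 0, 4]  # "[5 - 4]": the missing rating is simply not stored

norm1 = sum(abs(r) for r in ratings)         # vector.norm(1)
nonzero = sum(1 for r in ratings if r != 0)  # vector.getNumNonZeroElements()

avg_nonzero = norm1 / nonzero       # 9 / 2 = 4.5, as the Java line computes
avg_dense = norm1 / len(ratings)    # 9 / 3 = 3.0, if zeros count as ratings
print(avg_nonzero, avg_dense)
```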

On Fri, Nov 29, 2013 at 10:34 PM, Ted Dunning <te...@gmail.com> wrote:

> On Fri, Nov 29, 2013 at 10:16 PM, Amit Nithian <an...@gmail.com> wrote:
>
> > Hi Ted,
> >
> > Thanks for your response. I thought that the mean of a sparse vector is
> > simply the mean of the "defined" elements? Why would the vectors become
> > dense unless you're meaning that all the undefined elements (0?) now will
> > be (0-m_x)?
> >
>
> Yes.  Just so.  All those zero elements become non-zero and the vector is
> thus dense.
>
>
> >
> > Looking at the following example:
> > X = [5 - 4] and Y= [4 5 2].
> >
> > is m_x 4.5 or 3?
>
>
> 3.
>
> This is because the elements of X are really 5, 0, and 4.  The zero is just
> not stored, but it still is the value of that element.
>
>
> > Is m_y 11/3 or (6/2) because we ignore the "5" since its
> > counterpart in X is undefined?
> >
>
> 11/3
>

Re: Question about Pearson Correlation in non-Taste mode

Posted by Ted Dunning <te...@gmail.com>.
On Fri, Nov 29, 2013 at 10:16 PM, Amit Nithian <an...@gmail.com> wrote:

> Hi Ted,
>
> Thanks for your response. I thought that the mean of a sparse vector is
> simply the mean of the "defined" elements? Why would the vectors become
> dense unless you're meaning that all the undefined elements (0?) now will
> be (0-m_x)?
>

Yes.  Just so.  All those zero elements become non-zero and the vector is
thus dense.


>
> Looking at the following example:
> X = [5 - 4] and Y= [4 5 2].
>
> is m_x 4.5 or 3?


3.

This is because the elements of X are really 5, 0, and 4.  The zero is just
not stored, but it still is the value of that element.


> Is m_y 11/3 or (6/2) because we ignore the "5" since its
> counterpart in X is undefined?
>

11/3

Re: Question about Pearson Correlation in non-Taste mode

Posted by Amit Nithian <an...@gmail.com>.
Hi Ted,

Thanks for your response. I thought that the mean of a sparse vector is
simply the mean of the "defined" elements? Why would the vectors become
dense unless you're meaning that all the undefined elements (0?) now will
be (0-m_x)?

Looking at the following example:
X = [5 - 4] and Y= [4 5 2].

is m_x 4.5 or 3? Is m_y 11/3 or (6/2) because we ignore the "5" since its
counterpart in X is undefined?

Thanks again
Amit



On Fri, Nov 29, 2013 at 9:58 PM, Ted Dunning <te...@gmail.com> wrote:

> Well, the best way to compute correlation using sparse vectors is to make
> sure you keep them sparse.  To do that, you must avoid subtracting the mean
> by expanding whatever formulae you are using.  For instance, if you are
> computing
>
>     (x - m_x) . (y - m_y)
>
> (here . means dot product)
>
> If you do this directly, then you lose all benefit of sparse vectors since
> subtracting the means makes each vector dense.
>
> What you should compute instead is this alternative form
>
>    x . y - m_x e . y - m_y e . x + n m_x m_y
>
> (here e represents a vector full of 1's and n the length of the vectors)
>
> The dot product here is sparse and the expression m_x e . y can be computed
> (at least in Mahout) in map-reduce idiom as
>
>     y.aggregate(Functions.PLUS, Functions.mult(m_x))
>
>
>
>
> On Fri, Nov 29, 2013 at 9:31 PM, Amit Nithian <an...@gmail.com> wrote:
>
> > Okay so I rethought my question and realized that the paper never really
> > talked about collaborative filtering but just how to calculate item-item
> > similarity in a scalable fashion. Perhaps this is the reason for why the
> > common ratings aren't used? Because that's not a pre-req for this
> > calculation?
> >
> > Although for my own clarity, I'd still like to get a better understanding
> > of what it means to calculate the correlation between sparse vectors
> where
> > you're normalizing each vector using a separate denominator.
> >
> > P.S. If my question(s) don't make sense please let me know for it's very
> > possible I am completely misunderstanding something :-).
> >
> > Thanks again!
> > Amit
> >
> >
> > On Wed, Nov 27, 2013 at 8:23 AM, Amit Nithian <an...@gmail.com>
> wrote:
> >
> > > Hey Sebastian,
> > >
> > > Thanks again. Actually I'm glad that I am talking to you as it's your
> > > paper and presentation I have questions with! :-)
> > >
> > > So to clarify my question further, looking at this presentation (
> > > http://isabel-drost.de/hadoop/slides/collabMahout.pdf) you have the
> > > following user x item matrix:
> > >     M   A   I
> > > A  5    1   4
> > > B  -    2    5
> > > P  4   3    2
> > >
> > > If I want to calculate the pearson correlation between Matrix and
> > > Inception, I'd have the rating vectors:
> > > [5 - 4] vs [4 5 2].
> > >
> > > One of the steps in your paper is the normalization step which subtracts
> > > the mean item rating from each value and essentially does the L2 norm of
> > > this resulting vector (or in other words, the L2 norm of the mean-centered
> > > vector?)
> > >
> > > The question I have had is what is the average rating for Matrix and
> > > Inception? I can see the following:
> > > Matrix - 4.5 (9/2), Inception - 3 (6/2) because you only consider
> shared
> > > ratings
> > > Matrix - 3 (9/3), Inception - 3.667 (11/3) assuming that the missing
> > > rating is 0
> > > Matrix - 4.5 (9/2), Inception - 3.667 (11/3) subtract from the average
> of
> > > all non-zero ratings ==> This is what I believe the current
> > implementation
> > > does.
> > >
> > > Unfortunately, none of these yields the 0.47 listed in the presentation
> > > but that's a separate issue. In my testing, I see that Mahout Taste
> > > (non-distributed) uses the 1st approach while the distributed approach
> > uses
> > > the 3rd approach.
> > >
> > > I am okay with #3; however I just want to understand that this is the
> > case
> > > and that it's okay. This is why I was asking about pearson correlation
> > > between vectors of "different" lengths because the average rating is
> > being
> > > computed using a denominator (number of users) that is different
> between
> > > the two (2 vs 3).
> > >
> > > I know you said in practice that people don't use Pearson to compute
> > > inferred ratings but this is just for my complete understanding (and
> > since
> > > it's the example used in your presentation). This same question applies
> > to
> > > cosine as you are doing an L2-Norm of the vector as a pre-processing
> step
> > > and including/excluding non-shared ratings may make a difference.
> > >
> > > Thanks again!
> > > Amit
> > >
> > >
> > > On Wed, Nov 27, 2013 at 7:13 AM, Sebastian Schelter <
> > > ssc.open@googlemail.com> wrote:
> > >
> > >> Hi Amit,
> > >>
> > >> Yes, it gives different results. However in practice, most people
> don't
> > >> do rating prediction with Pearson coefficient, but use count-based
> > >> measures like the loglikelihood ratio test.
> > >>
> > >> The distributed code doesn't look at vectors of different lengths, but
> > >> simply assumes non-existent ratings as zero.
> > >>
> > >> --sebastian
> > >>
> > >> On 27.11.2013 16:09, Amit Nithian wrote:
> > >> > Comparing this against the non distributed (taste) gives different
> > >> answers
> > >> > for item item similarity as of course the non distributed looks only
> > at
> > >> > corated items. I was more wondering if this difference in practice
> > >> mattered
> > >> > or not.
> > >> >
> > >> > Also I'm confused on how you can compute the Pearson similarity
> > between
> > >> two
> > >> > vectors of different length which essentially is going on here I
> > think?
> > >> >
> > >> > Thanks again
> > >> > Amit
> > >> > On Nov 27, 2013 9:06 AM, "Sebastian Schelter" <
> > ssc.open@googlemail.com>
> > >> > wrote:
> > >> >
> > >> >> Yes, it is due to the parallel algorithm which only looks at
> > co-ratings
> > >> >> from a given user.
> > >> >>
> > >> >>
> > >> >> On 27.11.2013 15:02, Amit Nithian wrote:
> > >> >>> Thanks Sebastian! Is there a particular reason for that?
> > >> >>> On Nov 27, 2013 7:47 AM, "Sebastian Schelter" <
> > >> ssc.open@googlemail.com>
> > >> >>> wrote:
> > >> >>>
> > >> >>>> Hi Amit,
> > >> >>>>
> > >> >>>> You are right, the non-corated items are not filtered out in the
> > >> >>>> distributed implementation.
> > >> >>>>
> > >> >>>> --sebastian
> > >> >>>>
> > >> >>>>
> > >> >>>> On 26.11.2013 20:51, Amit Nithian wrote:
> > >> >>>>> Hi all,
> > >> >>>>>
> > >> >>>>> Apologies if this is a repeat question as I just joined the list
> > >> but I
> > >> >>>> have
> > >> >>>>> a question about the way that metrics like Cosine and Pearson
> are
> > >> >>>>> calculated in Hadoop "mode" (i.e. non Taste).
> > >> >>>>>
> > >> >>>>> As far as I understand, the vectors used for computing pairwise
> > item
> > >> >>>>> similarity in Taste are based on the co-rated items; however, in
> > the
> > >> >>>> Hadoop
> > >> >>>>> implementation, I don't see this done.
> > >> >>>>>
> > >> >>>>> The implementation of the distributed item-item similarity comes
> > >> from
> > >> >>>> this
> > >> >>>>> paper
> http://ssc.io/wp-content/uploads/2012/06/rec11-schelter.pdf
> > .
> > >> I
> > >> >>>> didn't
> > >> >>>>> see anything in this paper about filtering out those elements
> from
> > >> the
> > >> >>>>> vectors not co-rated and this can make a difference especially
> > when
> > >> you
> > >> >>>>> normalize the ratings by dividing by the average item rating. In
> > >> some
> > >> >>>>> cases, the # users to divide by can be fewer depending on the
> > >> >> sparseness
> > >> >>>> of
> > >> >>>>> the vector.
> > >> >>>>>
> > >> >>>>> Any clarity on this would be helpful.
> > >> >>>>>
> > >> >>>>> Thanks!
> > >> >>>>> Amit
> > >> >>>>>
> > >> >>>>
> > >> >>>>
> > >> >>>
> > >> >>
> > >> >>
> > >> >
> > >>
> > >>
> > >
> >
>

Re: Question about Pearson Correlation in non-Taste mode

Posted by Ted Dunning <te...@gmail.com>.
Well, the best way to compute correlation using sparse vectors is to make
sure you keep them sparse.  To do that, you must avoid subtracting the mean
by expanding whatever formulae you are using.  For instance, if you are
computing

    (x - m_x) . (y - m_y)

(here . means dot product)

If you do this directly, then you lose all benefit of sparse vectors since
subtracting the means makes each vector dense.

What you should compute instead is this alternative form

   x . y - m_x e . y - m_y e . x + n m_x m_y

(here e represents a vector full of 1's and n the length of the vectors)

The dot product here is sparse and the expression m_x e . y can be computed
(at least in Mahout) in map-reduce idiom as

    y.aggregate(Functions.PLUS, Functions.mult(m_x))
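
The expansion can be checked numerically. One detail worth noting: the constant term is the dot product of two constant vectors, so it contributes n·m_x·m_y, where n is the vector length. A sketch in illustrative Python (not Mahout code):

```python
x = [5.0, 0.0, 4.0]   # zeros are the entries a sparse vector never stores
y = [4.0, 5.0, 2.0]
n = len(x)
m_x, m_y = sum(x) / n, sum(y) / n

# Direct (dense) form: (x - m_x) . (y - m_y), which densifies the vectors.
dense = sum((a - m_x) * (b - m_y) for a, b in zip(x, y))

# Expanded form: only x . y touches the stored entries; the rest are sums.
expanded = (sum(a * b for a, b in zip(x, y))  # x . y (sparse dot product)
            - m_x * sum(y)                    # m_x (e . y)
            - m_y * sum(x)                    # m_y (e . x)
            + n * m_x * m_y)                  # (m_x e) . (m_y e) = n m_x m_y
print(dense, expanded)
```

Both forms agree, but the expanded one never materializes a dense centered vector.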




On Fri, Nov 29, 2013 at 9:31 PM, Amit Nithian <an...@gmail.com> wrote:

> Okay so I rethought my question and realized that the paper never really
> talked about collaborative filtering but just how to calculate item-item
> similarity in a scalable fashion. Perhaps this is the reason for why the
> common ratings aren't used? Because that's not a pre-req for this
> calculation?
>
> Although for my own clarity, I'd still like to get a better understanding
> of what it means to calculate the correlation between sparse vectors where
> you're normalizing each vector using a separate denominator.
>
> P.S. If my question(s) don't make sense please let me know for it's very
> possible I am completely misunderstanding something :-).
>
> Thanks again!
> Amit
>
>
> On Wed, Nov 27, 2013 at 8:23 AM, Amit Nithian <an...@gmail.com> wrote:
>
> > Hey Sebastian,
> >
> > Thanks again. Actually I'm glad that I am talking to you as it's your
> > paper and presentation I have questions with! :-)
> >
> > So to clarify my question further, looking at this presentation (
> > http://isabel-drost.de/hadoop/slides/collabMahout.pdf) you have the
> > following user x item matrix:
> >     M   A   I
> > A  5    1   4
> > B  -    2    5
> > P  4   3    2
> >
> > If I want to calculate the pearson correlation between Matrix and
> > Inception, I'd have the rating vectors:
> > [5 - 4] vs [4 5 2].
> >
> > One of the steps in your paper is the normalization step which subtracts
> > the mean item rating from each value and essentially does the L2 norm of
> > this resulting vector (or in other words, the L2 norm of the mean-centered
> > vector?)
> >
> > The question I have had is what is the average rating for Matrix and
> > Inception? I can see the following:
> > Matrix - 4.5 (9/2), Inception - 3 (6/2) because you only consider shared
> > ratings
> > Matrix - 3 (9/3), Inception - 3.667 (11/3) assuming that the missing
> > rating is 0
> > Matrix - 4.5 (9/2), Inception - 3.667 (11/3) subtract from the average of
> > all non-zero ratings ==> This is what I believe the current
> implementation
> > does.
> >
> > Unfortunately, none of these yields the 0.47 listed in the presentation
> > but that's a separate issue. In my testing, I see that Mahout Taste
> > (non-distributed) uses the 1st approach while the distributed approach
> uses
> > the 3rd approach.
> >
> > I am okay with #3; however I just want to understand that this is the
> case
> > and that it's okay. This is why I was asking about pearson correlation
> > between vectors of "different" lengths because the average rating is
> being
> > computed using a denominator (number of users) that is different between
> > the two (2 vs 3).
> >
> > I know you said in practice that people don't use Pearson to compute
> > inferred ratings but this is just for my complete understanding (and
> since
> > it's the example used in your presentation). This same question applies
> to
> > cosine as you are doing an L2-Norm of the vector as a pre-processing step
> > and including/excluding non-shared ratings may make a difference.
> >
> > Thanks again!
> > Amit
> >
> >
> > On Wed, Nov 27, 2013 at 7:13 AM, Sebastian Schelter <
> > ssc.open@googlemail.com> wrote:
> >
> >> Hi Amit,
> >>
> >> Yes, it gives different results. However in practice, most people don't
> >> do rating prediction with Pearson coefficient, but use count-based
> >> measures like the loglikelihood ratio test.
> >>
> >> The distributed code doesn't look at vectors of different lengths, but
> >> simply assumes non-existent ratings as zero.
> >>
> >> --sebastian
> >>
> >> On 27.11.2013 16:09, Amit Nithian wrote:
> >> > Comparing this against the non distributed (taste) gives different
> >> answers
> >> > for item item similarity as of course the non distributed looks only
> at
> >> > corated items. I was more wondering if this difference in practice
> >> mattered
> >> > or not.
> >> >
> >> > Also I'm confused on how you can compute the Pearson similarity
> between
> >> two
> >> > vectors of different length which essentially is going on here I
> think?
> >> >
> >> > Thanks again
> >> > Amit
> >> > On Nov 27, 2013 9:06 AM, "Sebastian Schelter" <
> ssc.open@googlemail.com>
> >> > wrote:
> >> >
> >> >> Yes, it is due to the parallel algorithm which only looks at
> co-ratings
> >> >> from a given user.
> >> >>
> >> >>
> >> >> On 27.11.2013 15:02, Amit Nithian wrote:
> >> >>> Thanks Sebastian! Is there a particular reason for that?
> >> >>> On Nov 27, 2013 7:47 AM, "Sebastian Schelter" <
> >> ssc.open@googlemail.com>
> >> >>> wrote:
> >> >>>
> >> >>>> Hi Amit,
> >> >>>>
> >> >>>> You are right, the non-corated items are not filtered out in the
> >> >>>> distributed implementation.
> >> >>>>
> >> >>>> --sebastian
> >> >>>>
> >> >>>>
> >> >>>> On 26.11.2013 20:51, Amit Nithian wrote:
> >> >>>>> Hi all,
> >> >>>>>
> >> >>>>> Apologies if this is a repeat question as I just joined the list
> >> but I
> >> >>>> have
> >> >>>>> a question about the way that metrics like Cosine and Pearson are
> >> >>>>> calculated in Hadoop "mode" (i.e. non Taste).
> >> >>>>>
> >> >>>>> As far as I understand, the vectors used for computing pairwise
> item
> >> >>>>> similarity in Taste are based on the co-rated items; however, in
> the
> >> >>>> Hadoop
> >> >>>>> implementation, I don't see this done.
> >> >>>>>
> >> >>>>> The implementation of the distributed item-item similarity comes
> >> from
> >> >>>> this
> >> >>>>> paper http://ssc.io/wp-content/uploads/2012/06/rec11-schelter.pdf
> .
> >> I
> >> >>>> didn't
> >> >>>>> see anything in this paper about filtering out those elements from
> >> the
> >> >>>>> vectors not co-rated and this can make a difference especially
> when
> >> you
> >> >>>>> normalize the ratings by dividing by the average item rating. In
> >> some
> >> >>>>> cases, the # users to divide by can be fewer depending on the
> >> >> sparseness
> >> >>>> of
> >> >>>>> the vector.
> >> >>>>>
> >> >>>>> Any clarity on this would be helpful.
> >> >>>>>
> >> >>>>> Thanks!
> >> >>>>> Amit
> >> >>>>>
> >> >>>>
> >> >>>>
> >> >>>
> >> >>
> >> >>
> >> >
> >>
> >>
> >
>

Re: Question about Pearson Correlation in non-Taste mode

Posted by Amit Nithian <an...@gmail.com>.
Okay so I rethought my question and realized that the paper never really
talked about collaborative filtering but just how to calculate item-item
similarity in a scalable fashion. Perhaps this is the reason for why the
common ratings aren't used? Because that's not a pre-req for this
calculation?

Although for my own clarity, I'd still like to get a better understanding
of what it means to calculate the correlation between sparse vectors where
you're normalizing each vector using a separate denominator.

P.S. If my question(s) don't make sense please let me know for it's very
possible I am completely misunderstanding something :-).

Thanks again!
Amit


On Wed, Nov 27, 2013 at 8:23 AM, Amit Nithian <an...@gmail.com> wrote:

> Hey Sebastian,
>
> Thanks again. Actually I'm glad that I am talking to you as it's your
> paper and presentation I have questions with! :-)
>
> So to clarify my question further, looking at this presentation (
> http://isabel-drost.de/hadoop/slides/collabMahout.pdf) you have the
> following user x item matrix:
>     M   A   I
> A   5   1   4
> B   -   2   5
> P   4   3   2
>
> If I want to calculate the pearson correlation between Matrix and
> Inception, I'd have the rating vectors:
> [5 - 4] vs [4 5 2].
>
> One of the steps in your paper is the normalization step which subtracts
> the mean item rating from each value and essentially does the L2 norm of this
> resulting vector (or in other words, the L2 norm of the mean-centered
> vector?)
>
> The question I have had is what is the average rating for Matrix and
> Inception? I can see the following:
> Matrix - 4.5 (9/2), Inception - 3 (6/2) because you only consider shared
> ratings
> Matrix - 3 (9/3), Inception - 3.667 (11/3) assuming that the missing
> rating is 0
> Matrix - 4.5 (9/2), Inception - 3.667 (11/3) subtract from the average of
> all non-zero ratings ==> This is what I believe the current implementation
> does.
>
> Unfortunately, none of these yields the 0.47 listed in the presentation
> but that's a separate issue. In my testing, I see that Mahout Taste
> (non-distributed) uses the 1st approach while the distributed approach uses
> the 3rd approach.
>
> I am okay with #3; however I just want to understand that this is the case
> and that it's okay. This is why I was asking about pearson correlation
> between vectors of "different" lengths because the average rating is being
> computed using a denominator (number of users) that is different between
> the two (2 vs 3).
>
> I know you said in practice that people don't use Pearson to compute
> inferred ratings but this is just for my complete understanding (and since
> it's the example used in your presentation). This same question applies to
> cosine as you are doing an L2-Norm of the vector as a pre-processing step
> and including/excluding non-shared ratings may make a difference.
>
> Thanks again!
> Amit
>
>
> On Wed, Nov 27, 2013 at 7:13 AM, Sebastian Schelter <
> ssc.open@googlemail.com> wrote:
>
>> Hi Amit,
>>
>> Yes, it gives different results. However in practice, most people don't
>> do rating prediction with Pearson coefficient, but use count-based
>> measures like the loglikelihood ratio test.
>>
>> The distributed code doesn't look at vectors of different lengths, but
>> simply assumes non-existent ratings as zero.
>>
>> --sebastian
>>
>> On 27.11.2013 16:09, Amit Nithian wrote:
>> > Comparing this against the non distributed (taste) gives different
>> answers
>> > for item item similarity as of course the non distributed looks only at
>> > corated items. I was more wondering if this difference in practice
>> mattered
>> > or not.
>> >
>> > Also I'm confused on how you can compute the Pearson similarity between
>> two
>> > vectors of different length which essentially is going on here I think?
>> >
>> > Thanks again
>> > Amit
>> > On Nov 27, 2013 9:06 AM, "Sebastian Schelter" <ss...@googlemail.com>
>> > wrote:
>> >
>> >> Yes, it is due to the parallel algorithm which only looks at co-ratings
>> >> from a given user.
>> >>
>> >>
>> >> On 27.11.2013 15:02, Amit Nithian wrote:
>> >>> Thanks Sebastian! Is there a particular reason for that?
>> >>> On Nov 27, 2013 7:47 AM, "Sebastian Schelter" <
>> ssc.open@googlemail.com>
>> >>> wrote:
>> >>>
>> >>>> Hi Amit,
>> >>>>
>> >>>> You are right, the non-corated items are not filtered out in the
>> >>>> distributed implementation.
>> >>>>
>> >>>> --sebastian
>> >>>>
>> >>>>
>> >>>> On 26.11.2013 20:51, Amit Nithian wrote:
>> >>>>> Hi all,
>> >>>>>
>> >>>>> Apologies if this is a repeat question as I just joined the list
>> but I
>> >>>> have
>> >>>>> a question about the way that metrics like Cosine and Pearson are
>> >>>>> calculated in Hadoop "mode" (i.e. non Taste).
>> >>>>>
>> >>>>> As far as I understand, the vectors used for computing pairwise item
>> >>>>> similarity in Taste are based on the co-rated items; however, in the
>> >>>> Hadoop
>> >>>>> implementation, I don't see this done.
>> >>>>>
>> >>>>> The implementation of the distributed item-item similarity comes
>> from
>> >>>> this
>> >>>>> paper http://ssc.io/wp-content/uploads/2012/06/rec11-schelter.pdf.
>> I
>> >>>> didn't
>> >>>>> see anything in this paper about filtering out those elements from
>> the
>> >>>>> vectors not co-rated and this can make a difference especially when
>> you
>> >>>>> normalize the ratings by dividing by the average item rating. In
>> some
>> >>>>> cases, the # users to divide by can be fewer depending on the
>> >> sparseness
>> >>>> of
>> >>>>> the vector.
>> >>>>>
>> >>>>> Any clarity on this would be helpful.
>> >>>>>
>> >>>>> Thanks!
>> >>>>> Amit
>> >>>>>
>> >>>>
>> >>>>
>> >>>
>> >>
>> >>
>> >
>>
>>
>

Re: Question about Pearson Correlation in non-Taste mode

Posted by Amit Nithian <an...@gmail.com>.
Hey Sebastian,

Thanks again. Actually, I'm glad to be talking to you, since it's your
paper and presentation I have questions about! :-)

So to clarify my question further, looking at this presentation (
http://isabel-drost.de/hadoop/slides/collabMahout.pdf) you have the
following user x item matrix:
    M  A  I
A   5  1  4
B   -  2  5
P   4  3  2

If I want to calculate the Pearson correlation between Matrix and
Inception, I'd have the rating vectors:
[5 - 4] vs. [4 5 2].

One of the steps in your paper is the normalization step, which subtracts
the mean item rating from each value and then takes the L2 norm of the
resulting vector (in other words, the L2 norm of the mean-centered
vector?).

The question I have is: what is the average rating for Matrix and
Inception? I can see the following possibilities:
1. Matrix: 4.5 (9/2), Inception: 3 (6/2), because you only consider shared
ratings
2. Matrix: 3 (9/3), Inception: 3.667 (11/3), assuming the missing rating
is 0
3. Matrix: 4.5 (9/2), Inception: 3.667 (11/3), subtracting from the
average of all non-zero ratings ==> this is what I believe the current
implementation does

Unfortunately, none of these yields the 0.47 listed in the presentation,
but that's a separate issue. In my testing, I see that Mahout Taste
(non-distributed) uses the 1st approach, while the distributed
implementation uses the 3rd.

I am okay with #3; however, I just want to confirm that this is the case
and that it's acceptable. This is why I was asking about the Pearson
correlation between vectors of "different" lengths: the average rating is
being computed with a denominator (the number of users) that differs
between the two approaches (2 vs. 3).

I know you said that in practice people don't use Pearson to compute
inferred ratings, but this is just for my complete understanding (and since
it's the example used in your presentation). The same question applies to
cosine, since you take the L2 norm of the vector as a pre-processing step,
and including/excluding non-shared ratings may make a difference.

Thanks again!
Amit
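
To make the two conventions from this message concrete, here is a small
standalone Python sketch (my own illustration, not Mahout code) that
computes Pearson both ways on the Matrix/Inception rating vectors from the
example above. It confirms that the co-rated (Taste) convention and the
zero-filled (distributed) convention give different answers:

```python
import math

# Sketch (my own, not Mahout code) of the two Pearson conventions discussed
# above. None marks a missing rating (user B never rated "Matrix").
matrix    = [5, None, 4]   # ratings by users A, B, P
inception = [4, 5,    2]

def pearson_corated(x, y):
    """Taste-style: restrict both vectors to co-rated users first."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in pairs)
    den = math.sqrt(sum((a - mx) ** 2 for a in xs)) \
        * math.sqrt(sum((b - my) ** 2 for b in ys))
    return num / den

def pearson_zero_filled(x, y):
    """Distributed-style: center each item by the mean of its observed
    ratings, treat missing entries as zero, then take the cosine."""
    def center(v):
        observed = [a for a in v if a is not None]
        m = sum(observed) / len(observed)
        return [(a - m) if a is not None else 0.0 for a in v]
    cx, cy = center(x), center(y)
    num = sum(a * b for a, b in zip(cx, cy))
    den = math.sqrt(sum(a * a for a in cx)) * math.sqrt(sum(b * b for b in cy))
    return num / den

print(pearson_corated(matrix, inception))      # 1.0
print(pearson_zero_filled(matrix, inception))  # ~0.6547 (= sqrt(3/7))
```

Neither result matches the 0.47 from the slides, which is consistent with
the observation in the message above.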


On Wed, Nov 27, 2013 at 7:13 AM, Sebastian Schelter <ssc.open@googlemail.com
> wrote:

> Hi Amit,
>
> Yes, it gives different results. However in practice, most people don't
> do rating prediction with Pearson coefficient, but use count-based
> measures like the loglikelihood ratio test.
>
> The distributed code doesn't look at vectors of different lengths, but
> simply assumes non-existent ratings as zero.
>
> --sebastian
>
> On 27.11.2013 16:09, Amit Nithian wrote:
> > Comparing this against the non distributed (taste) gives different
> answers
> > for item item similarity as of course the non distributed looks only at
> > corated items. I was more wondering if this difference in practice
> mattered
> > or not.
> >
> > Also I'm confused on how you can compute the Pearson similarity between
> two
> > vectors of different length which essentially is going on here I think?
> >
> > Thanks again
> > Amit
> > On Nov 27, 2013 9:06 AM, "Sebastian Schelter" <ss...@googlemail.com>
> > wrote:
> >
> >> Yes, it is due to the parallel algorithm which only looks at co-ratings
> >> from a given user.
> >>
> >>
> >> On 27.11.2013 15:02, Amit Nithian wrote:
> >>> Thanks Sebastian! Is there a particular reason for that?
> >>> On Nov 27, 2013 7:47 AM, "Sebastian Schelter" <ssc.open@googlemail.com
> >
> >>> wrote:
> >>>
> >>>> Hi Amit,
> >>>>
> >>>> You are right, the non-corated items are not filtered out in the
> >>>> distributed implementation.
> >>>>
> >>>> --sebastian
> >>>>
> >>>>
> >>>> On 26.11.2013 20:51, Amit Nithian wrote:
> >>>>> Hi all,
> >>>>>
> >>>>> Apologies if this is a repeat question as I just joined the list but
> I
> >>>> have
> >>>>> a question about the way that metrics like Cosine and Pearson are
> >>>>> calculated in Hadoop "mode" (i.e. non Taste).
> >>>>>
> >>>>> As far as I understand, the vectors used for computing pairwise item
> >>>>> similarity in Taste are based on the co-rated items; however, in the
> >>>> Hadoop
> >>>>> implementation, I don't see this done.
> >>>>>
> >>>>> The implementation of the distributed item-item similarity comes from
> >>>> this
> >>>>> paper http://ssc.io/wp-content/uploads/2012/06/rec11-schelter.pdf. I
> >>>> didn't
> >>>>> see anything in this paper about filtering out those elements from
> the
> >>>>> vectors not co-rated and this can make a difference especially when
> you
> >>>>> normalize the ratings by dividing by the average item rating. In some
> >>>>> cases, the # users to divide by can be fewer depending on the
> >> sparseness
> >>>> of
> >>>>> the vector.
> >>>>>
> >>>>> Any clarity on this would be helpful.
> >>>>>
> >>>>> Thanks!
> >>>>> Amit
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >>
> >
>
>

Re: Question about Pearson Correlation in non-Taste mode

Posted by Sebastian Schelter <ss...@googlemail.com>.
Hi Amit,

Yes, it gives different results. However, in practice most people don't
do rating prediction with the Pearson coefficient, but instead use
count-based measures like the loglikelihood ratio test.

The distributed code doesn't operate on vectors of different lengths; it
simply treats non-existent ratings as zero.

--sebastian
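
Since the loglikelihood ratio test is named here as the count-based
alternative, a rough Python rendering of the idea (following Dunning's
formulation; the counts below are made up for illustration, and this is a
sketch, not Mahout's actual LogLikelihood code):

```python
from math import log

def x_log_x(x):
    return x * log(x) if x > 0 else 0.0

def entropy(*counts):
    # Unnormalized Shannon entropy term used by the G-test.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Loglikelihood ratio for a 2x2 contingency table:
    k11 = users who interacted with both items
    k12 = users who interacted with item A only
    k21 = users who interacted with item B only
    k22 = users who interacted with neither item
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return 2.0 * (row + col - mat)

# Made-up counts: items that co-occur far more often than chance predicts
# get a high score; independent items score near zero.
print(llr(100, 10, 10, 10000))
print(llr(10, 10, 10, 10))
```

Note that this measure only uses interaction counts, so the question of how
to average missing ratings never arises.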

On 27.11.2013 16:09, Amit Nithian wrote:
> Comparing this against the non distributed (taste) gives different answers
> for item item similarity as of course the non distributed looks only at
> corated items. I was more wondering if this difference in practice mattered
> or not.
> 
> Also I'm confused on how you can compute the Pearson similarity between two
> vectors of different length which essentially is going on here I think?
> 
> Thanks again
> Amit
> On Nov 27, 2013 9:06 AM, "Sebastian Schelter" <ss...@googlemail.com>
> wrote:
> 
>> Yes, it is due to the parallel algorithm which only looks at co-ratings
>> from a given user.
>>
>>
>> On 27.11.2013 15:02, Amit Nithian wrote:
>>> Thanks Sebastian! Is there a particular reason for that?
>>> On Nov 27, 2013 7:47 AM, "Sebastian Schelter" <ss...@googlemail.com>
>>> wrote:
>>>
>>>> Hi Amit,
>>>>
>>>> You are right, the non-corated items are not filtered out in the
>>>> distributed implementation.
>>>>
>>>> --sebastian
>>>>
>>>>
>>>> On 26.11.2013 20:51, Amit Nithian wrote:
>>>>> Hi all,
>>>>>
>>>>> Apologies if this is a repeat question as I just joined the list but I
>>>> have
>>>>> a question about the way that metrics like Cosine and Pearson are
>>>>> calculated in Hadoop "mode" (i.e. non Taste).
>>>>>
>>>>> As far as I understand, the vectors used for computing pairwise item
>>>>> similarity in Taste are based on the co-rated items; however, in the
>>>> Hadoop
>>>>> implementation, I don't see this done.
>>>>>
>>>>> The implementation of the distributed item-item similarity comes from
>>>> this
>>>>> paper http://ssc.io/wp-content/uploads/2012/06/rec11-schelter.pdf. I
>>>> didn't
>>>>> see anything in this paper about filtering out those elements from the
>>>>> vectors not co-rated and this can make a difference especially when you
>>>>> normalize the ratings by dividing by the average item rating. In some
>>>>> cases, the # users to divide by can be fewer depending on the
>> sparseness
>>>> of
>>>>> the vector.
>>>>>
>>>>> Any clarity on this would be helpful.
>>>>>
>>>>> Thanks!
>>>>> Amit
>>>>>
>>>>
>>>>
>>>
>>
>>
> 


Re: Question about Pearson Correlation in non-Taste mode

Posted by Amit Nithian <an...@gmail.com>.
Comparing this against the non-distributed (Taste) implementation gives
different answers for item-item similarity since, of course, the
non-distributed version looks only at co-rated items. I was mostly
wondering whether this difference matters in practice.

Also, I'm confused about how you can compute the Pearson similarity between
two vectors of different lengths, which is essentially what is going on
here, I think?

Thanks again
Amit
On Nov 27, 2013 9:06 AM, "Sebastian Schelter" <ss...@googlemail.com>
wrote:

> Yes, it is due to the parallel algorithm which only looks at co-ratings
> from a given user.
>
>
> On 27.11.2013 15:02, Amit Nithian wrote:
> > Thanks Sebastian! Is there a particular reason for that?
> > On Nov 27, 2013 7:47 AM, "Sebastian Schelter" <ss...@googlemail.com>
> > wrote:
> >
> >> Hi Amit,
> >>
> >> You are right, the non-corated items are not filtered out in the
> >> distributed implementation.
> >>
> >> --sebastian
> >>
> >>
> >> On 26.11.2013 20:51, Amit Nithian wrote:
> >>> Hi all,
> >>>
> >>> Apologies if this is a repeat question as I just joined the list but I
> >> have
> >>> a question about the way that metrics like Cosine and Pearson are
> >>> calculated in Hadoop "mode" (i.e. non Taste).
> >>>
> >>> As far as I understand, the vectors used for computing pairwise item
> >>> similarity in Taste are based on the co-rated items; however, in the
> >> Hadoop
> >>> implementation, I don't see this done.
> >>>
> >>> The implementation of the distributed item-item similarity comes from
> >> this
> >>> paper http://ssc.io/wp-content/uploads/2012/06/rec11-schelter.pdf. I
> >> didn't
> >>> see anything in this paper about filtering out those elements from the
> >>> vectors not co-rated and this can make a difference especially when you
> >>> normalize the ratings by dividing by the average item rating. In some
> >>> cases, the # users to divide by can be fewer depending on the
> sparseness
> >> of
> >>> the vector.
> >>>
> >>> Any clarity on this would be helpful.
> >>>
> >>> Thanks!
> >>> Amit
> >>>
> >>
> >>
> >
>
>

Re: Question about Pearson Correlation in non-Taste mode

Posted by Sebastian Schelter <ss...@googlemail.com>.
Yes, it is due to the parallel algorithm, which only looks at co-ratings
from a given user.
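
To illustrate the point, here is a simplified, single-process Python sketch
of the map phase (my own illustration, not the actual Mahout code): each
user's row only ever emits products for items that user co-rated, so the
pairwise dot products are naturally co-rating-based, while any per-item
normalization happened in an earlier pass over all of an item's ratings.

```python
from collections import defaultdict

# Hypothetical user -> {item: rating} data (the example matrix from the
# thread).
users = {
    "A": {"Matrix": 5, "Alien": 1, "Inception": 4},
    "B": {"Alien": 2, "Inception": 5},
    "P": {"Matrix": 4, "Alien": 3, "Inception": 2},
}

# Each user's row emits dot-product contributions only for item pairs that
# this user actually co-rated; user B contributes nothing to any pair
# involving "Matrix".
dots = defaultdict(float)
for ratings in users.values():
    items = sorted(ratings)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            dots[(a, b)] += ratings[a] * ratings[b]

print(dict(dots))
```

Because the similarity is assembled from these per-user contributions,
there is no global step at which the earlier normalization could have been
restricted to co-rated users only.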


On 27.11.2013 15:02, Amit Nithian wrote:
> Thanks Sebastian! Is there a particular reason for that?
> On Nov 27, 2013 7:47 AM, "Sebastian Schelter" <ss...@googlemail.com>
> wrote:
> 
>> Hi Amit,
>>
>> You are right, the non-corated items are not filtered out in the
>> distributed implementation.
>>
>> --sebastian
>>
>>
>> On 26.11.2013 20:51, Amit Nithian wrote:
>>> Hi all,
>>>
>>> Apologies if this is a repeat question as I just joined the list but I
>> have
>>> a question about the way that metrics like Cosine and Pearson are
>>> calculated in Hadoop "mode" (i.e. non Taste).
>>>
>>> As far as I understand, the vectors used for computing pairwise item
>>> similarity in Taste are based on the co-rated items; however, in the
>> Hadoop
>>> implementation, I don't see this done.
>>>
>>> The implementation of the distributed item-item similarity comes from
>> this
>>> paper http://ssc.io/wp-content/uploads/2012/06/rec11-schelter.pdf. I
>> didn't
>>> see anything in this paper about filtering out those elements from the
>>> vectors not co-rated and this can make a difference especially when you
>>> normalize the ratings by dividing by the average item rating. In some
>>> cases, the # users to divide by can be fewer depending on the sparseness
>> of
>>> the vector.
>>>
>>> Any clarity on this would be helpful.
>>>
>>> Thanks!
>>> Amit
>>>
>>
>>
> 


Re: Question about Pearson Correlation in non-Taste mode

Posted by Amit Nithian <an...@gmail.com>.
Thanks Sebastian! Is there a particular reason for that?
On Nov 27, 2013 7:47 AM, "Sebastian Schelter" <ss...@googlemail.com>
wrote:

> Hi Amit,
>
> You are right, the non-corated items are not filtered out in the
> distributed implementation.
>
> --sebastian
>
>
> On 26.11.2013 20:51, Amit Nithian wrote:
> > Hi all,
> >
> > Apologies if this is a repeat question as I just joined the list but I
> have
> > a question about the way that metrics like Cosine and Pearson are
> > calculated in Hadoop "mode" (i.e. non Taste).
> >
> > As far as I understand, the vectors used for computing pairwise item
> > similarity in Taste are based on the co-rated items; however, in the
> Hadoop
> > implementation, I don't see this done.
> >
> > The implementation of the distributed item-item similarity comes from
> this
> > paper http://ssc.io/wp-content/uploads/2012/06/rec11-schelter.pdf. I
> didn't
> > see anything in this paper about filtering out those elements from the
> > vectors not co-rated and this can make a difference especially when you
> > normalize the ratings by dividing by the average item rating. In some
> > cases, the # users to divide by can be fewer depending on the sparseness
> of
> > the vector.
> >
> > Any clarity on this would be helpful.
> >
> > Thanks!
> > Amit
> >
>
>

Re: Question about Pearson Correlation in non-Taste mode

Posted by Sebastian Schelter <ss...@googlemail.com>.
Hi Amit,

You are right, the non-corated items are not filtered out in the
distributed implementation.

--sebastian


On 26.11.2013 20:51, Amit Nithian wrote:
> Hi all,
> 
> Apologies if this is a repeat question as I just joined the list but I have
> a question about the way that metrics like Cosine and Pearson are
> calculated in Hadoop "mode" (i.e. non Taste).
> 
> As far as I understand, the vectors used for computing pairwise item
> similarity in Taste are based on the co-rated items; however, in the Hadoop
> implementation, I don't see this done.
> 
> The implementation of the distributed item-item similarity comes from this
> paper http://ssc.io/wp-content/uploads/2012/06/rec11-schelter.pdf. I didn't
> see anything in this paper about filtering out those elements from the
> vectors not co-rated and this can make a difference especially when you
> normalize the ratings by dividing by the average item rating. In some
> cases, the # users to divide by can be fewer depending on the sparseness of
> the vector.
> 
> Any clarity on this would be helpful.
> 
> Thanks!
> Amit
>