Posted to user@mahout.apache.org by Sean Owen <sr...@gmail.com> on 2009/12/03 16:02:33 UTC

Re: Taste-GenericItemBasedRecommender

I'm going ahead with implementing something like this as a simple but
real distributed implementation.

One question that maybe Ted already knows the answer to: how would I
iterate over A_u* for each A_u*? the mapper would touch only one each.
I guess I could use a whole MR step to generate the cartesian product
so to speak.

On Thu, Sep 10, 2009 at 1:44 AM, Ted Dunning <te...@gmail.com> wrote:
> Another implementation is to iterate over users emitting cooccurrence
> pairs.  These can be sorted on the items involved and the counts
> aggregated.  This is ideal for map reduce implementation.
>
> Roughly the matrix approach becomes this:
>
>  Map:
>    input record is A_u*, all items for a particular user u
>    for each item i in A_u*
>       yield ((i,*), 1)
>       for each item j in A_u*
>          yield ((i, j), 1)
>
>   Reduce:
>      input record is (i,j), (k1, k2, ... kn)
>      yield (i,j), sum k*
>
>          or
>
>       input record is (i,*), (k1, k2, ... ,kn)
>       yield (i,*) sum k*
>
> This computes the elements of A'A (item cooccurrence counts) and the row
> sums of A'A in one step.   You might want to compute the column sums of A
> first to use these for weighting A before using this program.
>
> Then you need another MR step to bring together the overall total, the row
> counts and the cell counts for final reduction.
>
> This is inside out compared with the most obvious implementation and there
> is no place where we really have two entire item vectors.  This will make it
> rather harder to have a totally simple pluggable weighting scheme that runs
> as fast.
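
For concreteness, here is a minimal plain-Hadoop (new API) sketch of the
mapper and reducer Ted describes above. The class names, the one-line-per-user
text input, and the string pair keys are illustrative assumptions rather than
existing Mahout code; weighting is omitted.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Input: one line per user holding that user's item IDs, separated by spaces.
// Output: ((i,j), count) and ((i,*), rowsum) entries of A'A, keyed as text.
class CooccurrenceMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  private static final IntWritable ONE = new IntWritable(1);
  @Override
  protected void map(LongWritable offset, Text userItems, Context ctx)
      throws IOException, InterruptedException {
    String[] items = userItems.toString().split("\\s+");
    for (String i : items) {
      ctx.write(new Text(i + ",*"), ONE);        // row-sum entry (i,*)
      for (String j : items) {
        ctx.write(new Text(i + "," + j), ONE);   // cooccurrence entry (i,j)
      }
    }
  }
}

class CooccurrenceReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  protected void reduce(Text pair, Iterable<IntWritable> counts, Context ctx)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable count : counts) {
      sum += count.get();
    }
    ctx.write(pair, new IntWritable(sum));       // k1 + k2 + ... + kn
  }
}

Registering the same reducer as the job's combiner collapses most of the
duplicate ((i,j), 1) pairs before the shuffle, which is what keeps this cheap.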

Re: Taste-GenericItemBasedRecommender

Posted by Sean Owen <sr...@gmail.com>.
Yes, if I understand your meaning. Both steps will be computed in
Hadoop. The second step involves loading one row of A'A at a time, and
dotting it with the user vector to get a recommendation score for one
item.

On Fri, Dec 4, 2009 at 9:31 AM, Gökhan Çapan <gk...@gmail.com> wrote:
> Ok, so we will implement an item-based recommender:
> - mapred jobs for constructing A'A first.
> - the online part of the algorithm ((A'A)h).
>
> What do you think about my comments on the online step? Using a small subset
> of A'A

Re: Taste-GenericItemBasedRecommender

Posted by Gökhan Çapan <gk...@gmail.com>.
Ok, so we will implement an item-based recommender:
- mapred jobs for constructing A'A first.
- the online part of the algorithm ((A'A)h).

What do you think about my comments on the online step? Using a small subset
of A'A

On Fri, Dec 4, 2009 at 11:19 AM, Sean Owen <sr...@gmail.com> wrote:

> Sure, I'm implementing this today, just a simple version of what's
> been discussed. What would we call this? It's basically item-based
> recommendation.
>
> On Fri, Dec 4, 2009 at 8:57 AM, Gökhan Çapan <gk...@gmail.com> wrote:
> > ith row of the matrix A'A contains all items and their similarity degrees
> to
> > the item that is represented at ith column of the matrix A.
> > I guess it is enough using only a subset of A'A at the final step, that
> is,
> > the rows which represent the items that are in active user's history.
> > btw, I also want to contribute to that implementation, if we can decide
> the
> > algorithm.
>



-- 
Gökhan Çapan

Re: Taste-GenericItemBasedRecommender

Posted by Sean Owen <sr...@gmail.com>.
Sure, I'm implementing this today, just a simple version of what's
been discussed. What would we call this? It's basically item-based
recommendation.

On Fri, Dec 4, 2009 at 8:57 AM, Gökhan Çapan <gk...@gmail.com> wrote:
> ith row of the matrix A'A contains all items and their similarity degrees to
> the item that is represented at ith column of the matrix A.
> I guess it is enough using only a subset of A'A at the final step, that is,
> the rows which represent the items that are in active user's history.
> btw, I also want to contribute to that implementation, if we can decide the
> algorithm.

Re: Taste-GenericItemBasedRecommender

Posted by Gökhan Çapan <gk...@gmail.com>.
Yeah, you are right. I also think it must be the online part of the
algorithm.

On Fri, Dec 4, 2009 at 9:45 PM, Ted Dunning <te...@gmail.com> wrote:

> You are correct that only a little bit of A'A is used for any given user's
> recommendation, but it is the columns that correspond to their history, not
> the rows.  Getting this wrong doesn't matter much if A'A is still
> symmetrical, but getting it right can make your code less confusing for
> others, and if you limit the number of non-zero elements in each row of A'A,
> then it becomes important to keep track of which is which.
>
> On Fri, Dec 4, 2009 at 12:57 AM, Gökhan Çapan <gk...@gmail.com> wrote:
>
> > ith row of the matrix A'A contains all items and their similarity degrees
> > to
> > the item that is represented at ith column of the matrix A.
> > I guess it is enough using only a subset of A'A at the final step, that
> is,
> > the rows which represent the items that are in active user's history.
> > btw, I also want to contribute to that implementation, if we can decide
> the
> > algorithm.
> >
> > --
> Ted Dunning, CTO
> DeepDyve
>



-- 
Gökhan Çapan

Re: Taste-GenericItemBasedRecommender

Posted by Ted Dunning <te...@gmail.com>.
You are correct that only a little bit of A'A is used for any given user's
recommendation, but it is the columns that correspond to their history, not
the rows.  Getting this wrong doesn't matter much if A'A is still
symmetrical, but getting it right can make your code less confusing for
others, and if you limit the number of non-zero elements in each row of A'A,
then it becomes important to keep track of which is which.

On Fri, Dec 4, 2009 at 12:57 AM, Gökhan Çapan <gk...@gmail.com> wrote:

> ith row of the matrix A'A contains all items and their similarity degrees
> to
> the item that is represented at ith column of the matrix A.
> I guess it is enough using only a subset of A'A at the final step, that is,
> the rows which represent the items that are in active user's history.
> btw, I also want to contribute to that implementation, if we can decide the
> algorithm.
>
> --
Ted Dunning, CTO
DeepDyve

Re: Taste-GenericItemBasedRecommender

Posted by Gökhan Çapan <gk...@gmail.com>.
ith row of the matrix A'A contains all items and their similarity degrees to
the item that is represented at ith column of the matrix A.
I guess it is enough using only a subset of A'A at the final step, that is,
the rows which represent the items that are in active user's history.
btw, I also want to contribute to that implementation, if we can decide the
algorithm.

On Fri, Dec 4, 2009 at 10:33 AM, Sean Owen <sr...@gmail.com> wrote:

> Yes, this makes sense. I do need two passes. One pass converts input
> from "user,item,rating" triples into user vectors. Then the second
> step builds the co-occurrence A'A product. I agree that it will be
> faster to take a shortcut than properly compute A'A.
>
> (Though I'm curious how this works -- looks deceptively easy, this
> outer product approach. Isn't v cross v potentially huge? or likely to
> be sparse enough to not matter)
>
> I understand the final step in principle, which is to compute (A'A)h.
> But I keep guessing A'A is too big to fit in memory? So I can
> side-load the rows of A'A one at a time and compute it rather
> manually.
>
>
> On Thu, Dec 3, 2009 at 8:28 PM, Ted Dunning <te...@gmail.com> wrote:
> > I think you can merge my passes into a single pass in which you compute
> the
> > row and column sums at the same time that you compute the product.  That
> is
> > more complicated, though, and I hate fancy code.  So you are right in
> > practice that I have always had two passes.  (although pig might be
> clever
> > enough by now to merge them)
> >
> > There is another pass in which you use all of the sums to do the
> > sparsification.  I don't know if that could be done in the same pass or
> not.
>



-- 
Gökhan Çapan

Re: Taste-GenericItemBasedRecommender

Posted by Ted Dunning <te...@gmail.com>.
This product needs to happen at recommendation time.

And it is usually better to think of it in transposed form as the sum of a
few columns selected by the non-zero elements of h.  That way you retrieve
just a few columns.

On Fri, Dec 4, 2009 at 10:59 AM, Sean Owen <sr...@gmail.com> wrote:

> On Fri, Dec 4, 2009 at 6:04 PM, Jake Mannix <ja...@gmail.com> wrote:
> > How would you do this?  Take the rows of v_i cross v_i and add them
> > up?  Isn't that another MR job?
>
> No I suppose I meant take each row of A'A, one at a time, and dot with
> h. That's computing the recommendation vector with only one row in
> memory at a time.
>



-- 
Ted Dunning, CTO
DeepDyve

Re: Taste-GenericItemBasedRecommender

Posted by Sean Owen <sr...@gmail.com>.
On Fri, Dec 4, 2009 at 6:04 PM, Jake Mannix <ja...@gmail.com> wrote:
> How would you do this?  Take the rows of v_i cross v_i and add them
> up?  Isn't that another MR job?

No I suppose I meant take each row of A'A, one at a time, and dot with
h. That's computing the recommendation vector with only one row in
memory at a time.

Re: Taste-GenericItemBasedRecommender

Posted by Ted Dunning <te...@gmail.com>.
If you down-sample during conversion from triples then you never even need
to keep an entire row in memory.  After down-sampling it won't matter.
Moreover, in the actual multiplication, you only pass around individual
summand elements of (A'A) so there is little memory required there as well.
Most of the excess space from emitting all those single elements is removed by
the combiner.  The reducer will also remove elements through
sparsification.  The result is less sparse than the original data, but is
still very sparse.

In the final form of (A'A) it may even be desirable to limit the number of
non-zero elements in a row (which breaks symmetry without much harm).  You
generally need another MR step to do this, but the effect on final
recommendation run-time cost is significant and the effect on recommendation
quality is nil.  This has the effect (in the Lucene form of the recommender)
of limiting the number of terms in each recommendation document.
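
For illustration, that pruning amounts to keeping the k largest entries of
each row. A sketch, with a plain Map standing in for one sparse row of A'A
and all names made up:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CooccurrenceRowPruner {
  // Keep only the k largest counts in one row of A'A; illustrative only.
  static Map<Integer, Double> topK(Map<Integer, Double> row, int k) {
    List<Map.Entry<Integer, Double>> entries =
        new ArrayList<Map.Entry<Integer, Double>>(row.entrySet());
    Collections.sort(entries, new Comparator<Map.Entry<Integer, Double>>() {
      public int compare(Map.Entry<Integer, Double> a, Map.Entry<Integer, Double> b) {
        return Double.compare(b.getValue(), a.getValue());  // largest count first
      }
    });
    Map<Integer, Double> pruned = new LinkedHashMap<Integer, Double>();
    for (Map.Entry<Integer, Double> e : entries.subList(0, Math.min(k, entries.size()))) {
      pruned.put(e.getKey(), e.getValue());
    }
    return pruned;
  }
}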

On Fri, Dec 4, 2009 at 10:04 AM, Jake Mannix <ja...@gmail.com> wrote:

>  sum_i  (v_i cross v_i)
>
> is indeed going to get pretty dense, so the reducer may totally blow
> up if care is not taken - because it's all being done in memory in the
> reducer (ie yes, all of A'A lives in memory just before the reducer
> completes, in my naive impl).
>



-- 
Ted Dunning, CTO
DeepDyve

Re: Taste-GenericItemBasedRecommender

Posted by Jake Mannix <ja...@gmail.com>.
On Fri, Dec 4, 2009 at 12:33 AM, Sean Owen <sr...@gmail.com> wrote:

>
> (Though I'm curious how this works -- looks deceptively easy, this
> outer product approach. Isn't v cross v potentially huge? or likely to
> be sparse enough to not matter)
>

v cross v is very huge, but very sparse... *but*

  sum_i  (v_i cross v_i)

is indeed going to get pretty dense, so the reducer may totally blow
up if care is not taken - because it's all being done in memory in the
reducer (ie yes, all of A'A lives in memory just before the reducer
completes, in my naive impl).


> I understand the final step in principle, which is to compute (A'A)h.
> But I keep guessing A'A is too big to fit in memory? So I can
> side-load the rows of A'A one at a time and compute it rather
> manually.
>

How would you do this?  Take the rows of v_i cross v_i and add them
up?  Isn't that another MR job?

  -jake

Re: Taste-GenericItemBasedRecommender

Posted by Sean Owen <sr...@gmail.com>.
On Sun, Dec 13, 2009 at 3:24 AM, Jake Mannix <ja...@gmail.com> wrote:
>  You do the co-occurrence matrix (for item-by-item, right?) on Hadoop too,
> and that part is really fast, but computing the recommendations is very
> slow?  By what orders of magnitude, for the whole set?
>
> What are the scales you are testing with, in terms of total number of users,
> items, and ratings?

Yes, for about 10M ratings (tens of thousands of users and items) the
co-occurrence matrix counts take a couple minutes, and then recs are
on track to take a day or two.

Re: Taste-GenericItemBasedRecommender

Posted by Jake Mannix <ja...@gmail.com>.
On Sat, Dec 12, 2009 at 3:16 PM, Sean Owen <sr...@gmail.com> wrote:


> Recommendations are computed for one user at a time, by multiplying
> the co-occurrence matrix by the user preference vector. And then yes
> it's one big job invoking computation for all users.
>
> I'm running this all on one machine (my laptop) so it's kind of
> serialized anyway. yes it was 10 seconds to compute all recs for one
> user; it's a couple secs now with some more work. That's still rough
> but not awful.

So when doing a big batch of a thousand users, say, you're saying it's
taking your laptop 3 hours to do this using the Hadoop-based code (in
pseudo-distributed mode)?

> All of it is on Hadoop here. It's pretty simple -- make the user
> vectors, make the co-occurrence matrix (all that is quite fast), then
> multiply the two to make recommendations.

You do the co-occurrence matrix (for item-by-item, right?) on Hadoop too,
and that part is really fast, but computing the recommendations is very
slow?  By what orders of magnitude, for the whole set?

What are the scales you are testing with, in terms of total number of users,
items, and ratings?

  -jake

Re: Taste-GenericItemBasedRecommender

Posted by Sean Owen <sr...@gmail.com>.
On Sat, Dec 12, 2009 at 11:08 PM, Jake Mannix <ja...@gmail.com> wrote:
> You're not computing only one recommendation at a time, are you?
> I really need to read through the hadoop.item code, but in general, what
> is the procedure here?  If you're doing work on HDFS as a M/R job, you're
> doing a huge batch, right?  You're saying the aggregate performance is
> 10 seconds per recommendation across millions of recommendations, or
> doing a one-shot task?

Recommendations are computed for one user at a time, by multiplying
the co-occurrence matrix by the user preference vector. And then yes
it's one big job invoking computation for all users.

I'm running this all on one machine (my laptop) so it's kind of
serialized anyway. yes it was 10 seconds to compute all recs for one
user; it's a couple secs now with some more work. That's still rough
but not awful.


> offline).  Can you give a quick review of which part of this is supposed
> to be on Hadoop, which parts are done live, a kind of big picture
> description of what's going on?

All of it is on Hadoop here. It's pretty simple -- make the user
vectors, make the co-occurrence matrix (all that is quite fast), then
multiply the two to make recommendations.

Re: Taste-GenericItemBasedRecommender

Posted by Jake Mannix <ja...@gmail.com>.
On Sat, Dec 12, 2009 at 8:58 AM, Sean Owen <sr...@gmail.com> wrote:

> I've implemented this but it's still quite slow. Computing
> recommendations goes from a couple hundred ms to 10 seconds. Nothing
> wrong with this idea -- it's all the loading vectors and distributed
> stuff that's weighing it down.
>

You're not computing only one recommendation at a time, are you?
I really need to read through the hadoop.item code, but in general, what
is the procedure here?  If you're doing work on HDFS as a M/R job, you're
doing a huge batch, right?  You're saying the aggregate performance is
10 seconds per recommendation across millions of recommendations, or
doing a one-shot task?

I feel like too much of this conversation went by and I missed some
crucial piece describing the task in a big-picture sense (and this
notion is backed up by the fact that we keep talking past each other
when it comes to which parts of this process are online and which are
offline).  Can you give a quick review of which part of this is supposed
to be on Hadoop, which parts are done live, a kind of big picture
description of what's going on?

> I think that's the culprit in fact, having to load all the column
> vectors, since they're not light.
>
> One approach is to make the user vectors more sparse by throwing out
> data, though I don't like it so much.
>
> One question -- in SparseVector, can't we internally remove entries
> when they are set to 0.0? since implicitly missing entries are 0?
>

We should certainly add a "compact" method to both versions of
SparseVector, which could be periodically called to remove any
zeroes and save on subsequent computational costs.
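
A rough sketch of what such a compact() could look like for a parallel-array
style sparse vector (purely illustrative; the int-index/double-value layout
here is an assumption and may not match either Mahout SparseVector exactly):

final class ParallelArraySparseVector {
  private int[] indices;    // indices of the stored entries
  private double[] values;  // corresponding values
  private int size;         // number of slots actually in use

  // Drop entries whose value is exactly 0.0, shifting later entries down.
  void compact() {
    int kept = 0;
    for (int i = 0; i < size; i++) {
      if (values[i] != 0.0) {
        indices[kept] = indices[i];
        values[kept] = values[i];
        kept++;
      }
    }
    size = kept;  // trailing slots are now unused
  }
}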

  -jake

Re: Taste-GenericItemBasedRecommender

Posted by Ted Dunning <te...@gmail.com>.
(redirected from mahout-user)

On Sun, Dec 13, 2009 at 1:25 PM, Jake Mannix <ja...@gmail.com> wrote:

> This is trickier with assign(Vector v, BinaryFunction f), because you have
> no
> way of testing if f.apply(x, 0) == x for all x, and here a marker interface
> might
> be necessary.  I'd also be fine with just checking that the function is one
> of
> Plus or PlusWithScale, because that's a ginormous chunk of the use case.
>

In fact, my own use case just turned out not to be a use case because I had
several things to do and several vectors to do it to.  That meant that it
was better for me to do my own iterateNonZero.

The number of functions that have a zero unit and are not addition or scaled
addition probably is quite small (averaged over real uses).  I think you are
right that we don't need additional machinery at this time.


-- 
Ted Dunning, CTO
DeepDyve

Re: Taste-GenericItemBasedRecommender

Posted by Ted Dunning <te...@gmail.com>.
Duh.  Simplest approach wins!

On Sun, Dec 13, 2009 at 1:25 PM, Jake Mannix <ja...@gmail.com> wrote:

> > Following up on Jake's comment, could we have a marker interface that
> > indicates a function maps 0 to 0?  The abstract vector and matrix
> > implementations could use instanceOf to decide what to do.
> >
>
> I tried this when I was still trying to contribute to commons-math, but
> realized
> after several patch iterations that it's actually far easier to just do
> this:
>
>    Iterator<Element> e = (f.apply(0) == 0) ? iterateNonZero() :
> iterateAll();
>



-- 
Ted Dunning, CTO
DeepDyve

Re: Taste-GenericItemBasedRecommender

Posted by Jake Mannix <ja...@gmail.com>.
On Sun, Dec 13, 2009 at 11:02 AM, Ted Dunning <te...@gmail.com> wrote:

> The issue is when adding a sparse vector to a dense vector (with the dense
> vector being mutated) that the dense vector doesn't know how to use the
> sparse iterator.
>

What do you mean? AbstractVector.plus already knows this in trunk - it
always calls iterateNonZero() on the other vector, as well it should, for
this operation.  The problem with plus() is that it makes a copy, and so
this is bad for doing accumulation.

For accumulation, you want to do assign(Vector v, BinaryFunction f), and
this needs to be done intelligently.


> Following up on Jake's comment, could we have a marker interface that
> indicates a function maps 0 to 0?  The abstract vector and matrix
> implementations could use instanceOf to decide what to do.
>

I tried this when I was still trying to contribute to commons-math, but
realized after several patch iterations that it's actually far easier to
just do this:

    Iterator<Element> e = (f.apply(0) == 0) ? iterateNonZero() : iterateAll();

Even if the method call is expensive, it's only done one extra time.

This is trickier with assign(Vector v, BinaryFunction f), because you have
no way of testing if f.apply(x, 0) == x for all x, and here a marker
interface might be necessary.  I'd also be fine with just checking that the
function is one of Plus or PlusWithScale, because that's a ginormous chunk
of the use case.

If we had these cases (or the marker interface which covered it in general),
then yes, assign(Vector v, BinaryFunction f) could be implemented by checking
which argument of f it was zero preserving w.r.t., and then choosing
iterators appropriately.  Some of this logic could live in subclasses (ie.
not AbstractVector) because for example, SparseVectors based on open-map
don't iterate in order, but do have nice random access, while IntDoublePair
vectors have their own specialties.
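
For illustration, a sketch of what that dispatch could look like inside
assign(Vector, BinaryFunction). ZeroRightIdentity is an imagined marker
interface meaning f(x, 0) == x, and the Plus check is the special-casing
mentioned above; none of this is existing Mahout code:

public Vector assign(Vector other, BinaryFunction f) {
  if (f instanceof ZeroRightIdentity || f instanceof Plus) {
    // f(x, 0) == x, so entries where 'other' is zero can be left alone:
    // only walk other's non-zero elements.
    Iterator<Vector.Element> it = other.iterateNonZero();
    while (it.hasNext()) {
      Vector.Element e = it.next();
      setQuick(e.index(), f.apply(getQuick(e.index()), e.get()));
    }
  } else {
    // No guarantee about zeros: every element has to be visited.
    for (int i = 0; i < size(); i++) {
      setQuick(i, f.apply(getQuick(i), other.getQuick(i)));
    }
  }
  return this;
}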

But this is getting far-afield - these are important optimizations, but I
have a feeling this isn't what is causing Sean's slowness.  There's
something else I haven't been able to put my finger on...

Maybe we should continue this on mahout-dev?  We're not in user-land
anymore. :)

  -jake

Re: Taste-GenericItemBasedRecommender

Posted by Ted Dunning <te...@gmail.com>.
The issue is when adding a sparse vector to a dense vector (with the dense
vector being mutated) that the dense vector doesn't know how to use the
sparse iterator.

Following up on Jake's comment, could we have a marker interface that
indicates a function maps 0 to 0?  The abstract vector and matrix
implementations could use instanceOf to decide what to do.

On Sun, Dec 13, 2009 at 5:12 AM, Sean Owen <sr...@gmail.com> wrote:

> In my particular case, I just need "Scale" instead of "PlusWithScale",
> and that can take advantage of sparseness.
>
> My (er, Ted's) current approach is to sum SparseVectors. This takes
> advantage of sparseness already.
>
> Am I missing why a Scale/PlusWithScale implementation, when using
> sparseness, would be notably faster?
>
>

Re: Taste-GenericItemBasedRecommender

Posted by Sean Owen <sr...@gmail.com>.
My indices are all over the range 0..Integer.MAX_VALUE since they are
hashed from the actual long item IDs, so I imagine a DenseVector is a
non-starter.

Yeah would appreciate a second set of eyes on it. It's in a state
where it's worth looking at. At least I felt it was complete enough to
finish writing up in a chapter.

On Sun, Dec 13, 2009 at 5:58 PM, Jake Mannix <ja...@gmail.com> wrote:
> It looks like your results should probably be fairly dense in the end,
> right?  So you should be accumulating SparseVectors on a DenseVector output,
> at least...  but this should only be a small speedup, not a lower
> complexity... as you say, you're already taking advantage of sparseness in
> the sums...
>
> I need to dig in and try this myself tonight or tomorrow night...

Re: Taste-GenericItemBasedRecommender

Posted by Jake Mannix <ja...@gmail.com>.
It looks like your results should probably be fairly dense in the end,
right?  So you should be accumulating SparseVectors on a DenseVector output,
at least...  but this should only be a small speedup, not a lower
complexity... as you say, you're already taking advantage of sparseness in
the sums...

I need to dig in and try this myself tonight or tomorrow night...

On Dec 13, 2009 5:12 AM, "Sean Owen" <sr...@gmail.com> wrote:

In my particular case, I just need "Scale" instead of "PlusWithScale",
and that can take advantage of sparseness.

My (er, Ted's) current approach is to sum SparseVectors. This takes
advantage of sparseness already.

Am I missing why a Scale/PlusWithScale implementation, when using
sparseness, would be notably faster?

On Sun, Dec 13, 2009 at 11:11 AM, Jake Mannix <ja...@gmail.com> wrote:
> On Sat, Dec 12, 2009...

Re: Taste-GenericItemBasedRecommender

Posted by Sean Owen <sr...@gmail.com>.
In my particular case, I just need "Scale" instead of "PlusWithScale",
and that can take advantage of sparseness.

My (er, Ted's) current approach is to sum SparseVectors. This takes
advantage of sparseness already.

Am I missing why a Scale/PlusWithScale implementation, when using
sparseness, would be notably faster?

On Sun, Dec 13, 2009 at 11:11 AM, Jake Mannix <ja...@gmail.com> wrote:
> On Sat, Dec 12, 2009 at 5:34 PM, Ted Dunning <te...@gmail.com> wrote:
>
>> This is a key problem.
>>
>> Looks like we really need to think about versions of assign that only scan
>> non-zero elements.  Something like assignFromNonZero.
>>
>
> This is easier than the rest of the stuff we are talking about here.
> Whenever
> you have a UnaryFunction f such that f.apply(0) == 0, or you have a
> BinaryFunction b such that b.apply(x, 0) == x or b.apply(0, x) == x, then
> assign
> should take advantage of this and iterateNonZero() on the vector which can
> be skipped over.  In Colt, they actually special-case out functions like
> Plus
> or Minus to take care of this.
>
> In our case, in particular, PlusWithScale is a very very common function
> for Vectors to do, and could be special cased itself, in AbstractVector.
>
>  -jake
>

Re: Taste-GenericItemBasedRecommender

Posted by Jake Mannix <ja...@gmail.com>.
On Sat, Dec 12, 2009 at 5:34 PM, Ted Dunning <te...@gmail.com> wrote:

> This is a key problem.
>
> Looks like we really need to think about versions of assign that only scan
> non-zero elements.  Something like assignFromNonZero.
>

This is easier than the rest of the stuff we are talking about here.
Whenever you have a UnaryFunction f such that f.apply(0) == 0, or you have a
BinaryFunction b such that b.apply(x, 0) == x or b.apply(0, x) == x, then
assign should take advantage of this and iterateNonZero() on the vector
which can be skipped over.  In Colt, they actually special-case out
functions like Plus or Minus to take care of this.

In our case, in particular, PlusWithScale is a very very common function
for Vectors to do, and could be special cased itself, in AbstractVector.

  -jake

Re: Taste-GenericItemBasedRecommender

Posted by Ted Dunning <te...@gmail.com>.
This is a key problem.

Looks like we really need to think about versions of assign that only scan
non-zero elements.  Something like assignFromNonZero.

What is the sense of how that should look?

On Sat, Dec 12, 2009 at 2:54 PM, Sean Owen <sr...@gmail.com> wrote:

> Thanks, it makes sense, though in practice this didn't perform so
> well. assign() can't take advantage of sparseness.
>



-- 
Ted Dunning, CTO
DeepDyve

Re: Taste-GenericItemBasedRecommender

Posted by Sean Owen <sr...@gmail.com>.
Thanks, it makes sense, though in practice this didn't perform so
well. assign() can't take advantage of sparseness.

After throwing in some caching and other improvements performance is,
well, bad in my book but not awful. It's perhaps the price of the
distributed nature of this computation -- or else I'm doing something
wrong.

I think it's working enough for my purposes at the moment.

On Sat, Dec 12, 2009 at 7:48 PM, Ted Dunning <te...@gmail.com> wrote:
> If the vector matrix product is done like this:
>
>      Vector w = v.like();
>      MultiplyAdd scale = new MultiplyAdd();
>      // hold one iterator; a fresh iterateNonZero() each pass would never advance
>      Iterator<Vector.Element> nonZeros = v.iterateNonZero();
>      while (nonZeros.hasNext()) {
>        Vector.Element element = nonZeros.next();
>        scale.setScale(element.get());
>        w.assign(getColumn(element.index()), scale);
>      }
>

Re: Taste-GenericItemBasedRecommender

Posted by Ted Dunning <te...@gmail.com>.
On Sat, Dec 12, 2009 at 8:58 AM, Sean Owen <sr...@gmail.com> wrote:

> ...
> I think that's the culprit in fact, having to load all the column
> vectors, since they're not light.
>

If the vector matrix product is done like this:

      Vector w = v.like();
      MultiplyAdd scale = new MultiplyAdd();
      // hold one iterator; a fresh iterateNonZero() each pass would never advance
      Iterator<Vector.Element> nonZeros = v.iterateNonZero();
      while (nonZeros.hasNext()) {
        Vector.Element element = nonZeros.next();
        scale.setScale(element.get());
        w.assign(getColumn(element.index()), scale);
      }

Then you might have a better speed, especially since you can lazy load just
the columns you want.  Google collections has an interesting dynamic map
builder that would be useful for this.
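
A sketch of that lazy column loading with Google Collections (the MapMaker
calls are the 1.0-era API; ColumnCache and loadColumn() are made-up names
standing in for your own side-loading code, and Vector is Mahout's vector
type with its import omitted):

import com.google.common.base.Function;
import com.google.common.collect.MapMaker;
import java.util.concurrent.ConcurrentMap;

class ColumnCache {
  private final ConcurrentMap<Integer, Vector> columns = new MapMaker()
      .softValues()                        // let the GC drop columns under memory pressure
      .makeComputingMap(new Function<Integer, Vector>() {
        public Vector apply(Integer columnIndex) {
          return loadColumn(columnIndex);  // fetched once, then served from the cache
        }
      });

  Vector getColumn(int index) {
    return columns.get(index);
  }

  private Vector loadColumn(int index) {
    // read one column of A'A from wherever it lives (HDFS, key-value store, ...)
    throw new UnsupportedOperationException("storage-specific");
  }
}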


>
> One approach is to make the user vectors more sparse by throwing out
> data, though I don't like it so much.
>

This is often useful actually, but more in the sense of only retaining
recent events than sparsification.  If you get lots of data per user, then
this isn't much of a problem.  If you use rarer data, then you may have more
of an issue (ratings are the prime example).


>
> One question -- in SparseVector, can't we internally remove entries
> when they are set to 0.0? since implicitly missing entries are 0?
>
>
Absolutely.  Depending on representation, deletion of elements may not help
so much unless subsequent elements are added that fill in the holes.

-- 
Ted Dunning, CTO
DeepDyve

Re: Taste-GenericItemBasedRecommender

Posted by Sean Owen <sr...@gmail.com>.
I've implemented this but it's still quite slow. Computing
recommendations goes from a couple hundred ms to 10 seconds. Nothing
wrong with this idea -- it's all the loading vectors and distributed
stuff that's weighing it down.

I think that's the culprit in fact, having to load all the column
vectors, since they're not light.

One approach is to make the user vectors more sparse by throwing out
data, though I don't like it so much.

One question -- in SparseVector, can't we internally remove entries
when they are set to 0.0? since implicitly missing entries are 0?

On Sat, Dec 5, 2009 at 10:27 PM, Ted Dunning <te...@gmail.com> wrote:
> The first approach requires access to all rows of M, but does the dot
> products using the sparse pattern of v.  The second approach inverts the
> nesting of the loops and moves the use of sparsity of v to the outer loop.
> This allows us to access only a small number of columns of M which is a huge
> win even if A is entirely in memory.  Secondarily, it also effectively lifts
> the highly repeated determination of the sparsity pattern of v out of the
> inner loop resulting in additional savings.
>

Re: Taste-GenericItemBasedRecommender

Posted by Jake Mannix <ja...@gmail.com>.
On Sat, Dec 5, 2009 at 5:17 PM, Sean Owen <sr...@gmail.com> wrote:

> I suggest for purposes of the project we would build implementations
> of Recommender that can consume some output from Hadoop on HDFS, like
> a SequenceFile or whatever it's called. Shouldn't be hard at all. This
> sort of hybrid approach is already what happens with slope-one -- I
> wrote some jobs to build its diffs and then you can load the output
> into SlopeOneRecommender -- which works online from there.
>

Something generic like this would be helpful, I think, as well as outputting
to a Lucene index.

> I wonder how important these implementations are for the project,
> which seems like a bit of heresy -- surely Mahout needs to support
> recommendation on huge amounts of data? I think the answer's yes, but:
>
> LinkedIn and Netflix and Apple and most organizations with huge data
> to recommend from have already developed sophisticated, customized
> solutions.
>

Actually, from direct experience and conversations with principals involved,
I can tell you that you would be surprised at the unsophistication of some
parts of the production systems at all three of these places (as well as,
eg.
Amazon).

Mahout could end up becoming large parts of some of their infrastructure
for doing this at some point.

> Organizations with less than 100M data points or so to process don't
> need distributed architectures. They can use Mahout as-is with its
> online non-distributed recommenders pretty well. 10 lines of code and
> one big server and a day of tinkering and they have a full-on simple
> recommender engine, online or offline. And I argue that this is about
> 90% of users of the project who want recommendations.
>

Today, yes.  In a year, this number will maybe be 80%.  In 2 years - maybe
60%.  Big data is coming to smaller and smaller organizations.
Usage data doesn't need to be only internal: stuff you mine off of the web
can be used too...


> So who are these organizations that have enough data (like 1B+ data
> points) that they need something like the rocket science that LinkedIn
> needs, but can't or haven't developed such capability already
> in-house?
>
> I guess that's why I've been reluctant to engineer and complicate the
> framework to fit in offline distributed recommendation -- because this
> can become as complex as we like -- since I wonder at the 'market' for
> it. But it seems inevitable that this must exist, even if just as a
> nice clean simple reference implementation of the idea. Perhaps I
> won't go overboard on designing something complex yet here at the
> moment.
>

All of the above which I said aside: I agree that over-engineering
something to do this is not desirable.  But thinking about how we
output partially processed "matrices" for on-line recommendation
generation is something we should still do.  Maybe we're adequately
served by spitting out SequenceFiles, with a simple api for zipping
through them and producing scores using pluggable scoring functions?

  -jake

Re: Taste-GenericItemBasedRecommender

Posted by Ted Dunning <te...@gmail.com>.
The computation of cooccurrence matrices is a more general task than for
just recommendations and is of more interest to me than just
recommendations.  Cooccurrence counting is a key step in building a random
index, for instance.

I will also second Jake's comment about how even large scale organizations
could well use an efficient off-line/on-line recommendation system.  Few of
them have had the luxury of building the system they really wanted
(certainly that was true at Veoh).  By pooling resources, we can build
something of use to all of us.

On Sat, Dec 5, 2009 at 5:17 PM, Sean Owen <sr...@gmail.com> wrote:

> I guess that's why I've been reluctant to engineer and complicate the
> framework to fit in offline distributed recommendation -- because this
> can become as complex as we like -- since I wonder at the 'market' for
> it. But it seems inevitable that this must exist, even if just as a
> nice clean simple reference implementation of the idea. Perhaps I
> won't go overboard on designing something complex yet here at the
> moment.
>



-- 
Ted Dunning, CTO
DeepDyve

Re: Taste-GenericItemBasedRecommender

Posted by Sean Owen <sr...@gmail.com>.
I suggest for purposes of the project we would build implementations
of Recommender that can consume some output from Hadoop on HDFS, like
a SequenceFile or whatever it's called. Shouldn't be hard at all. This
sort of hybrid approach is already what happens with slope-one -- I
wrote some jobs to build its diffs and then you can load the output
into SlopeOneRecommender -- which works online from there.

At least then the "hybrid" offline/online recommenders aren't yet a
third species of recommender in the framework. Perhaps there isn't
even a need for fully offline recommenders? Just jobs that can produce
supporting intermediate output for online recommenders? That'd be
tidier still.


If I may digress --

I wonder how important these implementations are for the project,
which seems like a bit of heresy -- surely Mahout needs to support
recommendation on huge amounts of data? I think the answer's yes, but:

LinkedIn and Netflix and Apple and most organizations with huge data
to recommend from have already developed sophisticated, customized
solutions.

Organizations with less than 100M data points or so to process don't
need distributed architectures. They can use Mahout as-is with its
online non-distributed recommenders pretty well. 10 lines of code and
one big server and a day of tinkering and they have a full-on simple
recommender engine, online or offline. And I argue that this is about
90% of users of the project who want recommendations.

So who are these organizations that have enough data (like 1B+ data
points) that they need something like the rocket science that LinkedIn
needs, but can't or haven't developed such capability already
in-house?

I guess that's why I've been reluctant to engineer and complicate the
framework to fit in offline distributed recommendation -- because this
can become as complex as we like -- since I wonder at the 'market' for
it. But it seems inevitable that this must exist, even if just as a
nice clean simple reference implementation of the idea. Perhaps I
won't go overboard on designing something complex yet here at the
moment.


On Sun, Dec 6, 2009 at 12:43 AM, Jake Mannix <ja...@gmail.com> wrote:
> But having a nice api for *outputting* the precomputed matrices which
> are pretty big into a format where online "queries"/recommendation
> requests can be computed I think is really key here.   We should think
> much more about what makes the most sense here.

Re: Taste-GenericItemBasedRecommender

Posted by Jake Mannix <ja...@gmail.com>.
On Sat, Dec 5, 2009 at 4:27 PM, Sean Owen <sr...@gmail.com> wrote:

> The biggest pain for me here is how to rationalize all of this into an
> API. The current code is completely online. Now I'm dropping in a
> truly offline/distributed version, which is a totally different
> ballgame. And then there are all these hybrid approaches, computing
> some stuff offline and some online and requiring real-time integration
> with HDFS.


This is something I've been thinking about a lot too - at LinkedIn we do a
ton of offline Hadoop-based computation for recommendations, but then a
bunch of stuff is/can be done online.  You can do it with Lucene, as Ted
suggests (and in fact that is one of our implementations), or by doing a
lot of precomputation of results and storing them in a key-value store
(in our case, Voldemort).

But having a nice api for *outputting* the precomputed matrices which
are pretty big into a format where online "queries"/recommendation
requests can be computed I think is really key here.   We should think
much more about what makes the most sense here.

  -jake

Re: Taste-GenericItemBasedRecommender

Posted by Sean Owen <sr...@gmail.com>.
On Sat, Dec 5, 2009 at 10:27 PM, Ted Dunning <te...@gmail.com> wrote:
> Post your code and I will add the down-sampling.

Yep, I understand the point. The code is already in SVN under
org.apache.mahout.cf.taste.hadoop.item. It's still in progress --
doesn't work at the moment, but you should see all the major parts.


> Also, I typically discard counts from the user vector because that improves
> resistance to spammers.  If you do that, then the counts you are producing
> will all be 1 (cooccurrence for a single user).  That means that the
> combiner and reducer will still be required.

Yes, I don't even output the 1, just the item -> item mapping. It's
the simplest place to start.


> I may be misunderstanding what you are saying.  You may be saying that you
> will be accumulating all the cooccurrence data for all users in memory.
> That seems like a bad idea given that the combiner/reducer approach is
> pretty darned fast.

No, it's never all in memory, but all cooccurrence for one item would be
in memory. The reducer would get the item -> item mappings -- really,
an item -> iterator-over-items. And then from that you construct a
row of the cooccurrence matrix. I think it's leveraging the combiner
just fine as this is what's collecting all the cooccurrence for one
item into one place.


> The first approach requires access to all rows of M, but does the dot
> products using the sparse pattern of v.  The second approach inverts the
> nesting of the loops and moves the use of sparsity of v to the outer loop.
> This allows us to access only a small number of columns of M which is a huge
> win even if A is entirely in memory.  Secondarily, it also effectively lifts
> the highly repeated determination of the sparsity pattern of v out of the
> inner loop resulting in additional savings.

Yeah I get it, that does make a good bit of sense. That is a big win.


> I say this because user histories change in real-time and it is common to
> require real-time updates to recommendations.  Since the recommendation is
> so very cheap to compute it seems reasonable to do it at real time rather
> than pay the cost of storing all recommendations for all users until they
> need them.  Moreover, since the recommendation is more dense than either the
> history or the columns of (A'A) and because many common columns of A'A will
> be cached in memory, the I/O cost isn't even all that different.  That means
> that total cost of just retrieving a recommendation is comparable to the
> cost of computing it on the fly.

Yes I get this too, and makes sense as well.

The biggest pain for me here is how to rationalize all of this into an
API. The current code is completely online. Now I'm dropping in a
truly offline/distributed version, which is a totally different
ballgame. And then there are all these hybrid approaches, computing
some stuff offline and some online and requiring real-time integration
with HDFS. I'm trying to incorporate it all in a way that isn't chaos,
since I'm trying to write a chapter on this now as well. Will have to
come up with a theory and a plan to implement it.

Re: Taste-GenericItemBasedRecommender

Posted by Ted Dunning <te...@gmail.com>.
On Sat, Dec 5, 2009 at 1:42 AM, Sean Owen <sr...@gmail.com> wrote:

> On Fri, Dec 4, 2009 at 7:35 PM, Ted Dunning <te...@gmail.com> wrote:
> > The preferable approach is for the first MR step to group by user as
> before,
> > then in the reduce down-sample the user items if desired and output that
> > list in a single record.  Down-sampling can be done on-line keeping just
> the
> > retained elements in memory.  Second MR would produce the cross product
> in
> > the mapper and use a combiner and reducer.
>
> That's what I'm doing -- outputting a Vector per user in the first MR.
> (I'm leaving out the extras like downsampling until the basic approach
> works.)
>

If any users have looked at more than a thousand items, downsampling becomes
very, very important.  The problem is that the number of non-zero summands
for a user is proportional to the square of the number of items for that
user.  The most active users quickly dominate the total cost.

If your user population follows Zipf's law exactly, this causes the A'A
computation to asymptotically cost quadratic time in the number of users.
In practice, the situation is slightly better because the most active 1% of
all users are not as active as Zipf would predict, but the computation cost
is still substantially super-linear and thus is not scalable even at
relatively moderate numbers of users.

Post your code and I will add the down-sampling.


>
> I think I'm going a different way to produce the cooccurrence matrix -
> no cross product, just counting and outputting all cooccurrence, and
> outputting item1ID -> item2ID as key-value pairs. That makes it tidy
> to produce the rows of the cooccurrence matrix in the reducer.
>

This is essentially what the combiner would be doing.  Your approach may be
a bit faster since the data is moving less.  The merge sorting used by
hadoop may improve locality which could offset your advantage, depending on
how large the cross product is.

Also, I typically discard counts from the user vector because that improves
resistance to spammers.  If you do that, then the counts you are producing
will all be 1 (cooccurrence for a single user).  That means that the
combiner and reducer will still be required.

I may be misunderstanding what you are saying.  You may be saying that you
will be accumulating all the cooccurrence data for all users in memory.
That seems like a bad idea given that the combiner/reducer approach is
pretty darned fast.

> Another approach is to make each column of A'A be stored in a key-value
> > store.  At recommendation time, you retrieve columns and add them.  This
> is
> > essentially equivalent to the Lucene approach without lucene.  Because we
> > know a lot about the contents (they are integers), you can probably write
> > tighter code than Lucene can use.  This would be a great use for the
> fancy
> > concurrent map builder that is in Google collections, for instance.
>
> Sounds cool but don't I need the rows of A'A to multiply against h? h
> is a column vector.
>

You can view matrix vector multiplication in two different ways depending on
how you nest the loops.  One way (what you are talking about) is to say that
the result is a vector of dot products.  This corresponds to this pseudo
code:

      for (i in 1:n) {
          r[i] = M.row(i).dot(v)
      }

In this approach, the inner loop is the dot product and we require row-wise
access to M

The other way to look at it is as the weighted sum of vectors.

      r = zero(n)
      for (j in 1:m) {
          r += v[j] * M.column[j]
      }

Here the inner loop is the addition to the result and we require column-wise
access to M.

This second approach can be rewritten, though,

      r = zero(n)
      for (j in v.nonZeros()) {
           r += v[j] * M.column[j]
      }

The first approach requires access to all rows of M, but does the dot
products using the sparse pattern of v.  The second approach inverts the
nesting of the loops and moves the use of sparsity of v to the outer loop.
This allows us to access only a small number of columns of M which is a huge
win even if A is entirely in memory.  Secondarily, it also effectively lifts
the highly repeated determination of the sparsity pattern of v out of the
inner loop resulting in additional savings.
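
In code, the second form is roughly the following sketch (Mahout-style Vector
calls; getColumn(j), numItems, and the dense result vector are assumptions
for illustration, not verbatim Mahout code):

// r = M v, visiting only the non-zero entries of v (the user history).
Vector r = new DenseVector(numItems);
Iterator<Vector.Element> historyEntries = v.iterateNonZero();
while (historyEntries.hasNext()) {
  Vector.Element e = historyEntries.next();
  Vector column = getColumn(e.index());     // column j of M, however it is loaded
  Iterator<Vector.Element> columnEntries = column.iterateNonZero();
  while (columnEntries.hasNext()) {
    Vector.Element c = columnEntries.next();
    // r[i] += v[j] * M[i][j]
    r.setQuick(c.index(), r.getQuick(c.index()) + e.get() * c.get());
  }
}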


> Also why did you later say recommendation must occur online? seems
> quite doable offline and my picture of the point of this whole Hadoop
> framework is doing things offline. They've already gone to the trouble
> of running a cluster and have given up doing it entirely online, so...
>

I say this because user histories change in real-time and it is common to
require real-time updates to recommendations.  Since the recommendation is
so very cheap to compute it seems reasonable to do it at real time rather
than pay the cost of storing all recommendations for all users until they
need them.  Moreover, since the recommendation is more dense than either the
history or the columns of (A'A) and because many common columns of A'A will
be cached in memory, the I/O cost isn't even all that different.  That means
that total cost of just retrieving a recommendation is comparable to the
cost of computing it on the fly.

-- 
Ted Dunning, CTO
DeepDyve

Re: Taste-GenericItemBasedRecommender

Posted by Sean Owen <sr...@gmail.com>.
On Fri, Dec 4, 2009 at 7:35 PM, Ted Dunning <te...@gmail.com> wrote:
> The preferable approach is for the first MR step to group by user as before,
> then in the reduce down-sample the user items if desired and output that
> list in a single record.  Down-sampling can be done on-line keeping just the
> retained elements in memory.  Second MR would produce the cross product in
> the mapper and use a combiner and reducer.

That's what I'm doing -- outputting a Vector per user in the first MR.
(I'm leaving out the extras like downsampling until the basic approach works.)

I think I'm going a different way to produce the cooccurrence matrix -
no cross product, just counting and outputting all cooccurrence, and
outputting item1ID -> item2ID as key-value pairs. That makes it tidy
to produce the rows of the cooccurrence matrix in the reducer.


> Correct.  (A'A) h can be computed in several ways, but it all comes down to
> the fact that h is very sparse.  Typically you make it even sparser by
> keeping only recent history.  If you have only 50 non-zeros in h, then you
> only need 50 columns of (A'A).  These can be retrieved many different ways,
> but one cool way is to make each row of A'A be a Lucene document.  The terms
> in the documents are items and the columns of A'A are the posting vectors in
> Lucene.  The weighting that Lucene does generally helps but can easily be
> defeated if desired.

I'll hold off leveraging Lucene for later. I'll also probably start by
just loading the whole row but yeah that's not quite efficient. The
other optimizations you mention later also make sense.


> Another approach is to make each column of A'A be stored in a key-value
> store.  At recommendation time, you retrieve columns and add them.  This is
> essentially equivalent to the Lucene approach without lucene.  Because we
> know a lot about the contents (they are integers), you can probably write
> tighter code than Lucene can use.  This would be a great use for the fancy
> concurrent map builder that is in Google collections, for instance.

Sounds cool but don't I need the rows of A'A to multiply against h? h
is a column vector.


Also why did you later say recommendation must occur online? seems
quite doable offline and my picture of the point of this whole Hadoop
framework is doing things offline. They've already gone to the trouble
of running a cluster and have given up doing it entirely online, so...

Re: Taste-GenericItemBasedRecommender

Posted by Ted Dunning <te...@gmail.com>.
On Fri, Dec 4, 2009 at 12:33 AM, Sean Owen <sr...@gmail.com> wrote:

> Yes, this makes sense. I do need two passes. One pass converts input
> from "user,item,rating" triples into user vectors. Then the second
> step builds the co-occurrence A'A product. I agree that it will be
> faster to take a shortcut than properly compute A'A.
>

You can work directly with the triples as input if you have relatively small
input.  Mapper puts user as key, and (item) as value (this is cooccurrence
only so rating should only be used to filter which items get output).
Reducer takes list of items and produces cross product on output.  It
doesn't need to keep the full cross product.  Second MR positions (item1,
item2) as key and basically does a word count.

This approach has the problem of a large, uncombinable intermediate file.

The preferable approach is for the first MR step to group by user as before,
then in the reduce down-sample the user items if desired and output that
list in a single record.  Down-sampling can be done on-line keeping just the
retained elements in memory.  Second MR would produce the cross product in
the mapper and use a combiner and reducer.
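
For illustration, that on-line down-sampling can be plain reservoir sampling
over the items streamed to the reducer; the names and types here are made up
rather than taken from the actual job:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class UserItemDownSampler {
  // Keep at most maxItems of a user's items, each retained with equal probability.
  static List<Long> sample(Iterable<Long> userItems, int maxItems, Random random) {
    List<Long> reservoir = new ArrayList<Long>(maxItems);
    int seen = 0;
    for (long item : userItems) {
      seen++;
      if (reservoir.size() < maxItems) {
        reservoir.add(item);
      } else {
        int slot = random.nextInt(seen);   // replace with probability maxItems/seen
        if (slot < maxItems) {
          reservoir.set(slot, item);
        }
      }
    }
    return reservoir;
  }
}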

If we wanted to follow up on Jake's idea, the transpose can be done in the
first MR step but it is hard to down-sample the user's items that way.

The first step of converting triples to vectors should also produce user and
item counts so that the second step can do the sparsification cleanly.


> (Though I'm curious how this works -- looks deceptively easy, this
> outer product approach. Isn't v cross v potentially huge? or likely to
> be sparse enough to not matter)
>

The cost is dominated by the square of the number of items that the most
prolific user has.  If you down sample that to something reasonable, then
you bound the total cost very tightly.  It is very unusual for the massively
prolific users to add any information ... generally they are spammers or
bots anyway.


> I understand the final step in principle, which is to compute (A'A)h.
> But I keep guessing A'A is too big to fit in memory?


Correct.  (A'A) h can be computed in several ways, but it all comes down to
the fact that h is very sparse.  Typically you make it even sparser by
keeping only recent history.  If you have only 50 non-zeros in h, then you
only need 50 columns of (A'A).  These can be retrieved many different ways,
but one cool way is to make each row of A'A be a Lucene document.  The terms
in the documents are items and the columns of A'A are the posting vectors in
Lucene.  The weighting that Lucene does generally helps but can easily be
defeated if desired.
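
A bare-bones sketch of that indexing step (Lucene API of the 2.9/3.0 era;
the field names and the surrounding method are made up for illustration, and
the cooccurrence counts/weights are ignored here):

import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

// Add one row of A'A to a Lucene index: the row's item identifies the
// document, and each cooccurring item becomes a term in it.
void indexCooccurrenceRow(IndexWriter writer, long rowItemID,
    Iterable<Long> cooccurringItemIDs) throws IOException {
  StringBuilder terms = new StringBuilder();
  for (long itemID : cooccurringItemIDs) {
    terms.append(itemID).append(' ');
  }
  Document doc = new Document();
  doc.add(new Field("item", String.valueOf(rowItemID),
      Field.Store.YES, Field.Index.NOT_ANALYZED));  // which row this document is
  doc.add(new Field("cooccurring", terms.toString(),
      Field.Store.NO, Field.Index.ANALYZED));       // its postings give the column access
  writer.addDocument(doc);
}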

Another approach is to make each column of A'A be stored in a key-value
store.  At recommendation time, you retrieve columns and add them.  This is
essentially equivalent to the Lucene approach without lucene.  Because we
know a lot about the contents (they are integers), you can probably write
tighter code than Lucene can use.  This would be a great use for the fancy
concurrent map builder that is in Google collections, for instance.


-- 
Ted Dunning, CTO
DeepDyve

Re: Taste-GenericItemBasedRecommender

Posted by Sean Owen <sr...@gmail.com>.
Yes, this makes sense. I do need two passes. One pass converts input
from "user,item,rating" triples into user vectors. Then the second
step builds the co-occurrence A'A product. I agree that it will be
faster to take a shortcut than properly compute A'A.

(Though I'm curious how this works -- looks deceptively easy, this
outer product approach. Isn't v cross v potentially huge? or likely to
be sparse enough to not matter)

I understand the final step in principle, which is to compute (A'A)h.
But I keep guessing A'A is too big to fit in memory? So I can
side-load the rows of A'A one at a time and compute it rather
manually.


On Thu, Dec 3, 2009 at 8:28 PM, Ted Dunning <te...@gmail.com> wrote:
> I think you can merge my passes into a single pass in which you compute the
> row and column sums at the same time that you compute the product.  That is
> more complicated, though, and I hate fancy code.  So you are right in
> practice that I have always had two passes.  (although pig might be clever
> enough by now to merge them)
>
> There is another pass in which you use all of the sums to do the
> sparsification.  I don't know if that could be done in the same pass or not.

Re: Taste-GenericItemBasedRecommender

Posted by Ted Dunning <te...@gmail.com>.
I think you can merge my passes into a single pass in which you compute the
row and column sums at the same time that you compute the product.  That is
more complicated, though, and I hate fancy code.  So you are right in
practice that I have always had two passes.  (although pig might be clever
enough by now to merge them)

There is another pass in which you use all of the sums to do the
sparsification.  I don't know if that could be done in the same pass or not.

On Thu, Dec 3, 2009 at 10:55 AM, Jake Mannix <ja...@gmail.com> wrote:

> But yeah, in the case at hand, maybe that transposition is a bunch of work.
> But is it more work to do my two MR jobs than to do your two MR jobs?  Both
> look
> like two full passes over the data, right?
>



-- 
Ted Dunning, CTO
DeepDyve

Re: Taste-GenericItemBasedRecommender

Posted by Jake Mannix <ja...@gmail.com>.
I guess yes, it depends on what form your data is in.  If you never have a
need to produce the transpose, then this is costly, sure.  If you are already
building your matrix out of something like say, a lucene index, then you
already *have* the transpose.   Or if you have another use for doing a MR job
to produce the transpose and then keeping it around for other purposes, I
guess.

But yeah, in the case at hand, maybe that transposition is a bunch of work.
But is it more work to do my two MR jobs than to do your two MR jobs?  Both
look like two full passes over the data, right?

  -jake

On Thu, Dec 3, 2009 at 10:46 AM, Ted Dunning <te...@gmail.com> wrote:

> I have always tried to avoid transposing the data matrix.
>
> During my design phase for this, I was working on patch-wise patterns.
> About then, Chris Dyer tried the simpler approach for a machine translation
> problem and got very good results.
>
> The major problem with the transpose is that it requires a MR job of its
> own
> that is nearly as expensive as the multiply.   The combiner really makes a
> huge difference to the inner product approach.
>
> On Thu, Dec 3, 2009 at 10:31 AM, Jake Mannix <ja...@gmail.com>
> wrote:
>
> > Wait, this is just doing A'A?  Am I misunderstanding, or is this not most
> > easily done by first transposing A into A', and then doing the outer
> > products instead of the inner products
> >
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>

Re: Taste-GenericItemBasedRecommender

Posted by Ted Dunning <te...@gmail.com>.
I have always tried to avoid transposing the data matrix.

During my design phase for this, I was working on patch-wise patterns.
About then, Chris Dyer tried the simpler approach for a machine translation
problem and got very good results.

The major problem with the transpose is that it requires a MR job of its own
that is nearly as expensive as the multiply.   The combiner really makes a
huge difference to the inner product approach.

On Thu, Dec 3, 2009 at 10:31 AM, Jake Mannix <ja...@gmail.com> wrote:

> Wait, this is just doing A'A?  Am I misunderstanding, or is this not most
> easily done by first transposing A into A', and then doing the outer
> products instead of the inner products
>



-- 
Ted Dunning, CTO
DeepDyve

Re: Taste-GenericItemBasedRecommender

Posted by Jake Mannix <ja...@gmail.com>.
Wait, this is just doing A'A?  Am I misunderstanding, or is this not most
easily done by first transposing A into A', and then doing the outer
products
instead of the inner products:

class Map extends Mapper<IntWritable, Vector, NullWritable, Matrix> {
  void map(IntWritable i, Vector v, Context c) {
    // emit the outer product of this row of A with itself
    c.write(NullWritable.get(), v.cross(v));
  }
}

class Reduce extends Reducer<NullWritable, Matrix, NullWritable, Matrix> {
  void reduce(NullWritable n, Iterable<Matrix> it, Context c) {
    Matrix out = null;
    // sum all the outer products into A'A
    for (Matrix m : it) {
      out = (out == null) ? m : out.plus(m);
    }
    c.write(n, out);
  }
}

Then the outer products are computed in the mapper, and the sum is done in
the
combiners and single reducer.

Not sure how you'd fold in sparsification to that though.

  -jake

Re: Taste-GenericItemBasedRecommender

Posted by Ted Dunning <te...@gmail.com>.
I think that the question here is how to generate the A' A product.

If so, then answer is to read one row of A at a time and emit all of the
product pairs that could happen from that row.  The key is the row and
column in the result matrix.

More precisely:

    map: i, A[i,*]  =>  foreach j foreach k yield ((j,k), A[i,j] * A[i,k])
    combine and reduce: (j,k), values => (j, k), sum(values)

The output is A'A in triple form.  You might be able to do a trick with
sorting and just use j as the key.  Then you could emit an entire row at a
time.

A second MR needs to produce row sums of A'A.

This product may require down-sampling of non-sparse rows of A because it is
quadratic in the densest row count.  I usually just use random sampling for
any row with more than several hundred entries.  This is defensible because
little additional information is gained after the first hundred or so
entries.

On Thu, Dec 3, 2009 at 7:02 AM, Sean Owen <sr...@gmail.com> wrote:

> One question that maybe Ted already knows the answer to: how would I
> iterate over A_u* for each A_u*? the mapper would touch only one each.
> I guess I could use a whole MR step to generate the cartesian product
> so to speak.
>



-- 
Ted Dunning, CTO
DeepDyve