Posted to user@mahout.apache.org by Fernando Fernández <fe...@gmail.com> on 2010/12/09 18:42:07 UTC

Matrix to Matrix Distance

Hi everyone,

I'm facing a problem where I need to compute not the pairwise distances
between the rows of a matrix, but the distances from each row of a small
matrix (A, ~1000 rows) to all the rows of a big matrix (B, ~100000 rows), in
order to determine the nearest elements of B for each row of A. Is there an
easy way to do this in Mahout (besides putting the rows of A and B into one
big matrix C and launching a RowSimilarityJob, which would be very inefficient)?


Thanks in advance.

Fernando.

Re: Matrix to Matrix Distance

Posted by Fernando Fernández <fe...@gmail.com>.
Hi Sebastian,

I can't give many details about the use case, so I'll try to explain it in a
generic manner. It's a kind of generalization of a "MoreLikeThis" query, so
it would be a "MoreLikeThese" query. Instead of giving the "search engine" a
bunch of characteristics describing what I want, I'll give it some examples of
what I'm looking for (I already have some, and I want a few more).

Ted, I have only just arrived in the MapReduce world, so I think your approach
will be more meaningful to me after I study a little bit more... I was expecting
an answer like "take class XXX and modify methods YYY and ZZZ..."


Thanks!!

2010/12/10 Sebastian Schelter <ss...@apache.org>

> I'm curious about your usecase, could you give us some details?
>
> --sebastian
>
> 2010/12/10 Ted Dunning <te...@gmail.com>
>
> > This sounds like a map side join.  Put all of A into memory and have the
> > mappers compare vectors
> > of B to all rows of A.

Re: Matrix to Matrix Distance

Posted by Sebastian Schelter <ss...@apache.org>.
I'm curious about your use case. Could you give us some details?

--sebastian

2010/12/10 Ted Dunning <te...@gmail.com>

> This sounds like a map side join.  Put all of A into memory and have the
> mappers compare vectors
> of B to all rows of A.

Re: Matrix to Matrix Distance

Posted by Ted Dunning <te...@gmail.com>.
This sounds like a map-side join. Put all of A into memory and have the
mappers compare the vectors of B to all rows of A.
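
A minimal sketch of that idea in plain Java, for illustration only. It assumes
dense double[] rows and Euclidean distance, and keeps the k nearest rows of B
for each row of A. The class and method names (NearestRowsSketch, nearest,
Neighbor) are hypothetical; no Mahout or Hadoop API is used here, and the loop
over B simply stands in for what a single mapper would see.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: for each row of a small matrix A, find the k nearest rows of a big matrix B. */
final class NearestRowsSketch {

    /** A candidate neighbour: the index of a row of B and its distance to one row of A. */
    static final class Neighbor {
        final int bIndex;
        final double distance;

        Neighbor(int bIndex, double distance) {
            this.bIndex = bIndex;
            this.distance = distance;
        }
    }

    /** Euclidean distance (an assumption; any other distance measure could be substituted). */
    static double distance(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - y[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Plays the role of one mapper: A is held fully in memory, rows of B arrive one at a time.
     * Returns, for each row of A, the indices of its k nearest rows of B, closest first.
     */
    static List<List<Integer>> nearest(double[][] a, Iterable<double[]> bRows, int k) {
        // One max-heap per row of A (farthest candidate on top), capped at k entries.
        List<PriorityQueue<Neighbor>> heaps = new ArrayList<>();
        for (int i = 0; i < a.length; i++) {
            heaps.add(new PriorityQueue<>(
                    Comparator.comparingDouble((Neighbor n) -> n.distance).reversed()));
        }

        int bIndex = 0;
        for (double[] b : bRows) {                  // single pass over B
            for (int i = 0; i < a.length; i++) {    // compare this row of B to every row of A
                PriorityQueue<Neighbor> heap = heaps.get(i);
                heap.add(new Neighbor(bIndex, distance(a[i], b)));
                if (heap.size() > k) {
                    heap.poll();                    // drop the farthest candidate
                }
            }
            bIndex++;
        }

        List<List<Integer>> result = new ArrayList<>();
        for (PriorityQueue<Neighbor> heap : heaps) {
            List<Integer> indices = new ArrayList<>();
            while (!heap.isEmpty()) {
                indices.add(heap.poll().bIndex);    // comes out farthest first
            }
            Collections.reverse(indices);           // closest first
            result.add(indices);
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] a = {{0.0, 0.0}, {10.0, 10.0}};
        List<double[]> b = Arrays.asList(
                new double[]{1, 0}, new double[]{9, 9}, new double[]{0, 2}, new double[]{20, 20});
        System.out.println(nearest(a, b, 2));       // prints [[0, 2], [1, 2]]
    }
}
```

In an actual Hadoop job, A would presumably be loaded once per mapper (for
example in the mapper's setup step), each mapper would see only its own split
of B, and a reduce step keyed by the row of A would merge the per-mapper
candidate lists into the final k nearest. With A at ~1000 rows, holding it in
memory is cheap, and B is read only once.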
