Posted to user@mahout.apache.org by Dmitriy Lyubimov <dl...@gmail.com> on 2012/08/01 20:29:59 UTC

Re: performance study

I only know of comparisons among parallel algorithms. There's a
performance and accuracy comparison between Mahout's SSVD and Lanczos
in the dissertation of N. Halko (see the link on the SSVD page of the
Mahout wiki). There's also a "Heigen" SVD paper that discusses a
distributed modified Lanczos method in a proprietary Hadoop-based
implementation at Yahoo. Even though it doesn't draw side-by-side
comparisons, it does present benchmark figures for the Heigen
implementation, so one can draw approximate comparisons between the
Heigen and Mahout methods.

W.r.t. parallel vs. non-parallel, IMO the bottom line is
practicality, not necessarily speed. There are some SVD problems for
which one might argue that a single-computer solution is not practical,
and which a distributed algorithm may actually shift into the realm of
practical solutions (in the sense that you don't need days to solve
them). But IMO a direct comparison still doesn't make a lot of sense.

On Sat, Jul 28, 2012 at 9:27 AM, mohsen jadidi <mo...@gmail.com> wrote:
> Thank you for your replies. What I am interested to know is: if I want
> to compute the SVD of a huge matrix, how much faster will my computation
> get by using Mahout?
>
> On Fri, Jul 27, 2012 at 8:12 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
>
>> IMO it doesn't make much sense to compare a non-parallel and a parallel
>> algorithm (assuming they are running approximately the same flops-sized
>> computation). Which is probably why there are not so many such
>> comparisons (I don't know of any).
>>
>> However, there are studies comparing parallel approaches (e.g. certain
>> Mahout vs. Giraph methods) given the same amount of flops capacity in a
>> cluster, but I think you need to be more specific because there are
>> too many areas of interest you are talking about.
>>
>> On Fri, Jul 27, 2012 at 8:57 AM, mohsen jadidi <mo...@gmail.com>
>> wrote:
>> > Hey all,
>> >
>> > I am looking for some case studies that have evaluated some of Mahout's
>> > algorithm implementations, such as the different decompositions or
>> > classifiers. I just want to know how much faster Mahout is compared to
>> > regular non-parallel algorithms. I couldn't find anything useful.
>> >
>> > Thanks in advance,
>> >
>> > --
>> > Mohsen Jadidi
>>
>
>
>
> --
> Mohsen Jadidi

Re: performance study

Posted by Ted Dunning <te...@gmail.com>.

I would like to endorse this point.

If your sparse data fits in memory on a single machine, it is very unlikely
that you will be able to improve on the cost of doing a stochastic
projection on that one machine using any Hadoop-based solution.

Even with MPI and crazy RDMA networking, I doubt that you would beat it by
much, if at all.
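
For illustration only, here is a minimal single-machine sketch of the
projection step Y = A * Omega that a stochastic SVD starts from. It is not
Mahout's code or API: the function name random_projection, the
list-of-dicts sparse layout, and the Gaussian choice of Omega are all
assumptions made for this example, and the later steps of a full
stochastic SVD (orthogonalization, the small dense SVD, any power
iterations) are left out.

```python
import random


def random_projection(sparse_rows, n_cols, k, seed=0):
    """Return Y = A * Omega as a list of dense rows of length k.

    sparse_rows: one dict {column_index: value} per row of A (hypothetical layout).
    n_cols:      number of columns of A.
    k:           width of the random test matrix Omega.
    """
    rng = random.Random(seed)
    # Dense n_cols x k Gaussian test matrix; assumed to fit in memory.
    omega = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n_cols)]

    projected = []
    for row in sparse_rows:
        y = [0.0] * k
        # Only the nonzero entries of the row contribute, so the cost is
        # proportional to the number of nonzeros times k.
        for col, value in row.items():
            omega_row = omega[col]
            for j in range(k):
                y[j] += value * omega_row[j]
        projected.append(y)
    return projected


if __name__ == "__main__":
    a = [{0: 1.0, 3: 2.0}, {}, {2: -1.0}]  # 3 x 4 sparse matrix
    y = random_projection(a, n_cols=4, k=2)
    print(len(y), "rows of width", len(y[0]))
```

The point of the sketch is that the whole step is one pass over the
nonzeros held in memory, with no shuffling of data between machines.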

On Wed, Aug 1, 2012 at 12:36 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

> Also, as Lance mentioned, the "coefficient of performance" per core
> is usually lower for distributed methods than for an iterative method.
> It is hard (if even possible) to achieve 100% scalability here. Simply
> put, if you have 5 computers to solve the same problem, it will not be
> solved 5 times faster than by a comparable method on a single computer.

Re: performance study

Posted by Dmitriy Lyubimov <dl...@gmail.com>.

Also, as Lance mentioned, the "coefficient of performance" per core
is usually lower for distributed methods than for an iterative method.
It is hard (if even possible) to achieve 100% scalability here. Simply
put, if you have 5 computers to solve the same problem, it will not be
solved 5 times faster than by a comparable method on a single computer.
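
As a rough illustration of the "5 computers, not 5 times faster" point,
the sketch below uses Amdahl's law. That model is an assumption brought in
for this example and is not something measured on Mahout; the parallel
fractions are made-up inputs, and real clusters also pay communication and
job startup costs that the model ignores.

```python
def amdahl_speedup(workers: int, parallel_fraction: float) -> float:
    """Ideal speedup on `workers` machines when only `parallel_fraction`
    of the work can be spread across them (Amdahl's law)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def efficiency(workers: int, parallel_fraction: float) -> float:
    """Speedup per machine; 1.0 would be perfect (100%) scalability."""
    return amdahl_speedup(workers, parallel_fraction) / workers


if __name__ == "__main__":
    for fraction in (1.0, 0.95, 0.8):  # illustrative values only
        s = amdahl_speedup(5, fraction)
        e = efficiency(5, fraction)
        print(f"parallel fraction {fraction:.2f}: speedup {s:.2f}x, efficiency {e:.0%}")
```

Under this model, 5 machines give a 5x speedup only when the whole job
parallelizes; any serial share pulls the per-machine efficiency below 100%.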
