Posted to user@mahout.apache.org by First Qaxy <qa...@yahoo.ca> on 2010/05/04 22:25:28 UTC

Re: Algorithm scalability

Thanks, Ted. This is the information I was looking for, as I am just starting to evaluate Mahout. At this point I only have estimates for the data set and little real data to test with. As I receive data, I'll post my findings.

--- On Mon, 4/26/10, Ted Dunning <te...@gmail.com> wrote:

From: Ted Dunning <te...@gmail.com>
Subject: Re: Algorithm scalability
To: mahout-user@lucene.apache.org
Received: Monday, April 26, 2010, 2:35 AM

Some of the recommendation algorithms are less scalable, but the clustering
algorithms, frequent item set mining and most of the classification
algorithms are either parallelized with good bounds on time and space or should
be relatively scalable single-processor implementations.  There are relatively
scalable versions of recommendation algorithms that should exhibit fairly
good scaling even to billions of data points.

There are, however, relatively few users with really large data sets so far
so you will almost certainly encounter some problems with some algorithms.
Your feedback would be of great interest and the result would likely be
fairly quick improvements.

If you can be more specific, I am sure you can get some better answers than
I just gave.

On Sun, Apr 25, 2010 at 10:00 PM, First Qaxy <qa...@yahoo.ca> wrote:

> Hi All,
> How scalable are Mahout's algorithms (clustering, classification, also
> recommendation) for very large datasets (>1bln points, or in other words
> have memory constraints)?
> Many thanks.
> -qf
>
>