Posted to user@mahout.apache.org by Gabor Bernat <be...@primeranks.net> on 2013/04/01 12:26:34 UTC

Parallel GenericRecommenderIRStatsEvaluator?

Hello,

Is there any good reason why the *GenericRecommenderIRStatsEvaluator* does
not support parallel (multi-CPU) evaluation? Today it is quite common to have
CPUs with more than one core, and IR evaluation on any reasonably sized
data set takes forever to finish. I think that if we parallelized the
evaluation by breaking the input into subsets and combining the
measurements of each subset at the end, the evaluation time could be
greatly reduced.
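
To make the idea concrete, here is a minimal sketch in plain Java. It is
not Mahout code: the class name, the Partial accumulator and the
precisionForUser callback are all hypothetical stand-ins for whatever
per-user measurement the evaluator computes. It assumes the callback is
safe to call from several threads at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.LongToDoubleFunction;

public class ParallelIREvalSketch {

    /** Partial sums for one subset of users; merged once all subsets finish. */
    static final class Partial {
        double precisionSum;
        int users;

        void add(double precision) {
            precisionSum += precision;
            users++;
        }

        void merge(Partial other) {
            precisionSum += other.precisionSum;
            users += other.users;
        }
    }

    /**
     * Splits userIds into roughly equal subsets, evaluates each subset on its
     * own thread, and averages the per-user precision over all users.
     * precisionForUser is a hypothetical callback; threads must be at least 1.
     */
    static double averagePrecision(List<Long> userIds,
                                   LongToDoubleFunction precisionForUser,
                                   int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Ceiling division so every user lands in exactly one subset.
            int chunk = (userIds.size() + threads - 1) / threads;
            List<Future<Partial>> futures = new ArrayList<>();
            for (int start = 0; start < userIds.size(); start += chunk) {
                int end = Math.min(start + chunk, userIds.size());
                List<Long> subset = userIds.subList(start, end);
                futures.add(pool.submit(() -> {
                    Partial partial = new Partial();
                    for (long userId : subset) {
                        partial.add(precisionForUser.applyAsDouble(userId));
                    }
                    return partial;
                }));
            }
            // Accumulate the measurements of each subset at the end.
            Partial total = new Partial();
            for (Future<Partial> future : futures) {
                total.merge(future.get());
            }
            return total.users == 0 ? Double.NaN : total.precisionSum / total.users;
        } finally {
            pool.shutdown();
        }
    }
}
```

Each thread only touches its own Partial, so no locking is needed until
the final merge. Whether this helps in practice depends on the recommender
and data model tolerating concurrent reads.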

For example, I have a data set with 2+ million ratings, and evaluating IR
on even 10% of it with a simple recommender takes more than 3 hours,
with just a single core of my CPU kept busy...

So?


Bernát GÁBOR

Re: Parallel GenericRecommenderIRStatsEvaluator?

Posted by Sean Owen <sr...@gmail.com>.

No, it just was never written, I suppose, back in the day. The way it is
structured now, it creates a test split for each user, which is also
slow and may run into memory limits, as that's N data models in
memory. You could take a crack at a patch.

When I rewrote this aspect in a separate project it was certainly
threaded and relied on a single test split. It's much faster indeed.
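
For illustration only, a single test split could look roughly like the
sketch below. This is not the code from that other project, and none of
these names come from Mahout: SingleSplitSketch, Split and splitOnce are
hypothetical, and a random hold-out is assumed purely for simplicity. The
point is that one pass yields one training set and one test set that all
users share, instead of one data model per user.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class SingleSplitSketch {

    /** Training and test preferences produced by one pass over the data. */
    static final class Split {
        final Map<Long, List<Long>> train = new HashMap<>();
        final Map<Long, List<Long>> test = new HashMap<>();
    }

    /**
     * Holds out roughly testFraction of each user's items, once, for all users.
     * itemsByUser maps a user ID to the item IDs that user has preferences for.
     */
    static Split splitOnce(Map<Long, List<Long>> itemsByUser,
                           double testFraction,
                           long seed) {
        Random random = new Random(seed);
        Split split = new Split();
        for (Map.Entry<Long, List<Long>> entry : itemsByUser.entrySet()) {
            List<Long> train = new ArrayList<>();
            List<Long> test = new ArrayList<>();
            for (Long item : entry.getValue()) {
                // Random hold-out is an assumption made for this sketch.
                (random.nextDouble() < testFraction ? test : train).add(item);
            }
            split.train.put(entry.getKey(), train);
            if (!test.isEmpty()) {
                // Only users with held-out items can be evaluated.
                split.test.put(entry.getKey(), test);
            }
        }
        return split;
    }
}
```

A single model would then be built from split.train, and every user in
split.test evaluated against it, possibly on several threads.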
