Posted to user@mahout.apache.org by Frank Scholten <fr...@frankscholten.nl> on 2014/08/01 11:53:45 UTC

MultithreadedBatchItemSimilarities with LLR versus Spark co-occurrence

Hi all,

I noticed the development of the Spark co-occurrence implementation in
MAHOUT-1464 and I wondered if I could get similar results, just with less
scalability, by using MultithreadedBatchItemSimilarities with
LogLikelihoodSimilarity.

I want to use a co-occurrence recommender on a smallish dataset of a few
GBs that does not warrant the use of a Spark cluster. Is the Spark
implementation mostly a more scalable version, or is it an improved
implementation that gives different or better results?
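
For what it's worth, this is roughly the Taste-based setup I have in mind
(just a sketch: the file names, thread count and similar-items-per-item
value are made up, and preferences.csv is assumed to hold
userID,itemID[,value] lines):

import java.io.File;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.precompute.FileSimilarItemsWriter;
import org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.ItemBasedRecommender;
import org.apache.mahout.cf.taste.similarity.precompute.BatchItemSimilarities;

public class PrecomputeLlrSimilarities {

  public static void main(String[] args) throws Exception {
    // Preferences as userID,itemID[,value] lines
    DataModel model = new FileDataModel(new File("preferences.csv"));

    // Item-based recommender backed by log-likelihood ratio similarity
    ItemBasedRecommender recommender =
        new GenericItemBasedRecommender(model, new LogLikelihoodSimilarity(model));

    // Precompute the 10 most similar items per item
    BatchItemSimilarities batch = new MultithreadedBatchItemSimilarities(recommender, 10);

    // 4 threads, give up after 2 hours, write item-item pairs to a file
    int numSimilarities = batch.computeItemSimilarities(
        4, 2, new FileSimilarItemsWriter(new File("similarities.csv")));

    System.out.println("Computed " + numSimilarities + " similarities");
  }
}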

Cheers,

Frank

Re: MultithreadedBatchItemSimilarities with LLR versus Spark co-occurrence

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Not directly an answer -- but if anything, you can use Spark in local mode
-- that's how our unit tests are written. Use something like `local[8]` for
the master to enable multiple asynchronous workers.
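
For example, if you end up trying the spark-itemsimilarity driver from
MAHOUT-1464, an invocation roughly like this should keep everything on one
machine (option names from memory, so double-check against the driver's
--help output):

  mahout spark-itemsimilarity --input interactions.csv \
    --output similarities --master local[8]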

There will be overhead in the area of <= 0.5 s compared to totally
Spark-less execution, but if that is small compared to the rest of
the job (i.e. if your case is not really a micro-matrix case) then it
should not matter much.

-d

