Posted to user@mahout.apache.org by gabeweb <ga...@htc.com> on 2010/09/17 07:49:12 UTC

TreeClusteringRecommender, clustering, and multiple processors

(Sorry for reposting, posted this first to Mahout top-level list instead of
user or development)

I just discovered that TreeClusteringRecommender is performing the
clustering once for each processor on my computer.  This is because the
constructor doesn't build the clusters, so they aren't actually built until
the evaluation of the test items begins -- but the test item evaluation is
divided among my multiple processors.  Each one of the test item evaluation
threads discovers that the clusters haven't been built yet, so each thread
performs the clustering independently.
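
To make the failure mode concrete, here is a minimal sketch of the
check-then-build race described above. It is not Mahout code: the class
LazyBuildRace, its fields, and the barrier are hypothetical stand-ins, and the
barrier exists only to force the worst-case interleaving so the effect is
reproducible.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a recommender that builds its clusters lazily.
final class LazyBuildRace {
    static final int THREADS = 4;

    private final AtomicInteger builds = new AtomicInteger();
    // Demonstration aid only: holds every thread just after its null check,
    // so the worst-case interleaving happens on every run.
    private final CyclicBarrier allChecked = new CyclicBarrier(THREADS);
    private Object clusters; // null until some thread builds it

    Object getClusters() {
        if (clusters == null) {        // unsynchronized check
            awaitOthers();
            builds.incrementAndGet();  // stands in for the expensive clustering
            clusters = new Object();
        }
        return clusters;
    }

    private void awaitOthers() {
        try {
            allChecked.await();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Runs THREADS evaluation threads against one recommender and returns
    // how many times the clusters were built.
    static int run() {
        LazyBuildRace recommender = new LazyBuildRace();
        Thread[] workers = new Thread[THREADS];
        for (int i = 0; i < THREADS; i++) {
            workers[i] = new Thread(() -> recommender.getClusters());
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return recommender.builds.get();
    }

    public static void main(String[] args) {
        System.out.println("cluster builds: " + run());
    }
}
```

With the barrier in place, every thread passes the null check before any of
them assigns the field, so the build count equals the thread count. Without
the barrier the count would vary from run to run.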

I think the solution to this problem is to add a call to buildClusters() as
the last line of the constructor -- this is similar to what e.g.
SVDRecommender does.  That way, the clustering is performed when the
recommender is instantiated, and then during testing each thread discovers
that the clusters have already been built and doesn't redo the clustering
itself (four times, on my machine).  Does this sound like a reasonable thing
to do?  It seems to work for me.  Thanks.
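
The proposal above can be sketched roughly as follows. This is an
illustration under assumptions, not the actual TreeClusteringRecommender
source: the class EagerBuildSketch and its members are hypothetical, and the
"clustering" is a placeholder that only counts how often it runs.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical recommender that builds its clusters in the constructor.
final class EagerBuildSketch {
    private final AtomicInteger builds = new AtomicInteger();
    private final Object clusters; // final, so safely visible to all threads

    EagerBuildSketch() {
        // Last line of the constructor: do the expensive work up front.
        this.clusters = buildClusters();
    }

    private Object buildClusters() {
        builds.incrementAndGet();  // stands in for the expensive clustering
        return new Object();
    }

    Object getClusters() {
        return clusters;           // never null once construction has finished
    }

    // Starts four evaluation threads against one recommender and returns
    // how many times the clusters were built.
    static int run() {
        EagerBuildSketch recommender = new EagerBuildSketch();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> recommender.getClusters());
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return recommender.builds.get();
    }

    public static void main(String[] args) {
        System.out.println("cluster builds: " + run());
    }
}
```

The trade-off is that constructing the recommender becomes expensive even if
no recommendation is ever requested.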
-- 
View this message in context: http://lucene.472066.n3.nabble.com/TreeClusteringRecommender-clustering-and-multiple-processors-tp1516032p1516032.html
Sent from the Mahout User List mailing list archive at Nabble.com.

Re: TreeClusteringRecommender, clustering, and multiple processors

Posted by Sean Owen <sr...@gmail.com>.
I checked in what I think is a slightly better solution. The threads
will now block until the clusters are built, rather than each one
rebuilding them. I just added a double-checked-locking pattern here,
which is 99.9999% bulletproof in Java, and that's sufficient for this
context.
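
For readers unfamiliar with the pattern, here is a minimal sketch of
double-checked locking. It is not the code that was checked in: the class
DclSketch and its members are hypothetical. The sketch declares the field
volatile, which is what makes the pattern well-defined under the Java 5 and
later memory model.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical recommender that builds its clusters lazily, exactly once.
final class DclSketch {
    private final AtomicInteger builds = new AtomicInteger();
    private final Object lock = new Object();
    private volatile Object clusters; // volatile is required for safe publication

    Object getClusters() {
        Object result = clusters;          // first check, no lock taken
        if (result == null) {
            synchronized (lock) {          // latecomers block here during the build
                result = clusters;         // second check, under the lock
                if (result == null) {
                    builds.incrementAndGet();  // stands in for the clustering
                    result = new Object();
                    clusters = result;
                }
            }
        }
        return result;
    }

    // Starts four evaluation threads against one recommender and returns
    // how many times the clusters were built.
    static int run() {
        DclSketch recommender = new DclSketch();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> recommender.getClusters());
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return recommender.builds.get();
    }

    public static void main(String[] args) {
        System.out.println("cluster builds: " + run());
    }
}
```

Compared with building in the constructor, this keeps construction cheap and
defers the clustering until the first thread actually needs it.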
