Posted to user@mahout.apache.org by Tomasz Tretkowski <tt...@wp.pl> on 2015/12/26 22:20:58 UTC

CachingUserSimilarity concurrency issue.

Hi all.

I am using GenericBooleanPrefUserBasedRecommender with Tanimoto similarity.
To improve performance I wrapped the similarity in CachingUserSimilarity (code below):

public Recommender buildRecommender(DataModel dataModel) throws TasteException {
    UserSimilarity similarity = new CachingUserSimilarity(new TanimotoCoefficientSimilarity(similarityDataModel), similarityDataModel);
    UserNeighborhood neighborhood = new NearestNUserNeighborhood(8, Double.NEGATIVE_INFINITY, similarity, dataModel, 1.0);
    return new GenericBooleanPrefUserBasedRecommender(dataModel, neighborhood, similarity);
}

However, when I ran load tests on my server, I found that only about two out of eight CPU cores were utilized. The threads on the remaining cores were waiting in the org.apache.mahout.cf.taste.impl.common.Cache.get method. That is probably because the Cache.get method used by CachingUserSimilarity looks like this:
public V get(K key) throws TasteException {
    V value;
    synchronized (cache) {
      value = cache.get(key);
    }
...

Threads doing concurrent reads block each other. I would expect the standard double-checked locking pattern there, so that the lock is skipped when it is not necessary (i.e. there is a hit and the cache is not modified).
Am I doing something wrong, or is there a defect in the Cache class?
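
For illustration only, here is a minimal sketch of a read path where cache hits do not block each other. It is not Mahout's Cache class and is not a patch for it: the class name NonBlockingReadCache and the Function-based loader are hypothetical, it assumes Java 8, and it leaves out eviction and the TasteException handling that the real class has. It relies on java.util.concurrent.ConcurrentHashMap rather than on hand-written double-checked locking.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical read-mostly cache; a sketch, not Mahout's Cache implementation. */
final class NonBlockingReadCache<K, V> {

    private final ConcurrentMap<K, V> cache = new ConcurrentHashMap<>();
    // Hypothetical stand-in for the component that computes a missing value.
    private final Function<K, V> loader;

    NonBlockingReadCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // Fast path: a hit is served without entering a synchronized block,
        // so concurrent readers do not serialize on a single monitor.
        V value = cache.get(key);
        if (value == null) {
            // Slow path: computeIfAbsent invokes the loader at most once per
            // absent key, even if several threads miss at the same time.
            value = cache.computeIfAbsent(key, loader);
        }
        return value;
    }

    int size() {
        return cache.size();
    }
}
```

With this shape, only threads that miss on the same key wait on each other while the value is computed. Note that ConcurrentHashMap does not store null values, so a loader that returns null would be called again on every lookup for that key.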

--
Pozdrowienia,
 Tomasz Tretkowski



Re: CachingUserSimilarity concurrency issue.

Posted by Pat Ferrel <pa...@occamsmachete.com>.
That is from some very old code that is on the deprecation path. Mahout doesn’t accept Hadoop MapReduce code anymore, and this is even older: it is part of the Taste in-memory recommender. So if you change it, you may have to maintain it yourself.

If you want something more modern, check out the Mahout + Spark + Search engine recommender here: https://templates.prediction.io/PredictionIO/template-scala-parallel-universal-recommendation. It has too many new features to list here, but it is still based on the SimilarityAnalysis part of Mahout.
