Posted to user@mahout.apache.org by Serega Sheypak <se...@gmail.com> on 2014/11/04 13:00:30 UTC

Mahout 0.7 and 0.9: difference for org.apache.mahout.cf.taste.hadoop.item.RecommenderJob

Hi, I used org.apache.mahout.cf.taste.hadoop.item.RecommenderJob in Mahout
0.7 (CDH4).
Here are the parameters:
numRecommendations=1000
threshold=0.91
maxSimilaritiesPerItem=1000
maxPrefsPerUserInItemSimilarity=10
similarityClassname=SIMILARITY_LOGLIKELIHOOD

Then I migrated to 0.9 (CDH5).
I've found one difference:
maxPrefsPerUserInItemSimilarity was renamed to maxPrefsInItemSimilarity.

The other difference is in how it works.
I see this output in 0.7:

USER_RATINGS_NEGLECTED=14954083

USER_RATINGS_USED=32355513

=====

COOCCURRENCES=72 503 210

PRUNED_COOCCURRENCES=0


Output in 0.9:

NEGLECTED_OBSERVATIONS=39 175 989

ROWS=4 937 362

USED_OBSERVATIONS=10 840 138

=====

org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
COOCCURRENCES=17 645 029

PRUNED_COOCCURRENCES=0


And 0.9 gives me awful results, just trash.

I ran both over the same dataset:
Mahout 0.7 is on the old production CDH4 cluster,
Mahout 0.9 is on the new CDH5 cluster.

Why is there such a huge difference? Is there any way to fix it?
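
For reference, here is a minimal sketch that puts the counters quoted above side by side. It assumes that USER_RATINGS_USED / USER_RATINGS_NEGLECTED in 0.7 and USED_OBSERVATIONS / NEGLECTED_OBSERVATIONS in 0.9 count comparable things, which the job output alone does not confirm. The dictionary layout and the used_fraction helper are only illustrative names, not part of Mahout.

```python
# Counter values copied from the 0.7 and 0.9 job output in this message.
# Assumption: the "used"/"neglected" counters mean the same thing in both versions.
counters = {
    "0.7": {"used": 32_355_513, "neglected": 14_954_083, "cooccurrences": 72_503_210},
    "0.9": {"used": 10_840_138, "neglected": 39_175_989, "cooccurrences": 17_645_029},
}


def used_fraction(c):
    """Share of all counted observations that the job reports as used."""
    return c["used"] / (c["used"] + c["neglected"])


for version, c in counters.items():
    total = c["used"] + c["neglected"]
    print(f"{version}: total={total} used_fraction={used_fraction(c):.3f}")

# How many co-occurrences 0.9 reports relative to 0.7.
ratio = counters["0.9"]["cooccurrences"] / counters["0.7"]["cooccurrences"]
print(f"cooccurrences 0.9 / 0.7 = {ratio:.3f}")
```

Under that assumption, 0.9 keeps a much smaller share of the observations than 0.7 on the same dataset, and reports roughly a quarter of the co-occurrences.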

Re: Mahout 0.7 and 0.9: difference for org.apache.mahout.cf.taste.hadoop.item.RecommenderJob

Posted by Wei Li <we...@gmail.com>.
Got it, thanks :)

On Thu, Nov 20, 2014 at 10:44 PM, Serega Sheypak <se...@gmail.com>
wrote:

> We did the same: we just switched back to 0.7 and the problem is gone.
> Anyway, we are in trouble :)

Re: Mahout 0.7 and 0.9: difference for org.apache.mahout.cf.taste.hadoop.item.RecommenderJob

Posted by Serega Sheypak <se...@gmail.com>.
We did the same: we just switched back to 0.7 and the problem is gone.
Anyway, we are in trouble :)

2014-11-20 15:54 GMT+03:00 Wei Li <we...@gmail.com>:

> Hi Serega:
>
>     We have also tried the Mahout 0.9 RecommenderJob and found that the
> results are not good either. We are now debugging the source code to
> find possible issues. How is the output of Mahout 0.7? We will
> switch to that version if the results are acceptable. Thanks.
>
> Best
> Wei

Re: Mahout 0.7 and 0.9: difference for org.apache.mahout.cf.taste.hadoop.item.RecommenderJob

Posted by Wei Li <we...@gmail.com>.
Hi Serega:

    We have also tried the Mahout 0.9 RecommenderJob and found that the
results are not good either. We are now debugging the source code to
find possible issues. How is the output of Mahout 0.7? We will
switch to that version if the results are acceptable. Thanks.

Best
Wei
