Posted to user@mahout.apache.org by Serega Sheypak <se...@gmail.com> on 2014/07/24 08:57:46 UTC

itemsimilarity returns few similar items

Hi, I'm trying to calculate item similarity using ItemSimilarityJob.
Here are my counts:
input dataset (user_id, item_id, pref): 16M rows
distinct items: 700K
distinct users: 4M

Users bucketed by number of preferences:
count_of_preferences, count_of_users
1                                   2M
2                                   600K
3                                   300K
4                                   300R
......
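
For reference, here is a minimal sketch of how a histogram like the one above can be computed, and how many item pairs each bucket can contribute. The function names and the toy data are hypothetical, and it assumes each (user_id, item_id) pair appears at most once in the input.

```python
from collections import Counter


def bucket_users_by_pref_count(triples):
    """Map count_of_preferences -> count_of_users for (user_id, item_id, pref) triples."""
    prefs_per_user = Counter(user_id for user_id, _item_id, _pref in triples)
    return Counter(prefs_per_user.values())


def cooccurrence_pairs(buckets):
    """Total unordered item pairs seen together: n * (n - 1) / 2 per user with n preferences."""
    return sum(users * n * (n - 1) // 2 for n, users in buckets.items())


# Hypothetical toy data, not the real dataset.
triples = [
    ("u1", "a", 1.0),
    ("u2", "a", 1.0), ("u2", "b", 1.0),
    ("u3", "a", 1.0), ("u3", "b", 1.0), ("u3", "c", 1.0),
]

buckets = bucket_users_by_pref_count(triples)
print(sorted(buckets.items()))      # [(1, 1), (2, 1), (3, 1)]
print(cooccurrence_pairs(buckets))  # 0 + 1 + 3 = 4
```

A user with a single preference contributes zero item pairs in this count, so the bucket with 1 preference adds nothing to any item-to-item comparison that relies on users who rated both items.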

threshold: 0.91
similarityClassname=PEARSON

It returns ~2000 rows for ~1000 distinct items.
What am I doing wrong?