Posted to user@mahout.apache.org by Serega Sheypak <se...@gmail.com> on 2014/07/24 08:57:46 UTC
itemsimilarity returns few similar items
Hi, I'm trying to compute item similarities using ItemSimilarityJob.
Here are my counts:
input dataset (user_id, item_id, pref): 16M rows
distinct items: 700K
distinct users: 4M
preferences per user, bucketed:
count_of_preferences, count_of_users
1 2M
2 600K
3 300K
4 300R
......
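As a rough sanity check on these buckets, here is a minimal Python sketch of the arithmetic. It assumes only that an item pair can be observed when a single user has at least two preferences, so a user with n preferences contributes n*(n-1)/2 unordered pairs. The variable and function names are illustrative, the counts are the approximate figures listed above, and the bucket for 4 preferences is left out because its count is not legible in the post.

```python
# Approximate figures from the bucket list: preferences per user -> number of users.
# The "4" bucket is omitted because its count ("300R") is unclear in the post.
buckets = {1: 2_000_000, 2: 600_000, 3: 300_000}


def pairs_per_user(n):
    """Unordered item pairs contributed by one user with n preferences."""
    return n * (n - 1) // 2


# Users and preferences accounted for by the listed buckets.
users_listed = sum(buckets.values())
prefs_listed = sum(n * users for n, users in buckets.items())

# Item-pair observations those users can generate; users with a single
# preference contribute none.
pair_observations = sum(pairs_per_user(n) * users for n, users in buckets.items())

print("users:", users_listed)
print("preferences:", prefs_listed)
print("item-pair observations:", pair_observations)
```

This says nothing about what ItemSimilarityJob does internally; it only shows how few pair observations the listed small buckets can produce relative to the number of users they cover.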
threshold: 0.91
similarityClassname=PEARSON
It returns ~2000 rows for ~1000 distinct items.
What am I doing wrong?