Posted to dev@mahout.apache.org by "Yann Moisan (JIRA)" <ji...@apache.org> on 2012/10/18 16:50:04 UTC
[jira] [Updated] (MAHOUT-1090) Add a similarity implementation that computes cosine over all entries
[ https://issues.apache.org/jira/browse/MAHOUT-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yann Moisan updated MAHOUT-1090:
--------------------------------
Comment: was deleted
(was: In my case (term vectors for computing document similarities), missing entries really do mean 0, which allows a trick with a HashMap mX to avoid O(n^2) complexity.
If missing entries had non-zero values, the trick would not be efficient because of the memory overhead.
So I do understand your point of view; that case may need yet another implementation.)
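The HashMap trick in the deleted comment can be illustrated in plain Java. This is a minimal sketch, not code from the issue: the class SparseDot and its methods are hypothetical, and sparse vectors are assumed to be parallel arrays of IDs and values in which each ID appears at most once (rather than Mahout's PreferenceArray). The nested loop compares every pair of entries, O(n*m); the hashed version indexes one vector once and makes a single pass over the other, O(n + m) expected time, at the cost of a map holding n entries.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: dot product of two sparse vectors given as parallel
// arrays of IDs and values. IDs absent from one vector count as 0.
// Assumes each ID appears at most once per vector.
class SparseDot {

  // Naive version: compares every pair of entries, O(n * m).
  static double nestedLoop(long[] xIds, float[] xVals, long[] yIds, float[] yVals) {
    double sum = 0;
    for (int i = 0; i < xIds.length; i++) {
      for (int j = 0; j < yIds.length; j++) {
        if (xIds[i] == yIds[j]) {
          sum += (double) xVals[i] * yVals[j];
        }
      }
    }
    return sum;
  }

  // Hashed version: index x by ID once, then one pass over y, O(n + m) expected.
  static double hashed(long[] xIds, float[] xVals, long[] yIds, float[] yVals) {
    Map<Long, Float> mX = new HashMap<Long, Float>();
    for (int i = 0; i < xIds.length; i++) {
      mX.put(xIds[i], xVals[i]);
    }
    double sum = 0;
    for (int j = 0; j < yIds.length; j++) {
      Float x = mX.get(yIds[j]);
      if (x != null) { // IDs missing from x contribute 0
        sum += x.doubleValue() * yVals[j];
      }
    }
    return sum;
  }
}
```

With x = {1: 2, 2: 3, 5: 1} and y = {2: 4, 5: 2, 7: 9}, both methods return 3*4 + 1*2 = 14.0.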
> Add a similarity implementation that computes cosine over all entries
> ---------------------------------------------------------------------
>
> Key: MAHOUT-1090
> URL: https://issues.apache.org/jira/browse/MAHOUT-1090
> Project: Mahout
> Issue Type: New Feature
> Components: Collaborative Filtering
> Affects Versions: 0.7
> Reporter: Yann Moisan
> Assignee: Sean Owen
> Priority: Minor
> Fix For: 0.8
>
>
> The aim of this feature is to use a recommender to compute similarities the way the Hadoop RowSimilarityJob does. For small datasets this will be faster, because everything runs in memory. So we need an in-memory implementation of cosine similarity that computes the cosine over all entries (UncenteredCosineSimilarity uses only entries that are present in both vectors).
> Here is my implementation (it does not support refresh yet):
> import java.util.Collection;
> import java.util.HashMap;
> import java.util.Map;
>
> import org.apache.mahout.cf.taste.common.Refreshable;
> import org.apache.mahout.cf.taste.common.TasteException;
> import org.apache.mahout.cf.taste.impl.similarity.AbstractItemSimilarity;
> import org.apache.mahout.cf.taste.model.DataModel;
> import org.apache.mahout.cf.taste.model.PreferenceArray;
>
> /**
>  * Cosine similarity over all entries of two item vectors: a user who has
>  * no preference for one of the items counts as a 0 value for that item.
>  */
> public class CosineSimilarity extends AbstractItemSimilarity {
>
>   public CosineSimilarity(DataModel dataModel) {
>     super(dataModel);
>   }
>
>   @Override
>   public void refresh(Collection<Refreshable> alreadyRefreshed) {
>     // Refresh is not supported yet.
>     throw new UnsupportedOperationException();
>   }
>
>   @Override
>   public double itemSimilarity(long itemID1, long itemID2) throws TasteException {
>     DataModel model = getDataModel();
>     PreferenceArray xPrefs = model.getPreferencesForItem(itemID1);
>     PreferenceArray yPrefs = model.getPreferencesForItem(itemID2);
>     double sumXY = 0;
>     double sumX2 = 0;
>     double sumY2 = 0;
>     // Index item 1's values by user ID so the dot product needs a single
>     // pass over item 2 instead of a nested loop.
>     Map<Long, Float> mX = new HashMap<Long, Float>();
>     for (int i = 0; i < xPrefs.length(); i++) {
>       float x = xPrefs.getValue(i);
>       mX.put(xPrefs.getUserID(i), x);
>       sumX2 += (double) x * x;
>     }
>     for (int i = 0; i < yPrefs.length(); i++) {
>       float y = yPrefs.getValue(i);
>       Float x = mX.get(yPrefs.getUserID(i));
>       // A user missing from item 1 has x = 0 and adds nothing to the dot product.
>       if (x != null) {
>         sumXY += x.doubleValue() * y;
>       }
>       // Every entry of item 2 counts toward its norm, shared or not.
>       sumY2 += (double) y * y;
>     }
>     // Yields NaN (0 / 0) when either item has no non-zero values.
>     return sumXY / (Math.sqrt(sumX2) * Math.sqrt(sumY2));
>   }
>
>   @Override
>   public double[] itemSimilarities(long itemID1, long[] itemID2s) throws TasteException {
>     // Computes each similarity independently via itemSimilarity().
>     int length = itemID2s.length;
>     double[] result = new double[length];
>     for (int i = 0; i < length; i++) {
>       result[i] = itemSimilarity(itemID1, itemID2s[i]);
>     }
>     return result;
>   }
> }
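The difference between cosine over all entries and the co-rated-only behavior that the description attributes to UncenteredCosineSimilarity shows up on a tiny example. This is a minimal plain-Java sketch with hypothetical class and method names, assuming vectors are Map<Long, Double> from user ID to value; it is not Mahout's API. The shared-only variant restricts both norms to IDs present in both vectors, while the all-entries variant treats missing entries as 0, so unshared entries still enlarge the norms.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical comparison of two cosine variants on sparse vectors
// stored as Map<userID, value>.
class CosineVariants {

  // Cosine over all entries: missing entries count as 0, so every entry
  // adds to its vector's norm, but only shared IDs add to the dot product.
  static double allEntries(Map<Long, Double> x, Map<Long, Double> y) {
    double sumXY = 0, sumX2 = 0, sumY2 = 0;
    for (Map.Entry<Long, Double> e : x.entrySet()) {
      double v = e.getValue();
      sumX2 += v * v;
      Double w = y.get(e.getKey());
      if (w != null) {
        sumXY += v * w;
      }
    }
    for (double w : y.values()) {
      sumY2 += w * w;
    }
    return sumXY / (Math.sqrt(sumX2) * Math.sqrt(sumY2));
  }

  // Cosine over shared entries only: the norms are restricted to IDs
  // present in both vectors.
  static double sharedEntriesOnly(Map<Long, Double> x, Map<Long, Double> y) {
    double sumXY = 0, sumX2 = 0, sumY2 = 0;
    for (Map.Entry<Long, Double> e : x.entrySet()) {
      Double w = y.get(e.getKey());
      if (w != null) {
        double v = e.getValue();
        sumXY += v * w;
        sumX2 += v * v;
        sumY2 += w * w;
      }
    }
    return sumXY / (Math.sqrt(sumX2) * Math.sqrt(sumY2));
  }
}
```

For x = {1: 1, 2: 1} and y = {1: 1, 3: 1}, the shared-only variant gives 1.0 (the single common entry agrees perfectly), while the all-entries variant gives 1 / (sqrt(2) * sqrt(2)), about 0.5.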
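In the code above, itemSimilarities delegates to itemSimilarity for each candidate, so item 1's preferences are fetched and hashed again for every entry of itemID2s. One possible refinement is sketched below in plain Java; the class BatchCosine and its method are hypothetical, and Map<Long, Float> vectors stand in for Mahout's DataModel and PreferenceArray. It indexes item 1 and computes its norm once, then makes one pass per candidate.

```java
import java.util.List;
import java.util.Map;

// Hypothetical batch variant: item 1 is indexed and its norm computed once,
// then each candidate needs a single pass over its own entries.
class BatchCosine {

  static double[] similarities(Map<Long, Float> item1, List<Map<Long, Float>> candidates) {
    double sumX2 = 0;
    for (float x : item1.values()) {
      sumX2 += (double) x * x;
    }
    double normX = Math.sqrt(sumX2);
    double[] result = new double[candidates.size()];
    for (int i = 0; i < result.length; i++) {
      double sumXY = 0;
      double sumY2 = 0;
      for (Map.Entry<Long, Float> e : candidates.get(i).entrySet()) {
        float y = e.getValue();
        Float x = item1.get(e.getKey());
        if (x != null) { // users missing from item 1 contribute 0
          sumXY += x.doubleValue() * y;
        }
        sumY2 += (double) y * y;
      }
      // NaN (0 / 0) when either vector has no non-zero values.
      result[i] = sumXY / (normX * Math.sqrt(sumY2));
    }
    return result;
  }
}
```

For item 1 = {1: 1, 2: 1} and candidates {1: 1, 3: 1} and {1: 1, 2: 1}, this returns approximately [0.5, 1.0]. The values match a per-pair computation; only the repeated indexing of item 1 is avoided.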
--
This message is automatically generated by JIRA.