Posted to dev@mahout.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2010/08/13 21:18:17 UTC

[jira] Resolved: (MAHOUT-478) Do we need normalize SimilarityMatrixEntryKey?

     [ https://issues.apache.org/jira/browse/MAHOUT-478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved MAHOUT-478.
------------------------------

    Fix Version/s:     (was: 0.4)
       Resolution: Not A Problem

Am I right that we consider this "not a problem", then?

> Do we need  normalize SimilarityMatrixEntryKey?
> -----------------------------------------------
>
>                 Key: MAHOUT-478
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-478
>             Project: Mahout
>          Issue Type: Question
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>            Reporter: Han Hui Wen 
>
> In org.apache.mahout.math.hadoop.similarity.SimilarityMatrixEntryKey
> {code}
> public static class SimilarityMatrixEntryKeyComparator extends WritableComparator {
>     protected SimilarityMatrixEntryKeyComparator() {
>       super(SimilarityMatrixEntryKey.class, true);
>     }
>     @Override
>     public int compare(WritableComparable a, WritableComparable b) {
>       SimilarityMatrixEntryKey key1 = (SimilarityMatrixEntryKey) a;
>       SimilarityMatrixEntryKey key2 = (SimilarityMatrixEntryKey) b;
>       int result = compare(key1.row, key2.row);
>       if (result == 0) {
>         result = -1 * compare(key1.value, key2.value);
>       }
>       return result;
>     }
>     protected static int compare(long a, long b) {
>       return (a == b) ? 0 : (a < b) ? -1 : 1;
>     }
>     protected static int compare(double a, double b) {
>       return (a == b) ? 0 : (a < b) ? -1 : 1;
>     }
>   }
> {code}
> We use a double as one part of the key.
> Because a double can take many possible values, pairwiseSimilarity may end up with many groups,
> and the number of groups is out of our control.
> For example, (ItemA, 0.1), (ItemA, 0.11), (ItemA, 0.01), (ItemA, 0.1), (ItemA, 0.001), (ItemA, 0.0011) are split across different groups.
> Doubles are also inexact, so it is hard to compare them for equality.
> So can we normalize the similarityValue?
> We could multiply every similarityValue by 100, 1000, or some other number, and convert it to an integer.
> If necessary, we can convert the values back to doubles at the end.
>  
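For reference, the normalization proposed in the question could look roughly like the sketch below. It is only an illustration under stated assumptions: the class SimilarityQuantizer, its method names, and the scale factor of 1000 are hypothetical and are not part of Mahout.

```java
// Hypothetical helper, not part of Mahout: turns a double similarity into a
// fixed-point long so that it can be compared exactly when used in a key.
public final class SimilarityQuantizer {

    private SimilarityQuantizer() {
    }

    /** Multiplies the similarity by the scale and rounds to the nearest long. */
    public static long toFixedPoint(double similarity, long scale) {
        return Math.round(similarity * scale);
    }

    /** Converts a fixed-point value back to a double; detail finer than 1/scale is lost. */
    public static double toDouble(long fixedPoint, long scale) {
        return (double) fixedPoint / scale;
    }

    public static void main(String[] args) {
        long scale = 1000L; // assumed scale factor, chosen only for illustration
        double[] values = {0.1, 0.11, 0.01, 0.1, 0.001, 0.0011};
        for (double v : values) {
            long q = toFixedPoint(v, scale);
            System.out.println(v + " -> " + q + " -> " + toDouble(q, scale));
        }
    }
}
```

With a scale of 1000, the values 0.001 and 0.0011 from the example both map to the same integer, so the choice of scale decides which similarities become indistinguishable.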

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.